Exploratory Hybrid Search in Hierarchical Reinforcement Learning

이상엽

Exploratory Hybrid Search in Hierarchical Reinforcement Learning : 계층 강화 학습에서의 탐험적 혼합 탐색

DC Field	Value	Language
dc.contributor.advisor	문병로	-
dc.contributor.author	이상엽	-
dc.date.accessioned	2020-10-13T02:55:18Z	-
dc.date.available	2020-10-13T02:55:18Z	-
dc.date.issued	2020	-
dc.identifier.other	000000162127	-
dc.identifier.uri	https://hdl.handle.net/10371/169320	-
dc.identifier.uri	http://dcollection.snu.ac.kr/common/orgView/000000162127	ko_KR
dc.description	Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Department of Electrical and Computer Engineering, 2020. 8. 문병로.	-
dc.description.abstract	Balancing exploitation and exploration is a major challenge in many optimization problems. Evolutionary algorithms, such as evolutionary strategies and genetic algorithms, are inspired by biological evolution. They have been used for various optimization problems, such as combinatorial optimization and continuous optimization. However, evolutionary algorithms are poor at fine-tuning near local optima; in other words, they lack exploitation power. This drawback can be overcome by hybridization. Hybrid genetic algorithms, or memetic algorithms, are successful examples of hybridization. Although the solution space is exponentially vast in some optimization problems, these algorithms successfully find satisfactory solutions. In the deep learning era, the problem of exploitation and exploration has been relatively neglected. In deep reinforcement learning problems, however, balancing exploitation and exploration is more crucial than in supervised learning problems. Many real-world environments have an exponentially large state space that agents must explore. Without sufficient exploration power, agents uncover only a small portion of the state space and end up seeking only instant rewards. This thesis proposes a hybridization method that combines gradient-based policy optimization, which has strong exploitation power, with evolutionary policy optimization, which has strong exploration power. First, gradient-based policy optimization and evolutionary policy optimization are analyzed in various environments. The results demonstrate that evolutionary policy optimization is robust to sparse rewards but weak with instant rewards, whereas gradient-based policy optimization is effective with instant rewards but weak with sparse rewards. This difference between the two optimizations reveals the potential of hybridization in policy optimization. Then, a hybrid search is proposed within the framework of hierarchical reinforcement learning. The results demonstrate that the hybrid search finds an effective agent for complex environments with sparse rewards, thanks to its balanced exploitation and exploration.	-
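dc.description.abstract	In many optimization problems, balancing exploitation and exploration is a very important issue. Evolutionary algorithms, such as evolutionary strategies and genetic algorithms, are metaheuristic algorithms inspired by evolution in nature. They have been used to solve various optimization problems, such as combinatorial optimization and continuous optimization. However, evolutionary algorithms are weak at fine-tuning near local optima, that is, at exploitation. This drawback can be overcome through hybridization. Hybrid genetic algorithms, or memetic algorithms, are successful examples of hybridization. These algorithms successfully find satisfactory solutions even when the solution space of the optimization problem is exponentially large. Meanwhile, in the deep learning era, the problem of balancing exploitation and exploration has often been neglected. In deep reinforcement learning, however, balancing exploitation and exploration is far more important than in supervised learning. Many real-world environments have exponentially large state spaces that agents must explore. Without sufficient exploration ability, an agent uncovers only a tiny part of the state space and ends up pursuing only instant rewards. This thesis presents a method that hybridizes gradient-based policy optimization, which has strong exploitation ability, with evolutionary policy optimization, which has strong exploration ability. First, gradient-based policy optimization and evolutionary policy optimization are analyzed in various environments. The results show that gradient-based policy optimization is effective with instant rewards but vulnerable when rewards are sparse, whereas evolutionary policy optimization is robust to sparse rewards but vulnerable with instant rewards. This difference in the characteristics of the two optimizations shows the potential of hybrid policy optimization. Then, a hybrid search method within a hierarchical reinforcement learning framework is presented. The results show that, thanks to its balanced exploitation and exploration, the hybrid search method finds effective agents in complex environments with sparse rewards.	-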
dc.description.tableofcontents	I. Introduction, p. 1; II. Background, p. 6; 2.1 Evolutionary Computations, p. 6; 2.1.1 Hybrid Genetic Algorithm, p. 7; 2.1.2 Evolutionary Strategy, p. 9; 2.2 Hybrid Genetic Algorithm Example: Brick Layout Problem, p. 10; 2.2.1 Problem Statement, p. 11; 2.2.2 Hybrid Genetic Algorithm, p. 11; 2.2.3 Experimental Results, p. 14; 2.2.4 Discussion, p. 15; 2.3 Reinforcement Learning, p. 16; 2.3.1 Policy Optimization, p. 19; 2.3.2 Proximal Policy Optimization, p. 21; 2.4 Neuroevolution for Reinforcement Learning, p. 23; 2.5 Hierarchical Reinforcement Learning, p. 25; 2.5.1 Option-based HRL, p. 26; 2.5.2 Goal-based HRL, p. 27; 2.5.3 Exploitation versus Exploration, p. 27; III. Understanding Features of Evolutionary Policy Optimizations, p. 29; 3.1 Experimental Setup, p. 31; 3.2 Feature Analysis, p. 32; 3.2.1 Convolution Filter Inspection, p. 32; 3.2.2 Saliency Map, p. 36; 3.3 Discussion, p. 40; 3.3.1 Behavioral Characteristics, p. 40; 3.3.2 ES Agent without Inputs, p. 42; IV. Hybrid Search for Hierarchical Reinforcement Learning, p. 44; 4.1 Method, p. 45; 4.2 Experimental Setup, p. 47; 4.2.1 Environment, p. 47; 4.2.2 Network Architectures, p. 50; 4.2.3 Training, p. 50; 4.3 Results, p. 51; 4.3.1 Comparison, p. 51; 4.3.2 Experimental Results, p. 53; 4.3.3 Behavior of Low-Level Policy, p. 54; 4.4 Conclusion, p. 55; V. Conclusion, p. 56; 5.1 Summary, p. 56; 5.2 Future Work, p. 57; Bibliography, p. 58	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	Deep reinforcement learning	-
dc.subject	Evolutionary computation	-
dc.subject	Hierarchical reinforcement learning	-
dc.subject	Neuroevolution	-
dc.subject.ddc	621.3	-
dc.title	Exploratory Hybrid Search in Hierarchical Reinforcement Learning	-
dc.title.alternative	계층 강화 학습에서의 탐험적 혼합 탐색	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.department	College of Engineering, Department of Electrical and Computer Engineering	-
dc.description.degree	Doctor	-
dc.date.awarded	2020-08	-
dc.identifier.uci	I804:11032-000000162127	-
dc.identifier.holdings	000000000043▲000000000048▲000000162127▲	-
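
The hybridization idea summarized in the abstract, pairing an exploratory evolutionary search with an exploitative gradient-based refinement, can be illustrated with a toy memetic-style sketch. This is not the thesis's method, environment, or code: the 1-D Rastrigin objective, the function names, and every hyperparameter below are assumptions chosen purely for illustration.

```python
import math
import random


def objective(x):
    """1-D Rastrigin function (assumed toy objective): many local minima, f(0) = 0."""
    return 10.0 + x * x - 10.0 * math.cos(2.0 * math.pi * x)


def gradient(x):
    """Analytic derivative of the objective above."""
    return 2.0 * x + 20.0 * math.pi * math.sin(2.0 * math.pi * x)


def gradient_refine(x, steps=100, lr=0.002):
    """Exploitation: plain gradient descent toward a nearby local minimum.

    lr is kept below 1 / (2 + 40 * pi**2), the inverse of the largest
    curvature of the objective, so each step cannot increase the objective.
    """
    for _ in range(steps):
        x -= lr * gradient(x)
    return x


def hybrid_search(rng, generations=30, pop_size=20, sigma=1.0, refine=True):
    """Exploration by Gaussian mutation of an elite, with optional refinement.

    With refine=False this is a simple elitist evolutionary search; with
    refine=True every child is first improved by gradient descent.
    """
    best = rng.uniform(-5.0, 5.0)
    for _ in range(generations):
        children = [best + rng.gauss(0.0, sigma) for _ in range(pop_size)]
        if refine:
            children = [gradient_refine(c) for c in children]
        candidate = min(children, key=objective)
        # Elitism: the incumbent is replaced only by a strictly better child.
        if objective(candidate) < objective(best):
            best = candidate
    return best


if __name__ == "__main__":
    for refine in (False, True):
        x = hybrid_search(random.Random(0), refine=refine)
        print(f"refine={refine}: x={x:.4f}, f(x)={objective(x):.6f}")
```

In this sketch the mutation step supplies the jumps between basins that gradient descent alone cannot make, while the refinement step supplies the fine-tuning near each local optimum that mutation alone reaches only slowly.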

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)

Files in This Item:

000000162127.pdf 3.89 MB
