Generalizable Agents with Improved Abstractions and Transfer

김재겸

Seoul National University Central Library

Generalizable Agents with Improved Abstractions and Transfer : 향상된 추상화와 전이를 이용한 일반화 가능한 에이전트

DC Field	Value	Language
dc.contributor.advisor	김건희	-
dc.contributor.author	김재겸	-
dc.date.accessioned	2023-11-20T04:24:17Z	-
dc.date.available	2023-11-20T04:24:17Z	-
dc.date.issued	2023	-
dc.identifier.other	000000177236	-
dc.identifier.uri	https://hdl.handle.net/10371/196493	-
dc.identifier.uri	https://dcollection.snu.ac.kr/common/orgView/000000177236	ko_KR
dc.description	Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Dept. of Computer Science and Engineering, 2023. 8. 김건희.	-
dc.description.abstract	Many researchers in deep learning have sought to build agents that perform a wide range of tasks. Since training on every possible task is rarely viable, improving how well agents generalize to novel tasks from what they learn on training tasks is one of the important challenges in deep learning. Effective generalization requires both learning abstractions that remain usable under different conditions and exploiting those abstractions on new tasks. In this thesis, we explore the challenge of generalization along those two aspects, abstraction and transfer. First, we study how to abstract input data and learn features that are robust to noise. Because task-irrelevant information at inference time can severely degrade the performance of learned models and agents, robustness to such noise is an important problem in generalization. To tackle it, we propose a discrete information bottleneck method named Drop-Bottleneck, which learns to discretely drop features that are irrelevant to the target variable and to distill the features of interest. It not only has a simple information compression objective but also provides deterministic compressed representations, which are useful for inference that requires complete consistency and which improve efficiency through the reduced number of features. We then investigate how an agent can discover inherent behaviors in the environment without supervision and abstract them into skills in a more reusable form. Unsupervised skill discovery aims to find and learn a set of useful behaviors by interacting with the environment without external rewards. It is one of the key challenges of temporal abstraction in reinforcement learning, as it allows the agent to reuse the knowledge in the learned skills and to solve new tasks more efficiently and effectively. To this end, we propose an unsupervised skill discovery method named Information Bottleneck Option Learning (IBOL). It seeks extensive behaviors in the state space by linearizing the environment, and abstracts those behaviors with disentanglement encouraged by imposing an information bottleneck, improving the reusability of the skills. Lastly, we examine how to leverage the knowledge learned from source tasks to improve performance on target tasks without further training. For zero-shot transfer in reinforcement learning where the reward function varies between tasks, the successor features framework is one of the popular approaches. Our goal is to enhance the transfer of value approximators learned with successor features to new tasks by bounding their errors on the new target tasks. Given a set of source tasks with their successor features, we present lower and upper bounds on the optimal values for novel task vectors that are expressible as linear combinations of source task vectors. We then propose constrained GPI, a simple test-time approach that can improve transfer by constraining value approximations on new target tasks.	-
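dc.description.abstract	(Translated from the Korean abstract.) Many researchers in the field of deep learning have sought to build agents that can perform a variety of tasks. Because training on every possible task is usually infeasible, enabling agents to generalize better to new tasks based on what they learned from training tasks is one of the important challenges in deep learning. For effective generalization, two things are both important: learning abstractions that can be used under different conditions, and making good use of those abstractions on new tasks. This dissertation addresses the generalization challenge from these two perspectives, abstraction and transfer. We first address abstracting input data and learning features that are robust to noise. Because task-irrelevant information given at inference time can greatly affect the performance of learned models and agents, achieving robustness to such noise is one of the important problems in generalization. To solve this problem, we propose a discrete information bottleneck method called Drop-Bottleneck, which discretely removes features irrelevant to the target variable and retains the desired features. The method has a simple information compression objective and also provides a deterministic compressed representation, which is useful for inference that requires full consistency and the improved efficiency that comes from a reduced number of features. Next, we address how an agent in a given environment can discover possible behaviors without supervision and abstract them into skills in a more reusable form. Unsupervised skill discovery aims to find and learn useful behaviors by interacting with the environment without external rewards. Because it allows the knowledge in learned skills to be reused and new tasks to be carried out more efficiently and effectively, it is one of the important challenges of temporal abstraction in reinforcement learning. To this end, we present an unsupervised skill discovery method called Information Bottleneck Option Learning (IBOL). It linearizes the environment to find more extensive behaviors in the state space, and learns disentangled abstractions of those behaviors through an information bottleneck to improve the reusability of the skills. Finally, we address how to use knowledge learned on source tasks to improve performance on target tasks without additional training. For zero-shot transfer in reinforcement learning under the condition that the reward function differs between tasks, the successor features framework is widely used. This dissertation aims to improve the transfer of learned value approximators that use successor features to new tasks by bounding the errors on those tasks. Using the source tasks and the successor features learned on them, we present lower and upper bounds on the optimal values for new task vectors that can be expressed as linear combinations of the source task vectors. We then present constrained GPI, a simple test-time method that improves transfer by constraining the value approximations on new target tasks.	-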
dc.description.tableofcontents	Abstract i; Chapter 1 Introduction 1; 1.1 Contributions 4; 1.2 Thesis Organization 7; Chapter 2 Robust and Efficient Feature Abstraction with Discrete Information Bottleneck 8; 2.1 Overview 8; 2.2 Related Work 10; 2.3 Feature Abstraction with Drop-Bottleneck 12; 2.3.1 Preliminaries of Information Bottleneck 12; 2.3.2 Drop-Bottleneck 13; 2.3.3 Deterministic Compressed Representation 15; 2.3.4 Training with Drop-Bottleneck 16; 2.4 Robust Exploration with Drop-Bottleneck 16; 2.5 Experiments 18; 2.5.1 Experimental Setup for Exploration Tasks 19; 2.5.2 Exploration in Noisy Static Maze Environments 22; 2.5.3 Exploration in Noisy and Randomly Generated Maze Environments 23; 2.5.4 Comparison with VIB: Adversarial Robustness & Dimension Reduction 25; 2.5.5 Comparison with VCEB: Adversarial Robustness 30; 2.5.6 Removal of Task-irrelevant Information and Validity of Deterministic Compressed Representation 32; 2.5.7 Visualization of Task-irrelevant Information Removal 37; 2.5.8 Ablation Study: Exploration without DB 38; 2.6 Summary 39; Chapter 3 Disentangled Temporal Abstraction for Reusable Skills 41; 3.1 Overview 41; 3.2 Preliminaries and Related Work 44; 3.3 Information Bottleneck Option Learning (IBOL) 47; 3.3.1 Linearization of Environments 49; 3.3.2 Skill Discovery with Bottleneck Learning 50; 3.3.3 Derivation of the Lower Bound 53; 3.3.4 Encouraging Disentanglement 54; 3.3.5 Decomposition of the KL Divergence Term 55; 3.3.6 Training 56; 3.4 Experiments 58; 3.4.1 Experimental Setup 59; 3.4.2 Visualization of Learned Skills 64; 3.4.3 Information-Theoretic Evaluations 64; 3.4.4 Varying Number of Bins for MI Estimation 67; 3.4.5 Evaluation on Downstream Tasks 72; 3.4.6 Diversity of External Returns 75; 3.4.7 Comparison of Reward Function Choices for the Linearizer 76; 3.4.8 Additional Observations 78; 3.4.9 Ablation Study 80; 3.5 Summary 83; Chapter 4 Test-Time Improvement with Source Approximation 85; 4.1 Overview 85; 4.2 Related Work 87; 4.3 Preliminaries 90; 4.3.1 The Zero-Shot Transfer Problem in RL 90; 4.3.2 Successor Features and Universal Successor Features Approximators 91; 4.3.3 Universal Successor Features Approximators with Learned ϕ 94; 4.4 Constrained GPI for Improved Zero-Shot Transfer of Successor Features 94; 4.4.1 Bounding Optimal Values for New Tasks 95; 4.4.2 Constrained Training and Constrained GPI 98; 4.5 Experiments 101; 4.5.1 Scavenger Experiments 101; 4.5.2 Robotic Locomotion Experiments 105; 4.5.3 DeepMind Lab Experiments with Learned ϕ 109; 4.5.4 Implementation Details 112; 4.6 Summary 114; Chapter 5 Conclusion 116; 5.1 Summary 116; 5.2 Future Work 117; Acknowledgements 136; 요약 (Abstract in Korean) 138	-
dc.format.extent	xix, 139	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	Deep Learning	-
dc.subject	Deep Reinforcement Learning	-
dc.subject	Skill Discovery	-
dc.subject	Temporal Abstraction	-
dc.subject	Transfer Learning	-
dc.subject.ddc	621.39	-
dc.title	Generalizable Agents with Improved Abstractions and Transfer	-
dc.title.alternative	향상된 추상화와 전이를 이용한 일반화 가능한 에이전트	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	Jaekyeom Kim	-
dc.contributor.department	College of Engineering, Dept. of Computer Science and Engineering	-
dc.description.degree	Ph.D.	-
dc.date.awarded	2023-08	-
dc.identifier.uci	I804:11032-000000177236	-
dc.identifier.holdings	000000000050▲000000000058▲000000177236▲	-
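
Illustrative note on the abstract's last contribution: the sketch below shows standard generalized policy improvement (GPI) over successor features, with an optional clipping step standing in for the "constraining value approximations" idea. It is a minimal sketch under stated assumptions, not the thesis's implementation. The function name `gpi_action`, the nested-list layout of `psi`, and the per-action `bounds` argument are all hypothetical; in particular, how the thesis derives its lower and upper bounds is not reproduced here, so the bounds are simply taken as inputs.

```python
from typing import List, Optional, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gpi_action(
    psi: List[List[Sequence[float]]],
    w: Sequence[float],
    bounds: Optional[List[Tuple[float, float]]] = None,
) -> Tuple[int, float]:
    """Pick an action by GPI over successor features for one state.

    psi[i][a] is the successor-feature vector of source policy i for
    action a in the current state (hypothetical layout). The value of
    policy i on the task with vector w is approximated as psi[i][a] . w.
    If `bounds` is given, bounds[a] = (lower, upper) is an externally
    supplied interval for the optimal value of action a, and each
    approximation is clipped into it before the maximization.
    """
    best_action, best_value = 0, float("-inf")
    for a in range(len(psi[0])):
        for i in range(len(psi)):
            q = dot(psi[i][a], w)
            if bounds is not None:
                lower, upper = bounds[a]
                q = min(max(q, lower), upper)  # assumed clipping step
            if q > best_value:
                best_action, best_value = a, q
    return best_action, best_value


# Two source policies, two actions, two-dimensional features (toy numbers).
psi = [
    [[1.0, 0.0], [0.0, 1.0]],  # policy 0
    [[0.5, 0.5], [0.0, 3.0]],  # policy 1
]
w = [1.0, 0.5]

print(gpi_action(psi, w))                                   # (1, 1.5)
print(gpi_action(psi, w, bounds=[(0.0, 2.0), (0.0, 0.8)]))  # (0, 1.0)
```

In the toy example, plain GPI trusts policy 1's large estimate for action 1, while the clipped variant caps that estimate at the supplied upper bound and selects action 0 instead. This only illustrates how constraining approximations can change the chosen action.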

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Ph.D. / Sc.D._컴퓨터공학부)

Files in This Item:

000000177236.pdf 14.33 MB
