Generalizable Agents with Improved Abstractions and Transfer

Abstract: Many deep learning researchers have sought to build agents that can perform a wide range of tasks. Since training on every possible task is rarely feasible, improving how well agents generalize to novel tasks from what they learn on training tasks is one of the important challenges in deep learning. Effective generalization requires both learning abstractions that can be used under different conditions and exploiting those abstractions on new tasks. In this thesis, we explore the challenge of generalization mainly from these two aspects: abstraction and transfer.
First, we study how to abstract input data and learn features that are robust to noise. Because task-irrelevant information at inference time can severely affect the performance of learned models and agents, establishing robustness to such noise is an important problem in generalization. To tackle the problem, we propose a discrete information bottleneck method named Drop-Bottleneck, which learns to discretely drop features that are irrelevant to the target variable and to distill the features of interest. It not only has a simple information compression objective but also provides deterministic compressed representations, which are useful for inference that requires complete consistency and which improve efficiency through the reduced number of features.
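As a rough illustration of discrete feature dropping, the sketch below contrasts a stochastic training-time mask with a deterministic inference-time selection. It is a minimal sketch under stated assumptions, not the thesis's implementation: the per-feature drop probabilities, the 0.5 threshold, and the function names `drop_stochastic` and `drop_deterministic` are all hypothetical, and the probabilities are hard-coded here rather than learned.

```python
import numpy as np

def drop_stochastic(x, drop_prob, rng):
    """Training-time sketch: zero out each feature independently.

    x has shape (batch, d); drop_prob has shape (d,) and holds the
    (assumed) probability of dropping each feature.
    """
    keep = rng.random(x.shape) >= drop_prob  # one Bernoulli draw per entry
    return x * keep

def drop_deterministic(x, drop_prob, threshold=0.5):
    """Inference-time sketch: keep only features whose drop probability
    is below the (assumed) threshold. The output has fewer columns and
    is identical on every call."""
    keep = drop_prob < threshold
    return x[..., keep]

# Hypothetical drop probabilities; in practice they would be learned.
drop_prob = np.array([0.05, 0.95, 0.10, 0.99])
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0]])

z = drop_deterministic(x, drop_prob)  # keeps columns 0 and 2 only
```

The deterministic variant makes the two properties mentioned above concrete: repeated calls return the same representation, and downstream computation only sees the retained columns.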
We then investigate how the agent can discover inherent behaviors in the environment without supervision and abstract them into skills in a more reusable form. Unsupervised skill discovery aims at finding and learning a set of useful behaviors by interacting with the environment but with no external rewards. It is one of the key challenges of temporal abstraction in reinforcement learning, as it allows the agent to reuse the knowledge of the learned skills and solve new tasks more efficiently and effectively. To this end, we propose an unsupervised skill discovery method named Information Bottleneck Option Learning (IBOL). It seeks extensive behaviors in the state space by linearizing the environment and abstracts those behaviors, with disentanglement encouraged by imposing an information bottleneck, to improve the reusability of the skills.
Lastly, we examine how to leverage the knowledge learned from source tasks to improve the performance on target tasks without further training. For zero-shot transfer in reinforcement learning where the reward function varies between tasks, the successor features framework is one of the popular approaches. Our goal is to enhance the transfer of value approximators learned with successor features to new tasks by bounding the errors on the new target tasks. Given a set of source tasks with their successor features, we present lower and upper bounds on the optimal values for novel task vectors that are expressible as linear combinations of source task vectors. We then propose constrained GPI, a simple test-time approach that can improve the transfer by constraining value approximations on new target tasks.
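The idea of constraining value approximations at test time can be sketched as follows. This is a minimal illustration, not the thesis's algorithm: it assumes the standard successor features setting in which a policy's value on task vector `w` is the dot product of its successor features with `w`, and it uses the standard generalized policy improvement (GPI) rule of acting greedily on the maximum over source policies. The array shapes, toy numbers, and function names are hypothetical. In particular, the upper bound is a made-up input here; the sketch does not reproduce the bounds derived in the thesis. The lower bound used below relies only on the fact that, with exact successor features, the optimal value is at least the value of any source policy.

```python
import numpy as np

def gpi_action(psi, w):
    """Standard GPI sketch.

    psi: successor features, shape (n_policies, n_actions, d), for one state.
    w:   task vector, shape (d,).
    """
    q = psi @ w  # shape (n_policies, n_actions): each source policy's value on w
    return int(q.max(axis=0).argmax())

def constrained_action(q_approx, lower, upper):
    """Clip approximate target-task values into [lower, upper], then act greedily."""
    return int(np.clip(q_approx, lower, upper).argmax())

# Toy successor features for two source policies and two actions (hypothetical).
psi = np.array([[[1.0, 0.0], [0.0, 1.0]],
                [[0.5, 0.5], [0.2, 0.9]]])
w_new = np.array([0.3, 0.7])  # novel task vector

# Lower bound: the best source-policy value on the new task.
lower = (psi @ w_new).max(axis=0)
# Upper bound: hypothetical values, supplied only for illustration.
upper = np.array([0.6, 0.8])

# A hypothetical approximator that overestimates action 0 on the new task.
q_approx = np.array([1.5, 0.6])

unconstrained = int(q_approx.argmax())
constrained = constrained_action(q_approx, lower, upper)
```

In this toy case the unconstrained approximator picks action 0 because of its overestimate, while clipping into the bounds leads to action 1, which matches the choice of plain GPI over the source policies.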
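Abstract (translated from the Korean): Many researchers in deep learning have sought to build agents that can perform a variety of tasks. Because training on every possible task is usually infeasible, enabling agents to generalize better to new tasks based on what they learned from training tasks is one of the important challenges in deep learning. For effective generalization, both learning abstractions that can be used under different conditions and making good use of those abstractions on new tasks are important. This dissertation addresses the generalization challenge from these two perspectives, abstraction and transfer.
We first address how to abstract input data and learn features that are robust to noise. Because task-irrelevant information given at inference time can greatly affect the performance of learned models and agents, achieving robustness to such noise is one of the important problems in generalization. To address this problem, we propose a discrete information bottleneck method called Drop-Bottleneck, which discretely removes features that are irrelevant to the task variable and retains the desired features. The method has a simple information compression objective and also provides deterministic compressed representations, which are useful for inference that requires full consistency and the improved efficiency that comes from a reduced number of features.
We also address how an agent in a given environment can discover possible behaviors without supervision and abstract them into skills in a more reusable form. Unsupervised skill discovery aims to find and learn useful behaviors by interacting with the environment without external rewards. Because it makes it possible to reuse the knowledge of learned skills and to carry out new tasks more efficiently and effectively, it is one of the important challenges of temporal abstraction in reinforcement learning. To this end, this dissertation presents an unsupervised skill discovery method called Information Bottleneck Option Learning (IBOL). It linearizes the environment to find more extensive behaviors in the state space, and it learns disentangled abstractions of those behaviors through an information bottleneck in order to improve the reusability of the skills.
Finally, we address how to use the knowledge learned on source tasks to improve performance on target tasks without additional training. For zero-shot transfer in reinforcement learning under the condition that the reward function differs between tasks, the successor features framework is widely used. This dissertation aims to improve the transfer of learned value approximators that use successor features to new tasks by bounding the errors on those tasks. Using the source tasks and the successor features learned on them, we present lower and upper bounds on the optimal values for new task vectors that can be expressed as linear combinations of the source task vectors. We then present constrained GPI, a simple test-time method that improves transfer by constraining the value approximations on new target tasks.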

Language: eng

URI: https://hdl.handle.net/10371/196493

https://dcollection.snu.ac.kr/common/orgView/000000177236

Files in This Item:

000000177236.pdf 14.33 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Ph.D. / Sc.D._컴퓨터공학부)
