State Representation for Efficient Task Adaptation in Reinforcement Learning

Abstract: An intelligent agent is expected to solve a new task by making a series of sound decisions that draw on its previous experience. Unsupervised reinforcement learning follows an analogous scheme: the agent acquires a general capability by learning a set of potentially useful behaviors, or by extracting information from the environment dynamics, without any explicit reward from the environment. However, two major challenges remain: how to obtain compact yet rich state representations in the pretraining phase, and how the agent can adapt efficiently to the task in the fine-tuning phase. This study proposes two methods to address these challenges. First, mixing discovered skills improves sample efficiency by interpreting each skill as a perspective on how the agent transforms the state. Experiments show that the choice of mixing method affects final performance. Second, contrastive learning plays a key role in learning a temporal state representation that explicitly encodes the reachability of one state from another. It is shown that, once the agent is optimized, it can adapt directly to the given task without further training.

Language: eng

URI: https://hdl.handle.net/10371/193433

https://dcollection.snu.ac.kr/common/orgView/000000176998

Files in This Item:

000000176998.pdf 0.77 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Program in Artificial Intelligence (협동과정-인공지능전공)
  - Theses (Master's Degree_협동과정-인공지능전공)
