Learning Temporally-Extended Actions with Uncertainty-Aware Q-learning

이중규

Learning Temporally-Extended Actions with Uncertainty-Aware Q-learning : 불확실성을 고려한 반복 행동 정책 학습

DC Field	Value	Language
dc.contributor.advisor	오민환	-
dc.contributor.author	이중규	-
dc.date.accessioned	2023-06-29T02:08:22Z	-
dc.date.available	2023-06-29T02:08:22Z	-
dc.date.issued	2023	-
dc.identifier.other	000000176360	-
dc.identifier.uri	https://hdl.handle.net/10371/193612	-
dc.identifier.uri	https://dcollection.snu.ac.kr/common/orgView/000000176360	ko_KR
dc.description	Thesis (Master's) -- Seoul National University Graduate School : Graduate School of Data Science, Department of Data Science, 2023. 2. 오민환.	-
dc.description.abstract	In reinforcement learning, temporal abstraction in the action space is a common approach to simplifying policy learning through temporally extended courses of action. In recent work, temporal abstractions are often modeled as repeating the chosen action for a certain duration. A major drawback of prior work on action repetition is that repeating suboptimal actions may significantly degrade performance. Hence, the degradation that action repetition causes can be larger than the gains it provides. We propose a novel algorithm named Uncertainty-aware Temporal Extension (UTE), which leverages ensemble methods to estimate uncertainty when extending an action. Our uncertainty-aware learning framework allows policies to be either exploration-favoring or uncertainty-averse. We empirically demonstrate the efficacy of UTE on both gridworld and Atari 2600 environments, where it exhibits superior performance over alternative algorithms.	-
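dc.description.abstract	In reinforcement learning, action abstraction is a common approach to simplifying the policy learning process. Recently, simply repeating an action for a certain duration has been studied as a way to implement action abstraction. However, a major drawback of prior work on action repetition is that suboptimal actions may be repeated unnecessarily often, which can degrade performance. In such cases, the performance degradation can outweigh the exploration benefit gained from repeating actions. We therefore devise an action extension algorithm, Uncertainty-aware Temporal Extension (UTE), which uses ensemble methods to measure uncertainty and takes that uncertainty into account. By controlling for uncertainty, our algorithm can induce either more active exploration or an uncertainty-averse policy. We evaluate it in a variety of environments, including gridworld and Atari 2600, and confirm that it outperforms existing methods.	-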
dc.description.tableofcontents	1 Introduction (p. 1); 2 Related Work (p. 4); 3 Preliminaries and Notations (p. 7); 4 Method: Uncertainty-aware Temporal Extension (p. 10); 4.1 Temporally-Extended Q-Learning (p. 10); 4.2 Option Decomposition (p. 11); 4.3 Ensemble-based Risk-Sensitive Action Repetition (p. 12); 4.4 n-step Q-Learning (p. 14); 5 Experiments (p. 16); 5.1 Chain MDP (p. 16); 5.2 Gridworlds (p. 19); 5.3 Atari 2600: Arcade Learning Environment (p. 22); 6 Conclusion (p. 26); 7 Appendix (p. 27)	-
dc.format.extent	lxv, 65	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	Reinforcement Learning	-
dc.subject	Temporal Abstraction	-
dc.subject	Action Repeat	-
dc.subject	Uncertainty	-
dc.subject	Exploration	-
dc.subject.ddc	005	-
dc.title	Learning Temporally-Extended Actions with Uncertainty-Aware Q-learning	-
dc.title.alternative	불확실성을 고려한 반복 행동 정책 학습	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	Joongkyu Lee	-
dc.contributor.department	Graduate School of Data Science, Department of Data Science	-
dc.description.degree	Master's	-
dc.date.awarded	2023-02	-
dc.identifier.uci	I804:11032-000000176360	-
dc.identifier.holdings	000000000049▲000000000056▲000000176360▲	-
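
The abstract says that UTE uses an ensemble to estimate uncertainty when extending an action, and that the resulting policy can be exploration-favoring or uncertainty-averse, but this record does not state the selection rule. The following is only a minimal sketch under one common assumption: each candidate repeat length is scored by the ensemble mean plus a weight times the ensemble standard deviation. The function name `choose_repeat_length`, the weight `lam`, and the sample numbers are all hypothetical and are not taken from the thesis.

```python
from statistics import mean, pstdev


def choose_repeat_length(ensemble_values, lam):
    """Pick how many steps to repeat an action (illustrative sketch only).

    ensemble_values: dict mapping a candidate repeat length to a list of
        value estimates, one per ensemble member (assumed data layout).
    lam: uncertainty weight (assumed parameter). lam > 0 rewards ensemble
        disagreement (exploration-favoring); lam < 0 penalizes it
        (uncertainty-averse).
    """
    def score(length):
        estimates = ensemble_values[length]
        # Mean is the value estimate; population std measures disagreement.
        return mean(estimates) + lam * pstdev(estimates)

    # Iterate in sorted order so ties resolve to the shortest repeat length.
    return max(sorted(ensemble_values), key=score)


# Made-up estimates from a 4-member ensemble for repeat lengths 1, 2 and 4.
# All three lengths have the same mean (1.0) but different disagreement.
estimates = {
    1: [1.0, 1.0, 1.0, 1.0],  # std 0.0
    2: [0.5, 1.5, 0.5, 1.5],  # std 0.5
    4: [0.0, 2.0, 0.0, 2.0],  # std 1.0
}

explore_choice = choose_repeat_length(estimates, lam=1.0)
averse_choice = choose_repeat_length(estimates, lam=-1.0)
```

With these made-up numbers, a positive `lam` selects the longest, most uncertain repetition and a negative `lam` selects the shortest, most certain one, which mirrors the exploration-favoring versus uncertainty-averse distinction in the abstract.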

Appears in Collections:

Graduate School of Data Science (데이터사이언스 대학원)
- Theses (Master's Degree_데이터사이언스학과)

Files in This Item:

000000176360.pdf 8.11 MB
