Deep Reinforcement Learning with LSTM-based Exploration Bonus
- College of Engineering, Dept. of Computer Science and Engineering
- Seoul National University Graduate School
- Deep Learning; Deep Reinforcement Learning; Long Short-Term Memory Networks; Exploration and Exploitation Trade-Off
- Thesis (Master's) -- Seoul National University Graduate School: Dept. of Computer Science and Engineering, Feb. 2017. Advisor: 유석인.
- Deep learning is the dominant method in the recent machine learning community. It outperforms traditional methods on a variety of tasks, such as image classification, object recognition, speech recognition, and natural language processing. However, most deep learning algorithms focus on supervised learning. Supervised learning assumes a static environment and is therefore ill-suited to dynamic environments.
To solve this problem, a method combining reinforcement learning with deep learning, called deep reinforcement learning, has been proposed. Deep reinforcement learning is composed of two parts: the first extracts features using deep learning, and the second learns a proper action for the agent through reinforcement learning, by trial and error.
However, reinforcement learning suffers from the exploration-exploitation trade-off: it is hard to find the optimal ratio of exploration to exploitation. To address this, we propose a novel deep reinforcement learning algorithm with an LSTM-based exploration bonus. The method uses a Long Short-Term Memory (LSTM) network as a predictor and derives an exploration bonus from its prediction error. The bonus guides the agent toward more daring actions, so the agent can find an optimal solution in a shorter time.
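The mechanism described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the `LSTMPredictor` class, `augmented_reward` function, and `beta` coefficient are assumed names, and the randomly initialised weights stand in for a predictor that would be trained online to reduce its own prediction error.

```python
import numpy as np

rng = np.random.default_rng(0)

class LSTMPredictor:
    """Minimal single-cell LSTM that predicts the next state from the current one.

    Weights are random here for illustration; in the proposed method the
    predictor would be trained as the agent interacts with the environment.
    """
    def __init__(self, state_dim, hidden_dim):
        d, h = state_dim, hidden_dim
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: rng.normal(0, 0.1, (h, d + h)) for g in "ifoc"}
        self.b = {g: np.zeros(h) for g in "ifoc"}
        self.W_out = rng.normal(0, 0.1, (d, h))  # hidden state -> predicted next state
        self.h = np.zeros(h)                     # hidden state (carried across steps)
        self.c = np.zeros(h)                     # cell state

    def predict(self, state):
        """Advance the LSTM one step and return the predicted next state."""
        sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
        z = np.concatenate([state, self.h])
        i = sigmoid(self.W["i"] @ z + self.b["i"])  # input gate
        f = sigmoid(self.W["f"] @ z + self.b["f"])  # forget gate
        o = sigmoid(self.W["o"] @ z + self.b["o"])  # output gate
        g = np.tanh(self.W["c"] @ z + self.b["c"])  # candidate cell value
        self.c = f * self.c + i * g
        self.h = o * np.tanh(self.c)
        return self.W_out @ self.h

def augmented_reward(reward, pred_next_state, next_state, beta=0.1):
    """Extrinsic reward plus a bonus proportional to squared prediction error.

    Poorly predicted (surprising) transitions earn a larger bonus, which
    pushes the agent to explore them further.
    """
    bonus = beta * float(np.sum((pred_next_state - next_state) ** 2))
    return reward + bonus

# Usage: one transition with an illustrative random "environment".
predictor = LSTMPredictor(state_dim=4, hidden_dim=8)
state = rng.normal(size=4)
pred = predictor.predict(state)
next_state = rng.normal(size=4)  # placeholder for the true next observation
print(augmented_reward(1.0, pred, next_state))
```

Because the bonus shrinks as the predictor improves, familiar states stop attracting the agent over time, leaving exploration effort concentrated on novel parts of the state space.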
We evaluate our method on various Atari games. Experimental results show that our method outperforms plain deep reinforcement learning, demonstrating its effectiveness.