Efficient Exploration in Reinforcement Learning for Online Slate Recommender System

박승준

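Efficient Exploration in Reinforcement Learning for Online Slate Recommender System : 강화학습 기반 온라인 슬레이트 추천 시스템에서의 효율적 탐색 방법 [An Efficient Exploration Method for Reinforcement Learning-Based Online Slate Recommender Systems]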

DC Field	Value	Language
dc.contributor.advisor	오민환	-
dc.contributor.author	박승준	-
dc.date.accessioned	2023-06-29T02:08:16Z	-
dc.date.available	2023-06-29T02:08:16Z	-
dc.date.issued	2023	-
dc.identifier.other	000000176412	-
dc.identifier.uri	https://hdl.handle.net/10371/193608	-
dc.identifier.uri	https://dcollection.snu.ac.kr/common/orgView/000000176412	ko_KR
dc.description	Thesis (Master's) -- Seoul National University Graduate School : Graduate School of Data Science, Department of Data Science, 2023. 2. 오민환.	-
dc.description.abstract	Deep reinforcement learning (RL) is a promising approach for recommender systems, whose ultimate goal is to maximize long-term user value. However, practical exploration strategies for real-world applications have not been addressed. We propose an efficient exploration strategy for deep RL-based recommendation, RESR. We develop a latent state learning scheme and an off-policy learning objective with randomized Q-values to foster efficient learning. Online simulation experiments conducted with synthetic and real-world data validate the effectiveness of our method.	-
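dc.description.abstract	Reinforcement learning can be used to maximize the long-term value of users in online recommender systems. Unlike conventional recommender systems, a reinforcement learning-based recommender system can capture the changes that follow from users' choices and raise user value over the long term. This thesis addresses the efficient exploration methods needed when applying reinforcement learning in practice. First, the recommendation problem, which consists of a reinforcement learning agent, users, and items, is formulated as a sequential decision-making problem using a partially observable Markov decision process (POMDP). The goal is to solve the problem of recommending to the user a list of multiple items, called a slate. Because it handles users' latent states, which are difficult to observe directly, this thesis studies a generalization of the problems considered in prior work. The proposed algorithm, RESR, combines a method for learning latent user embeddings and a user choice model for efficient learning with a method that approximates the posterior distribution by sampling multiple randomized Q-functions. A comparison and analysis of algorithm performance in online simulation experiments confirms that the proposed method performs better in terms of exploration efficiency.	-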
dc.description.tableofcontents	1 Introduction (p. 1); 2 Related Works (p. 4); 3 Problem Statement (p. 6); 4 Method (p. 9); 4.1 Tractable Decomposition of Action Space (p. 9); 4.2 Latent User Representation (p. 10); 4.3 User Choice Model (p. 10); 4.4 Exploration via Randomized Q-Functions (p. 13); 5 Experiments (p. 18); 5.1 Online Simulation Environment (p. 18); 5.2 User Arrival and Departure (p. 18); 5.3 Fully Simulated Recommendation (p. 19); 5.4 Simulation using the Real-World Data (p. 20); 5.5 Results (p. 22); 5.5.1 Fully Simulated Recommendation (p. 22); 5.5.2 Simulation using the Real-World Data (p. 24); 6 Conclusion (p. 26); A Appendix (p. 27); A.1 Notation (p. 27); A.2 Details of the Experiment (p. 30); A.2.1 Fully Simulated Recommendation (p. 30); A.2.2 Simulation using the Real-World Data (p. 34); A.3 Algorithm (p. 37); Bibliography (p. 38); Abstract in Korean (p. 43)	-
dc.format.extent	iii, 43	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	Deep RL	-
dc.subject	Recommender System	-
dc.subject	Exploration	-
dc.subject	Simulation	-
dc.subject	POMDP	-
dc.subject.ddc	005	-
dc.title	Efficient Exploration in Reinforcement Learning for Online Slate Recommender System	-
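dc.title.alternative	강화학습 기반 온라인 슬레이트 추천 시스템에서의 효율적 탐색 방법 [An Efficient Exploration Method for Reinforcement Learning-Based Online Slate Recommender Systems]	-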
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	Seung Joon Park	-
dc.contributor.department	Graduate School of Data Science, Department of Data Science	-
dc.description.degree	Master's	-
dc.date.awarded	2023-02	-
dc.identifier.uci	I804:11032-000000176412	-
dc.identifier.holdings	000000000049▲000000000056▲000000176412▲	-

Appears in Collections:

Graduate School of Data Science (데이터사이언스 대학원)
- Theses (Master's Degree, Department of Data Science)

Files in This Item:

000000176412.pdf 3.08 MB
