Learning Linear-Quadratic Regulators via Thompson Sampling with Preconditioned Langevin Dynamics

김기훈

Learning Linear-Quadratic Regulators via Thompson Sampling with Preconditioned Langevin Dynamics : 사전 조건화된 랑주뱅 동역학을 결합한 톰슨 샘플링을 통한 선형 2차 제어기 학습

DC Field	Value	Language
dc.contributor.advisor	양인순	-
dc.contributor.author	김기훈	-
dc.date.accessioned	2023-11-20T04:22:01Z	-
dc.date.available	2023-11-20T04:22:01Z	-
dc.date.issued	2023-08	-
dc.identifier.other	000000178190	-
dc.identifier.uri	https://hdl.handle.net/10371/196438	-
dc.identifier.uri	https://dcollection.snu.ac.kr/common/orgView/000000178190	ko_KR
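dc.description	Thesis (Master's) -- Seoul National University Graduate School : College of Engineering, Department of Electrical and Computer Engineering, 2023. 8. 양인순.	-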
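dc.description.abstract	(Translated from Korean) Thompson sampling is a widely used method for balancing exploration and exploitation in online learning problems, including reinforcement learning for the linear quadratic regulator (LQR). However, theoretical analysis of Thompson sampling for learning LQR is often limited to the case of Gaussian noise. In addition, sampling can be performed directly when we further assume that the unknown system parameters belong to a prespecified bounded set, which appears restrictive. We therefore propose a new Thompson sampling algorithm for LQR that uses Langevin dynamics to handle a broader class of problems, including those with non-Gaussian noise. Requiring neither a particular initialization method nor information about the true system parameters, and under only minimal assumptions on the prior distribution and the admissible set, our algorithm can sample quickly from approximate posterior distributions. The algorithm has an expected regret upper bound on the order of the square root of T, which improves on the performance of algorithms in previous studies. The analysis uses a concentration inequality for preconditioned Langevin dynamics together with self-normalization techniques. The performance of the algorithm was demonstrated through numerical experiments.	-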
dc.description.abstract	Thompson sampling (TS) is a widely used approach for addressing the exploration-exploitation trade-off in online learning problems, including reinforcement learning for linear quadratic regulators (LQR). However, the theoretical analysis of TS for learning LQR is often limited to the case of Gaussian noise. Sampling can be performed directly when the unknown system parameters are further assumed to lie in a prespecified compact set, an assumption that is seemingly restrictive. We propose a new TS algorithm for LQR that exploits Langevin dynamics to handle a larger class of problems, including those with non-Gaussian noise. A preconditioner is introduced to generate samples from non-conjugate posterior distributions. Our algorithm can sample parameters from approximate posteriors quickly. It attains an expected regret bound on the order of the square root of T, slightly improving on previous results, under minimal assumptions on the prior distribution and the admissible set, and it requires neither a particular initialization technique nor information on the true system parameter. Our regret analysis combines a nontrivial concentration inequality for the preconditioned Langevin algorithm with self-normalization techniques. Numerical experiments demonstrate the performance of the algorithm.	-
dc.description.tableofcontents	Chapter 1 Introduction; 1.1 Contributions; 1.2 Related work; Chapter 2 Preliminaries; 2.1 Linear-Quadratic Regulators; 2.2 Online learning of LQR; 2.3 Thompson sampling; 2.4 The Unadjusted Langevin algorithm (ULA); Chapter 3 Learning algorithm; 3.1 Preconditioned ULA for approximate posterior sampling; 3.2 Main Algorithm; Chapter 4 Concentration properties; 4.1 Comparing exact and approximate posteriors; 4.2 Bounding expected state norms by a polynomial of time; 4.2.1 Concentration of exact and approximate posterior distributions around θ∗; Chapter 5 Main result; 5.1 Improved state bound for the second and fourth moments; 5.2 Regret bound; Chapter 6 Experiment; 6.1 Experimental setup; 6.1.1 Gaussian mixture noise; 6.1.2 Asymmetric noise; 6.2 Performance of our algorithm; 6.3 Effect of preconditioner on number of iterations; Chapter 7 Conclusion; Bibliography; Appendix A Proof of Theorem 2; Appendix B Proof of Lemma 1; Appendix C Details for Section 4.1; C.1 Proof of Proposition 1; C.2 Proof of Proposition 2; Appendix D Details for Theorem 3; Appendix E Details for Section 4.2.1; Appendix F Miscellaneous Lemmas; Appendix G Details for Section 5; G.0.1 Proof of Theorem 5; G.0.2 Proof of Theorem 6; Abstract (In Korean); Acknowledgement	-
dc.format.extent	i,103	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	Linear quadratic regulator	-
dc.subject	Thompson sampling	-
dc.subject	Langevin dynamics	-
dc.subject.ddc	621.3	-
dc.title	Learning Linear-Quadratic Regulators via Thompson Sampling with Preconditioned Langevin Dynamics	-
dc.title.alternative	사전 조건화된 랑주뱅 동역학을 결합한 톰슨 샘플링을 통한 선형 2차 제어기 학습	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	Gihun Kim	-
dc.contributor.department	College of Engineering, Department of Electrical and Computer Engineering	-
dc.description.degree	Master's	-
dc.date.awarded	2023-08	-
dc.identifier.uci	I804:11032-000000178190	-
dc.identifier.holdings	000000000050▲000000000058▲000000178190▲	-

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Master's Degree_전기·정보공학부)

Files in This Item:

000000178190.pdf 3.04 MB
