A Dynamic Rebalancing Strategy in Public Bicycle Sharing Systems Based on Real-Time Dynamic Programming and Reinforcement Learning

서영현

서울대학교 중앙도서관

S-Space 소개

My S-Space

로그인이 필요합니다.

S-Space

Publications

Detailed Information

A Dynamic Rebalancing Strategy in Public Bicycle Sharing Systems Based on Real-Time Dynamic Programming and Reinforcement Learning : 실시간 동적 계획법 및 강화학습 기반의 공공자전거 시스템의 동적 재배치 전략

DC Field	Value	Language
dc.contributor.advisor	고승영	-
dc.contributor.author	서영현	-
dc.date.accessioned	2020-10-13T02:35:12Z	-
dc.date.available	2020-10-13T02:35:12Z	-
dc.date.issued	2020	-
dc.identifier.other	000000163255	-
dc.identifier.uri	https://hdl.handle.net/10371/169096	-
dc.identifier.uri	http://dcollection.snu.ac.kr/common/orgView/000000163255	ko_KR
dc.description	학위논문 (박사) -- 서울대학교 대학원 : 공과대학 건설환경공학부, 2020. 8. 고승영.	-
dc.description.abstract	The public bicycle sharing system is one of the modes of transportation that can help to relieve several urban problems, such as traffic congestion and air pollution. Because users can pick up and return bicycles anytime and anywhere a station is located, pickup or return failure can occur due to the spatiotemporal imbalances in demand. To prevent system failures, the operator should establish an appropriate repositioning strategy. As the operator makes a decision based on the predicted demand information, the accuracy of forecasting demand is an essential factor. Due to the stochastic nature of demand, however, the occurrence of prediction errors is inevitable. This study develops a stochastic dynamic model that minimizes unmet demand for rebalancing public bicycle sharing systems, taking into account the stochastic demand and the dynamic characteristics of the system. Since the repositioning mechanism corresponds to the sequential decision-making problem, this study applies the Markov decision process to the problem. To solve the Markov decision process, a dynamic programming method, which decomposes complex problems into simple subproblems to derive an exact solution. However, as a set of states and actions of the Markov decision process become more extensive, the computational complexity increases and it is intractable to derive solutions. An approximate dynamic programming method is introduced to derive an approximate solution. Further, a reinforcement learning model is applied to obtain a feasible solution in a large-scale public bicycle network. It is assumed that the predicted demand is derived from the random forest, which is a kind of machine learning technique, and that the observed demand occurred along the Poisson distribution whose mean is the predicted demand to simulate the uncertainty of the future demand. Total unmet demand is used as a key performance indicator in this study. In this study, a repositioning strategy that quickly responds to the prediction error, which means the difference between the observed demand and the predicted demand, is developed and the effectiveness is assessed. Strategies developed in previous studies or applied in the field are also modeled and compared with the results to verify the effectiveness of the strategy. Besides, the effects of various safety buffers and safety stock are examined and appropriate strategies are suggested for each situation. As a result of the analysis, the repositioning effect by the developed strategy was improved compared to the benchmark strategies. In particular, the effect of a strategy focusing on stations with high prediction errors is similar to the effect of a strategy considering all stations, but the computation time can be further reduced. Through this study, the utilization and reliability of the public bicycle system can be improved through the efficient operation without expanding the infrastructure.	-
dc.description.abstract	공공자전거 시스템은 교통혼잡과 대기오염 등 여러 도시문제를 완화할 수 있는 교통수단이다. 대여소가 위치한 곳이면 언제 어디서든 이용자가 자전거를 이용할 수 있는 시스템의 특성상 수요의 시공간적 불균형으로 인해 대여 실패 또는 반납 실패가 발생한다. 시스템 실패를 예방하기 위해 운영자는 적절한 재배치 전략을 수립해야 한다. 운영자는 예측 수요 정보를 전제로 의사결정을 하므로 수요예측의 정확성이 중요한 요소이나, 수요의 불확실성으로 인해 예측 오차의 발생이 불가피하다. 본 연구의 목적은 공공자전거 수요의 불확실성과 시스템의 동적 특성을 고려하여 불만족 수요를 최소화하는 재배치 모형을 개발하는 것이다. 공공자전거 재배치 메커니즘은 순차적 의사결정 문제에 해당하므로, 본 연구에서는 순차적 의사결정 문제를 모형화할 수 있는 마르코프 결정 과정을 적용한다. 마르코프 결정 과정을 풀기 위해 복잡한 문제를 간단한 부문제로 분해하여 정확해를 도출하는 동적 계획법을 이용한다. 하지만 마르코프 결정 과정의 상태 집합과 결정 집합의 크기가 커지면 계산 복잡도가 증가하므로, 동적 계획법을 이용한 정확해를 도출할 수 없다. 이를 해결하기 위해 근사적 동적 계획법을 도입하여 근사해를 도출하며, 대규모 공공자전거 네트워크에서 가능해를 얻기 위해 강화학습 모형을 적용한다. 장래 공공자전거 이용수요의 불확실성을 모사하기 위해, 기계학습 기법의 일종인 random forest로 예측 수요를 도출하고, 예측 수요를 평균으로 하는 포아송 분포를 따라 수요를 확률적으로 발생시켰다. 본 연구에서는 관측 수요와 예측 수요 간의 차이인 예측오차에 빠르게 대응하는 재배치 전략을 개발하고 효과를 평가한다. 개발된 전략의 우수성을 검증하기 위해, 기존 연구의 재배치 전략 및 현실에서 적용되는 전략을 모형화하고 결과를 비교한다. 또한, 재고량의 안전 구간 및 안전재고량에 관한 민감도 분석을 수행하여 함의점을 제시한다. 개발된 전략의 효과를 분석한 결과, 기존 연구의 전략 및 현실에서 적용되는 전략보다 개선된 성능을 보이며, 특히 예측오차가 큰 대여소를 탐색하는 전략이 전체 대여소를 탐색하는 전략과 재배치 효과가 유사하면서도 계산시간을 절감할 수 있는 것으로 나타났다. 공공자전거 인프라를 확대하지 않고도 운영의 효율화를 통해 공공자전거 시스템의 이용률 및 신뢰성을 제고할 수 있고, 공공자전거 재배치에 관한 정책적 함의점을 제시한다는 점에서 본 연구의 의의가 있다.	-
dc.description.tableofcontents	Chapter 1. Introduction １ 1.1 Research Background and Purposes １ 1.2 Research Scope and Procedure ７ Chapter 2. Literature Review １０ 2.1 Vehicle Routing Problems １０ 2.2 Bicycle Repositioning Problem １２ 2.3 Markov Decision Processes ２３ 2.4 Implications and Contributions ２６ Chapter 3. Model Formulation ２８ 3.1 Problem Definition ２８ 3.2 Markov Decision Processes ３４ 3.3 Demand Forecasting ４０ 3.4 Key Performance Indicator (KPI) ４５ Chapter 4. Solution Algorithms ４７ 4.1 Exact Solution Algorithm ４７ 4.2 Approximate Dynamic Programming ５０ 4.3 Reinforcement Learning Method ５２ Chapter 5. Numerical Example ５５ 5.1 Data Overview ５５ 5.2 Experimental Design ６１ 5.3 Algorithm Performance ６６ 5.4 Sensitivity Analysis ７４ 5.5 Large-scale Cases ７６ Chapter 6. Conclusions ８２ 6.1 Conclusions ８２ 6.2 Future Research ８３ References ８６ 초 록 ９２	-
dc.language.iso	eng	-
dc.publisher	서울대학교 대학원	-
dc.subject	Markov Decision Process	-
dc.subject	Public bicycle sharing system	-
dc.subject	Real-time dynamic programming	-
dc.subject	Reinforcement learning	-
dc.subject	Repositioning	-
dc.subject	강화학습	-
dc.subject	공공자전거 시스템	-
dc.subject	마르코프 결정 과정	-
dc.subject	실시간 동적 계획법	-
dc.subject	재배치	-
dc.subject.ddc	624	-
dc.title	A Dynamic Rebalancing Strategy in Public Bicycle Sharing Systems Based on Real-Time Dynamic Programming and Reinforcement Learning	-
dc.title.alternative	실시간 동적 계획법 및 강화학습 기반의 공공자전거 시스템의 동적 재배치 전략	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	Young-Hyun Seo	-
dc.contributor.department	공과대학 건설환경공학부	-
dc.description.degree	Doctor	-
dc.date.awarded	2020-08	-
dc.identifier.uci	I804:11032-000000163255	-
dc.identifier.holdings	000000000043▲000000000048▲000000163255▲	-

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Civil & Environmental Engineering (건설환경공학부)
  - Theses (Ph.D. / Sc.D._건설환경공학부)

Files in This Item:

000000163255.pdf 3.51 MB

Altmetrics

Item View & Download Count

Show Simple Item Record

Find it @ SNU

트윗하기

SNS Share