Adaptive Matching Time Intervals based on Reinforcement Learning for Ride-Hailing Services

Abstract: Ride-hailing services support daily travel by efficiently matching passengers and drivers. These services face operational inefficiency due to imbalances between supply and demand. A widely adopted strategy is fixed batch-based matching, which accumulates requests and idle drivers over a fixed time interval and matches them in batches. Recent studies have proposed adaptive matching time intervals to account for dynamic supply and demand patterns. However, they do not consider matching failure factors such as passengers cancelling requests and drivers declining assignments. This study uses reinforcement learning to control adaptive matching time intervals while accounting for these matching failure factors. To this end, we propose a two-step framework that maximizes the matching success rate. First, an agent based on a Deep Q-Network (DQN) determines the matching time interval; then combinatorial optimization is performed based on the driver's acceptance probability. We conduct experiments on various supply-demand patterns using synthetic and real datasets and compare performance with previous strategies. The results confirm that the proposed strategy reduces the proportion of expired requests and achieves the highest matching success rate. We also discuss the trade-off between fixed matching time intervals and matching success rates, and interpret the agent's policies. By examining individual matching failure factors, our approach provides insight that aggregate performance alone cannot capture.
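Ride-hailing services greatly assist everyday travel by efficiently connecting passengers and drivers. These services face operational inefficiency due to imbalances between supply and demand. To address this, a commonly used strategy collects passenger requests and idle cruising drivers over a fixed matching time interval and matches them in a batch. Recent studies have examined adaptive matching time intervals to reflect dynamic supply and demand patterns effectively, but matching failure factors such as passenger request cancellation and driver rejection have been overlooked. The goal of this study is to maximize the matching success rate through adaptive matching time intervals based on reinforcement learning in a setting where matching failure factors exist. The method consists of a two-step framework. First, a reinforcement learning agent based on DQN (Deep Q-Network) determines the dispatch action at each matching time interval; then combinatorial optimization based on the driver's acceptance probability is performed. Through experiments based on real datasets, we compare performance with previous strategies and analyze the matching failure factors. In the experiments, the proposed method showed the highest matching success rate in most cases. Specifically, we confirmed that it reduces the proportion of requests that expire because drivers do not accept them and efficiently controls the rate of passenger request cancellation. Additional analysis was performed by interpreting the learned agent's policy and disaggregating the aggregated results. Through its discussion of the matching success rate and the detailed matching failure factors, this approach provides insight that previous studies overlooked.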
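
The second step of the framework, matching a batch of requests to drivers using acceptance probabilities, can be illustrated with a small sketch. This is not the thesis's formulation, which the abstract does not give. The exponential acceptance model, the `scale_km` parameter, the function names, and the objective (the sum of acceptance probabilities, i.e. the expected number of accepted assignments) are all assumptions made for illustration. Brute-force enumeration is used only to keep the example dependency-free.

```python
import math
from itertools import permutations


def acceptance_probability(distance_km, scale_km=2.0):
    """Hypothetical model: acceptance decays exponentially with pickup distance."""
    return math.exp(-distance_km / scale_km)


def best_batch_matching(distances):
    """Assign each request in a batch to a distinct driver.

    distances[r][d] is the pickup distance (km) from driver d to request r.
    Returns (assignment, value), where assignment[r] is the driver index for
    request r and value is the expected number of accepted assignments.
    Assumed objective: maximize the sum of acceptance probabilities.
    """
    n_requests = len(distances)
    n_drivers = len(distances[0]) if distances else 0
    if n_requests > n_drivers:
        raise ValueError("this sketch needs at least one driver per request")

    best_value, best_assignment = -1.0, ()
    # Enumerate every way to give each request a different driver.
    for drivers in permutations(range(n_drivers), n_requests):
        value = sum(
            acceptance_probability(distances[r][d])
            for r, d in enumerate(drivers)
        )
        if value > best_value:
            best_value, best_assignment = value, drivers
    return list(best_assignment), best_value


# Two requests, two drivers: giving request 0 its nearest driver (driver 0)
# would leave request 1 with a driver 6 km away, so the batch optimum swaps them.
assignment, value = best_batch_matching([[1.0, 1.2], [1.1, 6.0]])
print(assignment, round(value, 3))
```

Enumeration grows factorially with batch size, so a realistic batch would need a polynomial-time assignment solver instead. In the framework described above, the length of the interval chosen by the DQN agent determines how many requests and drivers accumulate in each batch before this kind of matching is run.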

Language: eng

URI: https://hdl.handle.net/10371/193008

https://dcollection.snu.ac.kr/common/orgView/000000175951

Files in This Item:

000000175951.pdf 1.53 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Civil & Environmental Engineering (건설환경공학부)
  - Theses (Master's Degree_건설환경공학부)
