Coordinating Ride-Pooling with Public Transit using Reward-Guided Conservative Q-Learning: An Offline Training and Online Fine-Tuning Reinforcement Learning Framework
- URL: http://arxiv.org/abs/2501.14199v1
- Date: Fri, 24 Jan 2025 03:05:04 GMT
- Title: Coordinating Ride-Pooling with Public Transit using Reward-Guided Conservative Q-Learning: An Offline Training and Online Fine-Tuning Reinforcement Learning Framework
- Authors: Yulong Hu, Tingting Dong, Sen Li
- Abstract summary: This paper introduces a novel reinforcement learning framework, termed Reward-Guided Conservative Q-learning (RG-CQL). We propose an offline training and online fine-tuning RL framework to learn the optimal operational decisions of multimodal transportation systems. Our innovative offline training and online fine-tuning framework offers a remarkable 81.3% improvement in data efficiency.
- Score: 8.212024590297895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a novel reinforcement learning (RL) framework, termed Reward-Guided Conservative Q-learning (RG-CQL), to enhance coordination between ride-pooling and public transit within a multimodal transportation network. We model each ride-pooling vehicle as an agent governed by a Markov Decision Process (MDP) and propose an offline training and online fine-tuning RL framework to learn the optimal operational decisions of the multimodal transportation system, including rider-vehicle matching, selection of drop-off locations for passengers, and vehicle routing decisions, with improved data efficiency. During the offline training phase, we develop a Conservative Double Deep Q Network (CDDQN) as the action executor and a supervised-learning-based reward estimator, termed the Guider Network, to extract valuable insights into action-reward relationships from data batches. In the online fine-tuning phase, the Guider Network serves as an exploration guide, aiding CDDQN in effectively and conservatively exploring unknown state-action pairs. The efficacy of our algorithm is demonstrated through a realistic case study using real-world data from Manhattan. We show that integrating ride-pooling with public transit outperforms two benchmark cases, solo rides coordinated with transit and ride-pooling without transit coordination, by 17% and 22% in the achieved system rewards, respectively. Furthermore, our innovative offline training and online fine-tuning framework offers a remarkable 81.3% improvement in data efficiency compared to traditional online RL methods with adequate exploration budgets, with a 4.3% increase in total rewards and a 5.6% reduction in overestimation errors. Experimental results further demonstrate that RG-CQL effectively addresses the challenges of transitioning from offline to online RL in large-scale ride-pooling systems integrated with transit.
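The abstract combines two standard ingredients that can be sketched concretely: a Double-DQN bootstrap target and a CQL-style conservative penalty. The toy tabular setting, uniform penalty weights (full CQL weights actions by a softmax over Q-values), and the logged batch below are illustrative assumptions, not the authors' implementation; the Guider Network and the ride-pooling MDP are omitted.

```python
N_STATES, N_ACTIONS = 4, 3
GAMMA, LR, ALPHA = 0.95, 0.1, 1.0   # ALPHA scales the conservative penalty

q_online = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
q_target = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def cddqn_update(s, a, r, s_next):
    """One conservative Double-DQN update on a logged transition."""
    # Double-DQN target: the online table selects the next action,
    # the target table evaluates it (reduces overestimation).
    a_star = max(range(N_ACTIONS), key=lambda b: q_online[s_next][b])
    td_error = r + GAMMA * q_target[s_next][a_star] - q_online[s][a]
    # CQL-style conservatism: push down the Q-values of all actions
    # (uniform weights here for simplicity) and push up the action
    # actually observed in the offline data.
    for b in range(N_ACTIONS):
        q_online[s][b] -= LR * ALPHA / N_ACTIONS
    q_online[s][a] += LR * (td_error + ALPHA)

# Toy logged batch of (state, action, reward, next_state) transitions.
batch = [(0, 1, 1.0, 1), (1, 0, 0.0, 2), (2, 2, 2.0, 3)]
for transition in batch:
    cddqn_update(*transition)
```

The net effect is that actions present in the batch keep their values, while unseen actions are pessimistically discounted, which is what allows safe offline training before online fine-tuning.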
Related papers
- UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning [78.86567400365392]
We present Semi-online Reinforcement Learning, a novel paradigm that simulates online RL on offline trajectories. To capture long-term training signals, Semi-online RL introduces discounted future returns into the reward computation. Experiments show that our Semi-online RL achieves SOTA performance among 7B models across four dynamic benchmarks.
arXiv Detail & Related papers (2025-09-15T03:24:08Z)
- Semi-on-Demand Transit Feeders with Shared Autonomous Vehicles and Reinforcement-Learning-Based Zonal Dispatching Control [2.5407305954218797]
This paper develops a semi-on-demand transit feeder service using shared autonomous vehicles (SAVs) and zonal dispatching control based on reinforcement learning (RL). After efficient training of the RL model, the semi-on-demand service with dynamic zonal control serves 16% more passengers at 13% higher generalized costs on average compared to traditional fixed-route service.
arXiv Detail & Related papers (2025-09-02T02:14:11Z)
- Reinforcement Learning-based Sequential Route Recommendation for System-Optimal Traffic Assignment [8.598431584462944]
We propose a learning-based framework that reformulates the static system-optimal (SO) traffic assignment problem as a single-agent deep reinforcement learning task. We develop an MSA-guided deep Q-learning algorithm that integrates the iterative structure of traditional traffic assignment methods into the RL training process. Results show that the RL agent converges to the theoretical SO solution in the Braess network and achieves only a 0.35% deviation in the OW network.
arXiv Detail & Related papers (2025-05-27T08:33:02Z)
- URB -- Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles [0.0]
Reinforcement learning (RL) can facilitate the development of such collective routing strategies. We present URB: an Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles. Our results suggest that, despite lengthy and costly training, state-of-the-art MARL algorithms rarely outperformed humans.
arXiv Detail & Related papers (2025-05-23T10:54:53Z)
- Communication-Aware Reinforcement Learning for Cooperative Adaptive Cruise Control [15.31488551912888]
Reinforcement Learning (RL) has proven effective in optimizing complex decision-making processes in CACC.
MARL has shown remarkable potential by enabling coordinated actions among multiple CAVs.
We propose Communication-Aware Reinforcement Learning (CA-RL) to address these challenges.
arXiv Detail & Related papers (2024-07-12T03:28:24Z)
- Real-time system optimal traffic routing under uncertainties -- Can physics models boost reinforcement learning? [2.298129181817085]
Our paper presents TransRL, a novel algorithm that integrates reinforcement learning with physics models for enhanced performance, reliability, and interpretability.
By leveraging the information from physics models, TransRL consistently outperforms state-of-the-art reinforcement learning algorithms.
arXiv Detail & Related papers (2024-07-10T04:53:26Z)
- Short Run Transit Route Planning Decision Support System Using a Deep Learning-Based Weighted Graph [0.0]
We propose a novel deep learning-based methodology for a decision support system that enables public transport planners to identify short-term route improvements rapidly.
By seamlessly adjusting specific sections of routes between two stops during specific times of the day, our method effectively reduces travel times and enhances public transit (PT) services.
Using self-supervision, we train a deep learning model for predicting lateness values for road segments. These lateness values are then utilized as edge weights in the transportation graph, enabling efficient path searching.
arXiv Detail & Related papers (2023-08-24T14:37:55Z)
- Using Reinforcement Learning for the Three-Dimensional Loading Capacitated Vehicle Routing Problem [40.50169360761464]
Collaborative vehicle routing has been proposed as a solution to increase efficiency.
Current operations research methods suffer from non-linear scaling with increasing problem size.
We develop a reinforcement learning model to solve the three-dimensional loading capacitated vehicle routing problem in approximately linear time.
arXiv Detail & Related papers (2023-07-22T18:05:28Z)
- ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles [52.34951901588738]
We propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL.
By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance.
Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods.
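The Q-ensemble mechanism this entry builds on can be sketched minimally. The code below is an illustrative assumption, not the ENOTO implementation: it shows the core idea of bootstrapping from the minimum over several independently initialized Q-estimates, which keeps value targets pessimistic during offline pre-training and relaxes naturally as online data reduces ensemble disagreement.

```python
import random

random.seed(42)
N_ENSEMBLE, N_ACTIONS = 5, 3

# Each ensemble member is a toy Q-estimate for one state's actions.
ensemble = [[random.uniform(0.0, 1.0) for _ in range(N_ACTIONS)]
            for _ in range(N_ENSEMBLE)]

def pessimistic_greedy():
    # Per-action lower bound: the minimum estimate across the ensemble.
    lower = [min(member[a] for member in ensemble) for a in range(N_ACTIONS)]
    # Act greedily with respect to the lower bound, so actions the
    # ensemble disagrees about are avoided during offline training.
    best = max(range(N_ACTIONS), key=lambda a: lower[a])
    return best, lower[best]

action, value = pessimistic_greedy()
```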
arXiv Detail & Related papers (2023-06-12T05:10:10Z)
- MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z)
- Offline Meta-Reinforcement Learning with Online Self-Supervision [66.42016534065276]
We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy.
Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data.
We find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
arXiv Detail & Related papers (2021-07-08T17:01:32Z)
- A Deep Value-network Based Approach for Multi-Driver Order Dispatching [55.36656442934531]
We propose a deep reinforcement learning based solution for order dispatching.
We conduct large scale online A/B tests on DiDi's ride-dispatching platform.
Results show that CVNet consistently outperforms other recently proposed dispatching methods.
arXiv Detail & Related papers (2021-06-08T16:27:04Z)
- Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms [57.21078336887961]
Large ride-hailing platforms, such as DiDi, Uber and Lyft, connect tens of thousands of vehicles in a city to millions of ride demands throughout the day.
We propose a unified value-based dynamic learning framework (V1D3) for tackling both tasks.
arXiv Detail & Related papers (2021-05-18T19:22:24Z)
- Vehicular Cooperative Perception Through Action Branching and Federated Reinforcement Learning [101.64598586454571]
A novel framework is proposed to allow reinforcement learning-based vehicular association, resource block (RB) allocation, and content selection of cooperative perception messages (CPMs).
A federated RL approach is introduced in order to speed up the training process across vehicles.
Results show that federated RL improves the training process, where better policies can be achieved within the same amount of time compared to the non-federated approach.
arXiv Detail & Related papers (2020-12-07T02:09:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.