Dual Policy Reinforcement Learning for Real-time Rebalancing in Bike-sharing Systems
- URL: http://arxiv.org/abs/2406.00868v1
- Date: Sun, 2 Jun 2024 21:05:23 GMT
- Title: Dual Policy Reinforcement Learning for Real-time Rebalancing in Bike-sharing Systems
- Authors: Jiaqi Liang, Defeng Liu, Sanjay Dominik Jena, Andrea Lodi, Thibaut Vidal
- Abstract summary: Bike-sharing systems play a crucial role in easing traffic congestion and promoting healthier lifestyles.
This study introduces a novel approach to address the real-time rebalancing problem with a fleet of vehicles.
It employs a dual policy reinforcement learning algorithm that decouples inventory and routing decisions.
- Score: 13.083156894368532
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Bike-sharing systems play a crucial role in easing traffic congestion and promoting healthier lifestyles. However, ensuring their reliability and user acceptance requires effective strategies for rebalancing bikes. This study introduces a novel approach to address the real-time rebalancing problem with a fleet of vehicles. It employs a dual policy reinforcement learning algorithm that decouples inventory and routing decisions, enhancing realism and efficiency compared to previous methods where both decisions were made simultaneously. We first formulate the inventory and routing subproblems as a multi-agent Markov Decision Process within a continuous time framework. Subsequently, we propose a DQN-based dual policy framework to jointly estimate the value functions, minimizing the lost demand. To facilitate learning, a comprehensive simulator is applied to operate under a first-arrive-first-serve rule, which enables the computation of immediate rewards across diverse demand scenarios. We conduct extensive experiments on various datasets generated from historical real-world data, affected by both temporal and weather factors. Our proposed algorithm demonstrates significant performance improvements over previous baseline methods. It offers valuable practical insights for operators and further explores the incorporation of reinforcement learning into real-world dynamic programming problems, paving the way for more intelligent and robust urban mobility solutions.
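The abstract mentions a simulator that operates under a first-arrive-first-serve rule and yields immediate rewards tied to lost demand. The sketch below illustrates that idea only; it is not the authors' simulator. The names `Station` and `simulate_lost_demand`, the event format, and the choice to count a failed return at a full station as lost demand are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Station:
    bikes: int      # bikes currently docked (hypothetical field)
    capacity: int   # total docks at the station (hypothetical field)


def simulate_lost_demand(stations, events):
    """Serve events in arrival order and count requests that cannot be met.

    events: iterable of (time, station_id, kind), kind is "rent" or "return".
    Assumption: a rental at an empty station and a return at a full station
    each count as one unit of lost demand.
    """
    lost = 0
    # First-arrive-first-serve: process strictly by arrival time.
    # sorted() is stable, so simultaneous events keep their input order.
    for _, station_id, kind in sorted(events, key=lambda e: e[0]):
        station = stations[station_id]
        if kind == "rent":
            if station.bikes > 0:
                station.bikes -= 1
            else:
                lost += 1
        elif kind == "return":
            if station.bikes < station.capacity:
                station.bikes += 1
            else:
                lost += 1
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return lost


stations = {0: Station(bikes=1, capacity=2)}
events = [
    (0.5, 0, "rent"),
    (1.0, 0, "rent"),    # station is empty: lost
    (2.0, 0, "return"),
    (2.5, 0, "return"),
    (3.0, 0, "return"),  # station is full: lost
]
lost_demand = simulate_lost_demand(stations, events)
reward = -lost_demand  # a learning agent would try to maximize this
```

In a setup like this, a rebalancing action would change `bikes` at a station between events, and the negative lost demand over the following interval could serve as the immediate reward.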
Related papers
- Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations.
We define the task as identifying Nash equilibrium from a preference-only offline dataset in general-sum games.
Our findings underscore the multifaceted approach required for MARLHF, paving the way for effective preference-based multi-agent systems.
arXiv Detail & Related papers (2024-09-01T13:14:41Z)
- A Reinforcement Learning Approach for Dynamic Rebalancing in Bike-Sharing System [11.237099288412558]
Bike-Sharing Systems provide eco-friendly urban mobility, contributing to the alleviation of traffic congestion and healthier lifestyles.
Devising effective rebalancing strategies using vehicles to redistribute bikes among stations is therefore of utmost importance for operators.
This paper introduces a spatio-temporal reinforcement learning algorithm for the dynamic rebalancing problem with multiple vehicles.
arXiv Detail & Related papers (2024-02-05T23:46:42Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method in multiple tasks of OpenAI Gym with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Combinatorial Optimization enriched Machine Learning to solve the Dynamic Vehicle Routing Problem with Time Windows [5.4807970361321585]
We propose a novel machine learning pipeline that incorporates an optimization layer.
We apply this pipeline to a dynamic vehicle routing problem with waves, which was recently promoted in the EURO Meets NeurIPS Competition at NeurIPS 2022.
Our methodology ranked first in this competition, outperforming all other approaches in solving the proposed dynamic vehicle routing problem.
arXiv Detail & Related papers (2023-04-03T08:23:09Z)
- NeurIPS 2022 Competition: Driving SMARTS [60.948652154552136]
Driving SMARTS is a regular competition designed to tackle problems caused by the distribution shift in dynamic interaction contexts.
The proposed competition supports methodologically diverse solutions, such as reinforcement learning (RL) and offline learning methods.
arXiv Detail & Related papers (2022-11-14T17:10:53Z)
- UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning [1.1339580074756188]
Offline reinforcement learning (RL) provides a framework for learning decision-making from offline data.
Self-driving vehicles (SDV) learn a policy, which potentially even outperforms the behavior in the sub-optimal data set.
This motivates the use of model-based offline RL approaches, which leverage planning.
arXiv Detail & Related papers (2021-11-22T10:37:52Z)
- Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms [57.21078336887961]
Large ride-hailing platforms, such as DiDi, Uber and Lyft, connect tens of thousands of vehicles in a city to millions of ride demands throughout the day.
We propose a unified value-based dynamic learning framework (V1D3) for tackling both tasks.
arXiv Detail & Related papers (2021-05-18T19:22:24Z)
- Real-world Ride-hailing Vehicle Repositioning using Deep Reinforcement Learning [52.2663102239029]
We present a new practical framework based on deep reinforcement learning and decision-time planning for real-world vehicle repositioning on ride-hailing platforms.
Our approach learns a ride-based state-value function using a batch training algorithm with deep value networks.
We benchmark our algorithm with baselines in a ride-hailing simulation environment to demonstrate its superiority in improving income efficiency.
arXiv Detail & Related papers (2021-03-08T05:34:05Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- Efficiency and Equity are Both Essential: A Generalized Traffic Signal Controller with Deep Reinforcement Learning [25.21831641893209]
We present an approach to learning policies for signal controllers using deep reinforcement learning aiming for optimized traffic flow.
Our method uses a novel formulation of the reward function that simultaneously considers efficiency and equity.
The experimental evaluations on both simulated and real-world data demonstrate that our proposed algorithm achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-03-09T11:34:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.