RL-MSA: a Reinforcement Learning-based Multi-line bus Scheduling Approach
- URL: http://arxiv.org/abs/2403.06466v1
- Date: Mon, 11 Mar 2024 07:07:05 GMT
- Title: RL-MSA: a Reinforcement Learning-based Multi-line bus Scheduling Approach
- Authors: Yingzhuo Liu
- Abstract summary: Existing approaches typically generate a bus scheduling scheme in an offline manner and then schedule buses according to the scheme.
In this paper, MLBSP is modeled as a Markov Decision Process (MDP).
A Reinforcement Learning-based Multi-line bus Scheduling Approach (RL-MSA) is proposed for bus scheduling in both the offline and online phases.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Multiple Line Bus Scheduling Problem (MLBSP) is vital for reducing
the operational cost of bus companies and guaranteeing service quality for
passengers. Existing approaches typically generate a bus scheduling scheme in an
offline manner and then schedule buses according to that scheme. In practice,
uncertain events such as traffic congestion occur frequently and may render the
pre-determined scheduling scheme infeasible. In this paper, MLBSP is modeled as a
Markov Decision Process (MDP), and a Reinforcement Learning-based Multi-line bus
Scheduling Approach (RL-MSA) is proposed for bus scheduling in both the offline
and online phases. In the offline phase, the deadhead decision is integrated into
the bus selection decision for the first time to simplify the learning problem.
In the online phase, the deadhead decision is made through a time window
mechanism based on the policy learned offline. Several new state features are
developed, including features for control points, bus lines and buses, and a bus
priority screening mechanism is introduced to construct the bus-related features.
Considering the interests of both the bus company and passengers, a reward
function combining a final reward and a step-wise reward is devised. Experiments
in the offline phase show that RL-MSA uses fewer buses than offline optimization
approaches. In the online phase, RL-MSA covers all departure times in the
timetable (i.e., service quality) without increasing the number of buses used
(i.e., operational cost).
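The abstract describes the main ingredients of RL-MSA: an MDP over bus-selection decisions, state features for control points, lines and buses, and a reward that combines a step-wise term with a final term. The following is a minimal sketch of how such an episode loop with a combined reward could be wired together; the environment, feature choices and reward weights are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of an MLBSP-style episode with a combined reward.
# The state features, dynamics and weights are illustrative assumptions,
# not the implementation described in the paper.
import random
from dataclasses import dataclass, field

@dataclass
class MLBSPEnv:
    timetable: list            # departure times (minutes) still to be covered
    fleet_size: int = 20       # buses available at the depot
    used_buses: set = field(default_factory=set)

    def state(self):
        # Toy state features: next departure, departures left, buses already in use.
        nxt = self.timetable[0] if self.timetable else -1
        return (nxt, len(self.timetable), len(self.used_buses))

    def step(self, bus_id):
        """Assign `bus_id` to the next departure; return (state, reward, done)."""
        self.timetable.pop(0)
        step_reward = 1.0                      # covered a departure (service quality)
        if bus_id not in self.used_buses:      # dispatching a fresh bus costs money
            self.used_buses.add(bus_id)
            step_reward -= 0.5
        done = not self.timetable
        if done:                               # final reward favours a small fleet
            step_reward += 10.0 - 0.2 * len(self.used_buses)
        return self.state(), step_reward, done

env = MLBSPEnv(timetable=[360, 375, 390, 405, 420])
done, total = False, 0.0
while not done:
    action = random.randrange(env.fleet_size)  # a learned policy would go here
    _, r, done = env.step(action)
    total += r
print("episode return:", round(total, 2))
```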
Related papers
- ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles [52.34951901588738]
We propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL.
By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance.
Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods.
arXiv Detail & Related papers (2023-06-12T05:10:10Z)
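The ENOTO summary above mentions bridging offline pre-training and online fine-tuning by enlarging the Q-ensemble. The sketch below only illustrates the generic ensemble idea of bootstrapping from an averaged (or pessimistic minimum) Q-value; it is not ENOTO's actual update rule, and all sizes and names are assumptions.

```python
# Illustrative ensemble Q-target (not ENOTO's exact rule): average, or take the
# minimum over several Q-networks, when bootstrapping the TD target.
import torch
import torch.nn as nn

def make_q(obs_dim, act_dim):
    return nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))

obs_dim, act_dim, n_ensemble, gamma = 8, 2, 5, 0.99
q_nets = [make_q(obs_dim, act_dim) for _ in range(n_ensemble)]

def td_target(reward, next_obs, next_act, pessimistic=False):
    qs = torch.stack([q(torch.cat([next_obs, next_act], dim=-1)) for q in q_nets])
    q_next = qs.min(dim=0).values if pessimistic else qs.mean(dim=0)
    return reward + gamma * q_next

target = td_target(torch.zeros(4, 1), torch.randn(4, obs_dim), torch.randn(4, act_dim))
print(target.shape)  # torch.Size([4, 1])
```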
- Real-Time Bus Arrival Prediction: A Deep Learning Approach for Enhanced Urban Mobility [2.1374208474242815]
A prevalent challenge is the mismatch between actual bus arrival times and their scheduled counterparts, leading to disruptions in fixed schedules.
This research introduces an AI-based, data-driven methodology for predicting bus arrival times at various transit points (stations).
Through the deployment of a fully connected neural network, the method improves the accuracy and efficiency of public bus transit systems.
arXiv Detail & Related papers (2023-03-27T16:45:22Z)
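The arrival-prediction summary above states only that a fully connected network is used. The toy regressor below, with made-up input features such as distance to the stop and recent headway, indicates what such a predictor might look like; it is not the architecture from that paper.

```python
# Toy fully connected regressor for bus arrival time; the features and layer
# sizes are assumptions for illustration, not the architecture from the paper.
import torch
import torch.nn as nn

class ArrivalTimePredictor(nn.Module):
    def __init__(self, n_features=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),          # predicted minutes until arrival
        )

    def forward(self, x):
        return self.net(x)

model = ArrivalTimePredictor()
# e.g. [distance_km, hour_of_day, day_of_week, headway_min, dwell_min, speed_kmh]
features = torch.tensor([[1.8, 8.0, 2.0, 6.5, 0.5, 22.0]])
print(model(features).item())  # untrained output, in minutes
```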
- Offline Vehicle Routing Problem with Online Bookings: A Novel Problem Formulation with Applications to Paratransit [5.8521525578624916]
We introduce a novel formulation of the offline vehicle routing problem with online bookings.
This problem is computationally challenging because it must consider large sets of requests.
We propose a novel computational approach, which combines an anytime algorithm with a learning-based policy for real-time decisions.
arXiv Detail & Related papers (2022-04-25T23:17:34Z)
- AI-aided Traffic Control Scheme for M2M Communications in the Internet of Vehicles [61.21359293642559]
The dynamics of traffic and the heterogeneous requirements of different IoV applications are not considered in most existing studies.
We consider a hybrid traffic control scheme and use the proximal policy optimization (PPO) method to tackle it.
arXiv Detail & Related papers (2022-03-05T10:54:05Z)
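The traffic-control entry above says only that PPO is applied to a hybrid control scheme. As a generic placeholder, the snippet below trains a standard PPO agent from stable-baselines3 on a stand-in Gym environment, since the actual IoV environment is not described here.

```python
# Generic PPO training loop using stable-baselines3 on a stand-in environment;
# the real IoV traffic-control environment is not specified in the summary.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")          # placeholder for a traffic-control env
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)    # short run for illustration

obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)
print("first greedy action:", action)
```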
- Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization [82.02008764719896]
Black-box model-based optimization problems are ubiquitous in a wide range of domains, such as the design of proteins, DNA sequences, aircraft, and robots.
We present Design-Bench, a benchmark for offline MBO with a unified evaluation protocol and reference implementations of recent methods.
Our benchmark includes a suite of diverse and realistic tasks derived from real-world optimization problems in biology, materials science, and robotics.
arXiv Detail & Related papers (2022-02-17T05:33:27Z)
- Visual Learning-based Planning for Continuous High-Dimensional POMDPs [81.16442127503517]
Visual Tree Search (VTS) is a learning and planning procedure that combines generative models learned offline with online model-based POMDP planning.
VTS bridges offline model training and online planning by utilizing a set of deep generative observation models to predict and evaluate the likelihood of image observations in a Monte Carlo tree search planner.
We show that VTS is robust to different observation noises and, since it utilizes online, model-based planning, can adapt to different reward structures without the need to re-train.
arXiv Detail & Related papers (2021-12-17T11:53:31Z)
- Robust Dynamic Bus Control: A Distributional Multi-agent Reinforcement Learning Approach [11.168121941015013]
Bus bunching is a common phenomenon that undermines the efficiency and reliability of bus systems.
We develop a distributional MARL framework -- IQNC-M -- to learn continuous control.
Our results show that the proposed IQNC-M framework can effectively handle various extreme events.
arXiv Detail & Related papers (2021-11-02T23:41:09Z)
- Deep Reinforcement Learning based Dynamic Optimization of Bus Timetable [4.337939117851783]
We propose a Deep Reinforcement Learning based bus Timetable dynamic Optimization method (DRL-TO).
A Deep Q-Network (DQN) is employed as the decision model to determine whether to dispatch a bus service during each minute of the service period.
DRL-TO dynamically determines departure intervals based on real-time passenger flow, saving 8% of vehicles and reducing passengers' waiting time by 17% on average.
arXiv Detail & Related papers (2021-07-15T01:22:49Z)
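DRL-TO is summarised above as a DQN that decides, minute by minute, whether to dispatch a bus. The fragment below sketches only that decision loop, an epsilon-greedy choice over a two-action Q-network; the state features and network size are placeholders rather than DRL-TO's design.

```python
# Sketch of a per-minute dispatch decision with a two-action Q-network
# (0 = hold, 1 = dispatch). Features and network size are illustrative only.
import random
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))

def dispatch_decision(minute, waiting_passengers, headway, load_factor, eps=0.1):
    if random.random() < eps:                      # exploration during training
        return random.randint(0, 1)
    state = torch.tensor([[minute / 1440.0, waiting_passengers / 100.0,
                           headway / 30.0, load_factor]])
    return int(q_net(state).argmax(dim=1).item())  # greedy action

for minute in range(360, 365):                     # a few minutes of the service period
    a = dispatch_decision(minute, waiting_passengers=42, headway=9.0, load_factor=0.7)
    print(minute, "dispatch" if a == 1 else "hold")
```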
- Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble [135.6115462399788]
Deep offline reinforcement learning has made it possible to train strong robotic agents from offline datasets.
State-action distribution shift may lead to severe bootstrap error during fine-tuning.
We propose a balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples.
arXiv Detail & Related papers (2021-07-01T16:26:54Z)
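The balanced-replay entry above describes prioritising samples encountered online while still drawing on near-on-policy offline data. A much simplified stand-in is sketched below: each minibatch mixes offline and online transitions with a fixed online fraction, whereas the actual method uses a learned prioritisation.

```python
# Rough stand-in for a balanced replay scheme: each minibatch mixes offline and
# online transitions with a fixed online fraction. The actual method prioritises
# near-on-policy samples more carefully; this only conveys the high-level idea.
import random

def sample_balanced(offline_buffer, online_buffer, batch_size=8, online_fraction=0.75):
    n_online = min(int(batch_size * online_fraction), len(online_buffer))
    n_offline = batch_size - n_online
    batch = random.sample(online_buffer, n_online) + random.sample(offline_buffer, n_offline)
    random.shuffle(batch)
    return batch

offline_buffer = [("offline", i) for i in range(1000)]
online_buffer = [("online", i) for i in range(50)]
print(sample_balanced(offline_buffer, online_buffer))
```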
- Reducing Bus Bunching with Asynchronous Multi-Agent Reinforcement Learning [11.168121941015013]
Bus bunching is a common phenomenon that undermines the reliability and efficiency of bus services.
We formulate route-level bus fleet control as an asynchronous multi-agent reinforcement learning problem.
We extend the classical actor-critic architecture to handle the asynchronous issue.
arXiv Detail & Related papers (2021-05-02T02:08:07Z)
- Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
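The last entry above concerns learning a scheduler that improves on a set of given atomic policies via policy gradients. As a loose, assumed illustration, the snippet below mixes three toy atomic schedulers through softmax weights updated with a REINFORCE-style rule; it is not the algorithm from that paper.

```python
# Toy "improper" mixture of atomic scheduling policies: softmax weights over the
# base policies are adjusted with a REINFORCE-style update. Purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
atomic_policies = [lambda q: int(np.argmax(q)),        # serve longest queue
                   lambda q: int(np.argmin(q)),        # serve shortest queue
                   lambda q: int(rng.integers(len(q)))]  # random queue
theta = np.zeros(len(atomic_policies))                 # mixture logits

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for episode in range(200):
    probs = softmax(theta)
    k = rng.choice(len(atomic_policies), p=probs)      # pick an atomic policy
    queues = rng.integers(0, 10, size=3)
    served = atomic_policies[k](queues)
    reward = -float(queues.sum() - queues[served])     # less remaining backlog is better
    grad = -probs                                      # REINFORCE gradient of log prob
    grad[k] += 1.0
    theta += 0.01 * reward * grad

print("learned mixture weights:", np.round(softmax(theta), 3))
```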