AdaPool: A Diurnal-Adaptive Fleet Management Framework using Model-Free
Deep Reinforcement Learning and Change Point Detection
- URL: http://arxiv.org/abs/2104.00203v1
- Date: Thu, 1 Apr 2021 02:14:01 GMT
- Authors: Marina Haliem, Vaneet Aggarwal and Bharat Bhargava
- Abstract summary: This paper introduces an adaptive model-free deep reinforcement approach that can recognize and adapt to the diurnal patterns in the ride-sharing environment with car-pooling.
In addition to the adaptation logic in dispatching, this paper also proposes a dynamic, demand-aware vehicle-passenger matching and route planning framework.
- Score: 34.77250498401055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces an adaptive model-free deep reinforcement approach that
can recognize and adapt to the diurnal patterns in the ride-sharing environment
with car-pooling. Deep Reinforcement Learning (RL) suffers from catastrophic
forgetting due to being agnostic to the timescale of changes in the
distribution of experiences. Although RL algorithms are guaranteed to converge
to optimal policies in Markov decision processes (MDPs), this only holds in the
presence of static environments. However, this assumption is very restrictive.
In many real-world problems like ride-sharing, traffic control, etc., we are
dealing with highly dynamic environments, where RL methods yield only
sub-optimal decisions. To mitigate this problem in highly dynamic environments,
we (1) adopt an online Dirichlet change point detection (ODCP) algorithm to
detect the changes in the distribution of experiences, and (2) develop a Deep Q
Network (DQN) agent that is capable of recognizing diurnal patterns and making
informed dispatching decisions according to the changes in the underlying
environment. Rather than fixing patterns by time of week, the proposed approach
automatically detects that the MDP has changed, and uses the results of the new
model. In addition to the adaptation logic in dispatching, this paper also
proposes a dynamic, demand-aware vehicle-passenger matching and route planning
framework that dynamically generates optimal routes for each vehicle based on
online demand, vehicle capacities, and locations. Evaluation on the New York City
Taxi public dataset shows the effectiveness of our approach in improving
fleet utilization, where less than 50% of the fleet is utilized to serve the
demand of up to 90% of the requests, while maximizing profits and minimizing
idle times.
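The detect-then-switch idea in the abstract can be illustrated with a minimal sketch. This is not the paper's ODCP algorithm or its DQN agent: the detector below is a simplified sliding-window mean-shift test standing in for ODCP, and the function names, window size, threshold, and demand values are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def detect_change_points(stream, window=3, threshold=5.0):
    """Return indices where the demand level shifts abruptly.

    Simplified stand-in for ODCP (assumption): compares the mean of the
    `window` samples before index t with the mean of the `window` samples
    from t onward, scaled by the average within-window spread.
    """
    change_points = []
    t = window
    while t + window <= len(stream):
        before = stream[t - window:t]
        after = stream[t:t + window]
        # Guard against a zero spread when both windows are constant.
        spread = max((pstdev(before) + pstdev(after)) / 2, 1e-9)
        if abs(mean(after) - mean(before)) / spread > threshold:
            change_points.append(t)
            t += window  # skip ahead so one shift is reported once
        else:
            t += 1
    return change_points


def regime_labels(length, change_points):
    """Label each time step with the index of the regime it falls in."""
    labels, regime = [], 0
    boundaries = set(change_points)
    for t in range(length):
        if t in boundaries:
            regime += 1
        labels.append(regime)
    return labels


# Hypothetical hourly request counts: quiet, peak, quiet again.
demand = [10, 12, 11, 9, 10, 11, 50, 52, 49, 51, 50, 48, 10, 12, 11, 9, 10, 11]
points = detect_change_points(demand)
labels = regime_labels(len(demand), points)
print(points)
print(labels)
```

In a setup like the one the abstract describes, each regime label could select which learned model or replay buffer the dispatching agent uses, so that experience from one demand pattern does not overwrite what was learned for another. How regimes are matched to recurring diurnal patterns is left out of this sketch.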
Related papers
- End-to-end Driving in High-Interaction Traffic Scenarios with Reinforcement Learning [24.578178308010912]
We propose an end-to-end model-based RL algorithm named Ramble to address these issues.
By learning a dynamics model of the environment, Ramble can foresee upcoming traffic events and make more informed, strategic decisions.
Ramble achieves state-of-the-art performance regarding route completion rate and driving score on the CARLA Leaderboard 2.0, showcasing its effectiveness in managing complex and dynamic traffic situations.
arXiv Detail & Related papers (2024-10-03T06:45:59Z)
- A Graph-based Adversarial Imitation Learning Framework for Reliable & Realtime Fleet Scheduling in Urban Air Mobility [5.19664437943693]
This paper presents a comprehensive optimization formulation of the fleet scheduling problem.
It also identifies the need for alternate solution approaches.
The new imitative approach achieves better mean performance and remarkable improvement in the case of unseen worst-case scenarios.
arXiv Detail & Related papers (2024-07-16T18:51:24Z)
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning [58.575939354953526]
PACOH-RL is a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics.
Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics.
Our experimental results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions.
arXiv Detail & Related papers (2023-11-13T18:51:57Z)
- Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatial quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z)
- Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with Online Learning [60.17407932691429]
Open Radio Access Network systems, with their virtualized base stations (vBSs), offer operators the benefits of increased flexibility, reduced costs, vendor diversity, and interoperability.
We propose an online learning algorithm that balances the effective throughput and vBS energy consumption, even under unforeseeable and "challenging" environments.
We prove the proposed solutions achieve sub-linear regret, providing zero average optimality gap even in challenging environments.
arXiv Detail & Related papers (2023-09-04T17:30:21Z)
- Integrated Decision and Control for High-Level Automated Vehicles by Mixed Policy Gradient and Its Experiment Verification [10.393343763237452]
This paper presents a self-evolving decision-making system based on the Integrated Decision and Control (IDC).
An RL algorithm called constrained mixed policy gradient (CMPG) is proposed to consistently upgrade the driving policy of the IDC.
Experimental results show that, boosted by data, the system can achieve better driving ability than model-based methods.
arXiv Detail & Related papers (2022-10-19T14:58:41Z)
- Off-line approximate dynamic programming for the vehicle routing problem with stochastic customers and demands via decentralized decision-making [0.0]
This paper studies a variant of the vehicle routing problem (VRP) where both customer locations and demands are uncertain.
The objective is to maximize the served demands while fulfilling vehicle capacities and time restrictions.
We develop a Q-learning algorithm featuring state-of-the-art acceleration techniques such as Replay Memory and Double Q Network.
arXiv Detail & Related papers (2021-09-21T14:28:09Z)
- MetaVIM: Meta Variationally Intrinsic Motivated Reinforcement Learning for Decentralized Traffic Signal Control [54.162449208797334]
Traffic signal control aims to coordinate traffic signals across intersections to improve the traffic efficiency of a district or a city.
Deep reinforcement learning (RL) has been applied to traffic signal control recently and demonstrated promising performance where each traffic signal is regarded as an agent.
We propose a novel Meta Variationally Intrinsic Motivated (MetaVIM) RL method to learn the decentralized policy for each intersection that considers neighbor information in a latent way.
arXiv Detail & Related papers (2021-01-04T03:06:08Z)
- Meta Reinforcement Learning-Based Lane Change Strategy for Autonomous Vehicles [11.180588185127892]
Supervised learning algorithms can generalize to new environments by training on a large amount of labeled data.
It is often impractical or cost-prohibitive to obtain sufficient data for each new environment.
We propose a meta reinforcement learning (MRL) method to improve the agent's generalization capabilities.
arXiv Detail & Related papers (2020-08-28T02:57:11Z)
- Meta-Reinforcement Learning for Trajectory Design in Wireless UAV Networks [151.65541208130995]
A drone base station (DBS) is dispatched to provide uplink connectivity to ground users whose demand is dynamic and unpredictable.
In this case, the DBS's trajectory must be adaptively adjusted to satisfy the dynamic user access requests.
A meta-learning algorithm is proposed in order to adapt the DBS's trajectory when it encounters novel environments.
arXiv Detail & Related papers (2020-05-25T20:43:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.