Related papers: AdaPool: A Diurnal-Adaptive Fleet Management Framework using Model-Free Deep Reinforcement Learning and Change Point Detection

URL: http://arxiv.org/abs/2104.00203v1
Date: Thu, 1 Apr 2021 02:14:01 GMT
Title: AdaPool: A Diurnal-Adaptive Fleet Management Framework using Model-Free Deep Reinforcement Learning and Change Point Detection
Authors: Marina Haliem, Vaneet Aggarwal and Bharat Bhargava
Abstract summary: This paper introduces an adaptive model-free deep reinforcement approach that can recognize and adapt to the diurnal patterns in the ride-sharing environment with car-pooling. In addition to the adaptation logic in dispatching, this paper also proposes a dynamic, demand-aware vehicle-passenger matching and route planning framework.
Score: 34.77250498401055
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper introduces an adaptive model-free deep reinforcement approach that can recognize and adapt to the diurnal patterns in the ride-sharing environment with car-pooling. Deep Reinforcement Learning (RL) suffers from catastrophic forgetting due to being agnostic to the timescale of changes in the distribution of experiences. Although RL algorithms are guaranteed to converge to optimal policies in Markov decision processes (MDPs), this only holds in the presence of static environments. However, this assumption is very restrictive. In many real-world problems like ride-sharing, traffic control, etc., we are dealing with highly dynamic environments, where RL methods yield only sub-optimal decisions. To mitigate this problem in highly dynamic environments, we (1) adopt an online Dirichlet change point detection (ODCP) algorithm to detect the changes in the distribution of experiences, (2) develop a Deep Q Network (DQN) agent that is capable of recognizing diurnal patterns and making informed dispatching decisions according to the changes in the underlying environment. Rather than fixing patterns by time of week, the proposed approach automatically detects that the MDP has changed, and uses the results of the new model. In addition to the adaptation logic in dispatching, this paper also proposes a dynamic, demand-aware vehicle-passenger matching and route planning framework that dynamically generates optimal routes for each vehicle based on online demand, vehicle capacities, and locations. Evaluation on New York City Taxi public dataset shows the effectiveness of our approach in improving the fleet utilization, where less than 50% of the fleet are utilized to serve the demand of up to 90% of the requests, while maximizing profits and minimizing idle times.
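The detect-then-switch idea in the abstract can be illustrated with a minimal sketch. This is not the paper's ODCP algorithm: a simple windowed mean-shift test is swapped in as a stand-in, and the names `WindowedChangeDetector`, `segment_regimes`, `window`, and `threshold` are hypothetical. The regime index it produces is the kind of signal a dispatching agent could use to decide which learned model to consult.

```python
from collections import deque
from statistics import mean, pstdev


class WindowedChangeDetector:
    """Mean-shift detector over two sliding windows.

    A simplified stand-in for the paper's ODCP algorithm, not a
    reimplementation of it.
    """

    def __init__(self, window=10, threshold=3.0):
        self.window = window
        self.threshold = threshold
        self.reference = deque(maxlen=window)  # samples from the current regime
        self.recent = deque(maxlen=window)     # most recent samples

    def update(self, x):
        """Feed one observation; return True if a change is declared."""
        if len(self.reference) < self.window:
            self.reference.append(x)
            return False
        self.recent.append(x)
        if len(self.recent) < self.window:
            return False
        # Guard against a zero spread in a constant reference window.
        spread = pstdev(self.reference) or 1e-9
        shift = abs(mean(self.recent) - mean(self.reference))
        if shift > self.threshold * spread:
            # Start over so the reference refills with post-change data only.
            self.reference.clear()
            self.recent.clear()
            return True
        return False


def segment_regimes(stream, detector):
    """Label each step with a regime index that increments at every detected change."""
    regime = 0
    labels = []
    for x in stream:
        if detector.update(x):
            regime += 1
        labels.append(regime)
    return labels


# Synthetic demand: a low-demand period followed by a high-demand period.
demand = [10.0 + (i % 2) for i in range(40)] + [30.0 + (i % 2) for i in range(40)]
labels = segment_regimes(demand, WindowedChangeDetector(window=10, threshold=3.0))
```

In this sketch, an agent would keep one model per regime index and switch when the index changes. How the paper actually couples ODCP with the DQN agent is not specified in the abstract, so that coupling is an assumption.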

Related papers

World Model-Based Learning for Long-Term Age of Information Minimization in Vehicular Networks [53.98633183204453]
In this paper, a novel world model-based learning framework is proposed to minimize packet-completeness-aware age of information (CAoI) in a vehicular network. A world model framework is proposed to jointly learn a dynamic model of the mmWave V2X environment and use it to imagine trajectories for learning how to perform link scheduling. In particular, the long-term policy is learned in differentiable imagined trajectories instead of environment interactions.
arXiv Detail & Related papers (2025-05-03T06:23:18Z)
Enhancing Traffic Signal Control through Model-based Reinforcement Learning and Policy Reuse [0.9995933996287355]
Multi-agent reinforcement learning (MARL) has shown significant potential in traffic signal control (TSC). Current MARL-based methods often suffer from insufficient generalization due to the fixed traffic patterns and road network conditions used during training. This limitation results in poor adaptability to new traffic scenarios, leading to high retraining costs and complex deployment. We propose two algorithms: PLight and PRLight. PLight employs a model-based reinforcement learning approach, pretraining control policies and environment models using predefined source-domain traffic scenarios. PRLight further enhances adaptability by adaptively selecting pre-trained PLight agents based on the similarity between
arXiv Detail & Related papers (2025-03-11T01:21:13Z)
End-to-end Driving in High-Interaction Traffic Scenarios with Reinforcement Learning [24.578178308010912]
We propose an end-to-end model-based RL algorithm named Ramble to address these issues. By learning a dynamics model of the environment, Ramble can foresee upcoming traffic events and make more informed, strategic decisions. Ramble achieves state-of-the-art performance regarding route completion rate and driving score on the CARLA Leaderboard 2.0, showcasing its effectiveness in managing complex and dynamic traffic situations.
arXiv Detail & Related papers (2024-10-03T06:45:59Z)
A Graph-based Adversarial Imitation Learning Framework for Reliable & Realtime Fleet Scheduling in Urban Air Mobility [5.19664437943693]
This paper presents a comprehensive optimization formulation of the fleet scheduling problem. It also identifies the need for alternate solution approaches. The new imitative approach achieves better mean performance and remarkable improvement in the case of unseen worst-case scenarios.
arXiv Detail & Related papers (2024-07-16T18:51:24Z)
MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations. Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains. We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning [58.575939354953526]
PACOH-RL is a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics. Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics. Our experiment results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions.
arXiv Detail & Related papers (2023-11-13T18:51:57Z)
Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments. Our approach enhances LiDAR-based detection models using spatial quantized historical features. Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z)
Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with Online Learning [60.17407932691429]
Open Radio Access Network systems, with their virtualized base stations (vBSs), offer operators the benefits of increased flexibility, reduced costs, vendor diversity, and interoperability. We propose an online learning algorithm that balances the effective throughput and vBS energy consumption, even under unforeseeable and "challenging" environments. We prove the proposed solutions achieve sub-linear regret, providing zero average optimality gap even in challenging environments.
arXiv Detail & Related papers (2023-09-04T17:30:21Z)
Integrated Decision and Control for High-Level Automated Vehicles by Mixed Policy Gradient and Its Experiment Verification [10.393343763237452]
This paper presents a self-evolving decision-making system based on the Integrated Decision and Control (IDC). An RL algorithm called constrained mixed policy gradient (CMPG) is proposed to consistently upgrade the driving policy of the IDC. Experimental results show that, boosted by data, the system can achieve better driving ability than model-based methods.
arXiv Detail & Related papers (2022-10-19T14:58:41Z)
Off-line approximate dynamic programming for the vehicle routing problem with stochastic customers and demands via decentralized decision-making [0.0]
This paper studies a variant of the vehicle routing problem (VRP) where both customer locations and demands are uncertain. The objective is to maximize the served demands while fulfilling vehicle capacities and time restrictions. We develop a Q-learning algorithm featuring state-of-the-art acceleration techniques such as Replay Memory and Double Q Network.
arXiv Detail & Related papers (2021-09-21T14:28:09Z)
MetaVIM: Meta Variationally Intrinsic Motivated Reinforcement Learning for Decentralized Traffic Signal Control [54.162449208797334]
Traffic signal control aims to coordinate traffic signals across intersections to improve the traffic efficiency of a district or a city. Deep reinforcement learning (RL) has been applied to traffic signal control recently and demonstrated promising performance where each traffic signal is regarded as an agent. We propose a novel Meta Variationally Intrinsic Motivated (MetaVIM) RL method to learn the decentralized policy for each intersection that considers neighbor information in a latent way.
arXiv Detail & Related papers (2021-01-04T03:06:08Z)
Meta Reinforcement Learning-Based Lane Change Strategy for Autonomous Vehicles [11.180588185127892]
Supervised learning algorithms can generalize to new environments by training on a large amount of labeled data. It can often be impractical or cost-prohibitive to obtain sufficient data for each new environment. We propose a meta reinforcement learning (MRL) method to improve the agent's generalization capabilities.
arXiv Detail & Related papers (2020-08-28T02:57:11Z)
Meta-Reinforcement Learning for Trajectory Design in Wireless UAV Networks [151.65541208130995]
A drone base station (DBS) is dispatched to provide uplink connectivity to ground users whose demand is dynamic and unpredictable. In this case, the DBS's trajectory must be adaptively adjusted to satisfy the dynamic user access requests. A meta-learning algorithm is proposed in order to adapt the DBS's trajectory when it encounters novel environments.
arXiv Detail & Related papers (2020-05-25T20:43:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.