Off-line approximate dynamic programming for the vehicle routing problem with stochastic customers and demands via decentralized decision-making
- URL: http://arxiv.org/abs/2109.10200v1
- Date: Tue, 21 Sep 2021 14:28:09 GMT
- Title: Off-line approximate dynamic programming for the vehicle routing problem with stochastic customers and demands via decentralized decision-making
- Authors: Mohsen Dastpak and Fausto Errico
- Abstract summary: This paper studies a variant of the vehicle routing problem (VRP) where both customer locations and demands are uncertain.
The objective is to maximize the served demands while fulfilling vehicle capacities and time restrictions.
We develop a Q-learning algorithm featuring state-of-the-art acceleration techniques such as Replay Memory and Double Q Network.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper studies a stochastic variant of the vehicle routing problem (VRP)
where both customer locations and demands are uncertain. In particular,
potential customers are not restricted to a predefined customer set but are
continuously spatially distributed in a given service area. The objective is to
maximize the served demands while fulfilling vehicle capacities and time
restrictions. We call this problem the VRP with stochastic customers and
demands (VRPSCD). For this problem, we first propose a Markov Decision Process
(MDP) formulation representing the classical centralized decision-making
perspective where one decision-maker establishes the routes of all vehicles.
While the resulting formulation turns out to be intractable, it provides the
foundation for a new MDP formulation of the VRPSCD representing a
decentralized decision-making framework, where vehicles autonomously establish
their own routes. This new formulation allows us to develop several strategies
to reduce the dimension of the state and action spaces, resulting in a
considerably more tractable problem. We solve the decentralized problem via
Reinforcement Learning, and in particular, we develop a Q-learning algorithm
featuring state-of-the-art acceleration techniques such as Replay Memory and
Double Q Network. Computational results show that our method considerably
outperforms two commonly adopted benchmark policies (random and heuristic).
Moreover, when comparing with existing literature, we show that our approach
can compete with specialized methods developed for the particular case of the
VRPSCD where customer locations and expected demands are known in advance.
Finally, we show that the value functions and policies obtained by our
algorithm can be easily embedded in Rollout algorithms, thus further improving
their performance.
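As a rough, generic illustration of the two acceleration techniques the abstract names, Replay Memory and a Double Q Network, the following PyTorch sketch shows one training step; it is an assumed minimal form, not the authors' implementation, and the state encoding, network sizes, and hyperparameters are placeholders.

```python
# Illustrative sketch only (assumed, not the authors' code): a generic
# Q-learning update with Replay Memory and a Double Q Network.
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 8, 5   # placeholder dimensions of the reduced spaces
GAMMA, BATCH = 0.99, 32

def make_qnet() -> nn.Module:
    # Small MLP mapping a vehicle's (reduced) state to one Q-value per action.
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

online, target = make_qnet(), make_qnet()
target.load_state_dict(online.state_dict())
optimizer = torch.optim.Adam(online.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)  # replay memory of (s, a, r, s_next, done)

def store(s, a, r, s_next, done):
    # s and s_next are lists of floats of length STATE_DIM.
    replay.append((s, a, r, s_next, done))

def train_step():
    if len(replay) < BATCH:
        return
    s, a, r, s_next, done = map(torch.tensor,
                                zip(*random.sample(replay, BATCH)))
    q = online(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Double Q-learning: the online net selects the next action,
        # the target net evaluates it, which curbs overestimation bias.
        best = online(s_next.float()).argmax(dim=1, keepdim=True)
        y = r.float() + GAMMA * (1.0 - done.float()) * \
            target(s_next.float()).gather(1, best).squeeze(1)
    loss = nn.functional.smooth_l1_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # A full agent would also sync target <- online every few hundred steps
    # and act epsilon-greedily while collecting transitions.
```

In the decentralized formulation, each vehicle would query such a network on its own reduced state; as the abstract notes, the learned value function can then also serve as the base heuristic inside a Rollout scheme.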
Related papers
- Dynamic Demand Management for Parcel Lockers [0.0]
We develop a solution framework that orchestrates algorithmic techniques rooted in Sequential Decision Analytics and Reinforcement Learning.
Our innovative approach to combine these techniques enables us to address the strong interrelations between the two decision types.
Our computational study shows that our method outperforms a myopic benchmark by 13.7% and an industry-inspired policy by 12.6%.
arXiv Detail & Related papers (2024-09-08T11:38:48Z)
- Decentralized Semantic Traffic Control in AVs Using RL and DQN for Dynamic Roadblocks [9.485363025495225]
We present a novel semantic traffic control system that entrusts semantic encoding responsibilities to the vehicles themselves.
This system processes driving decisions obtained from a Reinforcement Learning (RL) agent, streamlining the decision-making process.
arXiv Detail & Related papers (2024-06-26T20:12:48Z)
- Genetic Algorithms with Neural Cost Predictor for Solving Hierarchical Vehicle Routing Problems [20.684353068460375]
When vehicle routing decisions are intertwined with higher-level decisions, the resulting optimization problems pose significant challenges for computation.
We propose a novel deep-learning-based approach called Genetic Algorithm with Neural Cost Predictor (GANCP) to tackle the challenge.
In particular, our proposed neural network learns the objective values of the HGS-CVRP open-source package that solves capacitated vehicle routing problems.
arXiv Detail & Related papers (2023-10-22T02:46:37Z)
- No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution [48.27759561064771]
We consider the two-echelon supply chain model introduced in [Cachon and Zipkin, 1999] under two different settings.
We design algorithms that achieve favorable guarantees for both regret and convergence to the optimal inventory decision in both settings.
Our algorithms are based on Online Gradient Descent and Online Newton Step, together with several new ingredients specifically designed for our problem (a generic projected-gradient step is sketched after this list).
arXiv Detail & Related papers (2022-10-23T08:45:39Z)
- Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
- SABER: Data-Driven Motion Planner for Autonomously Navigating Heterogeneous Robots [112.2491765424719]
We present an end-to-end online motion planning framework that uses a data-driven approach to navigate a heterogeneous robot team towards a global goal.
We use stochastic model predictive control (SMPC) to calculate control inputs that satisfy robot dynamics, and consider uncertainty during obstacle avoidance with chance constraints.
Recurrent neural networks are used to provide a quick estimate of future state uncertainty considered in the SMPC finite-time horizon solution.
A Deep Q-learning agent is employed to serve as a high-level path planner, providing the SMPC with target positions that move the robots towards a desired global goal.
arXiv Detail & Related papers (2021-08-03T02:56:21Z)
- H-TD2: Hybrid Temporal Difference Learning for Adaptive Urban Taxi Dispatch [9.35511513240868]
H-TD2 is a model-free, adaptive decision-making algorithm to coordinate a large fleet of automated taxis in a dynamic urban environment.
We derive a regret bound and design the trigger condition between the two behaviors to explicitly control the trade-off between computational complexity and the individual taxi policy's bounded sub-optimality.
Unlike recent reinforcement learning dispatch methods, this policy estimation is adaptive and robust to out-of-training domain events.
arXiv Detail & Related papers (2021-05-05T15:42:31Z)
- AdaPool: A Diurnal-Adaptive Fleet Management Framework using Model-Free Deep Reinforcement Learning and Change Point Detection [34.77250498401055]
This paper introduces an adaptive model-free deep reinforcement learning approach that can recognize and adapt to the diurnal patterns in the ride-sharing environment with car-pooling.
In addition to the adaptation logic in dispatching, this paper also proposes a dynamic, demand-aware vehicle-passenger matching and route planning framework.
arXiv Detail & Related papers (2021-04-01T02:14:01Z)
- Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution.
This approach poses a number of implementation and optimization challenges.
We find that the proposed approach yields models that are more robust than comparable baselines.
arXiv Detail & Related papers (2021-03-18T14:26:26Z)
- Deep Policy Dynamic Programming for Vehicle Routing Problems [89.96386273895985]
We propose Deep Policy Dynamic Programming (DPDP) to combine the strengths of learned neural heuristics with those of dynamic programming algorithms.
DPDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions.
We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms.
arXiv Detail & Related papers (2021-02-23T15:33:57Z)
- Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
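The No-Regret Learning entry above cites Online Gradient Descent as one of its building blocks. Purely as a point of reference, here is the textbook projected-OGD step under assumed box constraints; this is not the cited paper's tailored algorithm, and the decision variables and bounds below are hypothetical.

```python
# Textbook projected Online Gradient Descent step (generic form, assumed for
# illustration; the cited paper adds problem-specific ingredients on top).
import numpy as np

def ogd_step(x, grad, eta, lo=0.0, hi=None):
    """Gradient step followed by projection onto box constraints,
    e.g. inventory decisions that must stay within [lo, hi]."""
    x_new = x - eta * grad
    x_new = np.maximum(x_new, lo)       # clip at the lower bound
    if hi is not None:
        x_new = np.minimum(x_new, hi)   # clip at the upper bound
    return x_new

# Usage sketch: step sizes eta_t ~ 1/sqrt(t) give O(sqrt(T)) regret for
# convex losses. The gradient below is a stand-in for the observed cost.
x = np.array([5.0, 3.0])                # hypothetical base-stock levels
for t in range(1, 101):
    grad = np.random.randn(2)           # placeholder cost gradient
    x = ogd_step(x, grad, eta=1.0 / np.sqrt(t), lo=0.0, hi=10.0)
```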