Deep reinforcement learning for the dynamic vehicle dispatching problem:
An event-based approach
- URL: http://arxiv.org/abs/2307.07508v1
- Date: Thu, 13 Jul 2023 16:29:25 GMT
- Title: Deep reinforcement learning for the dynamic vehicle dispatching problem:
An event-based approach
- Authors: Edyvalberty Alenquer Cordeiro, Anselmo Ramalho Pitombeira-Neto
- Abstract summary: We model the problem as a semi-Markov decision process, which allows us to treat time as continuous.
We argue that an event-based approach substantially reduces the complexity of the decision space and overcomes other limitations of discrete-time models.
Results show that our policies exhibit better average waiting times, cancellation rates and total service times, with reductions in average waiting times of up to 50% relative to the other tested policies.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The dynamic vehicle dispatching problem corresponds to deciding which
vehicles to assign to requests that arise stochastically over time and space.
It emerges in diverse areas, such as in the assignment of trucks to loads to be
transported; in emergency systems; and in ride-hailing services. In this paper,
we model the problem as a semi-Markov decision process, which allows us to
treat time as continuous. In this setting, decision epochs coincide with
discrete events whose time intervals are random. We argue that an event-based
approach substantially reduces the combinatorial complexity of the decision
space and overcomes other limitations of discrete-time models often proposed in
the literature. In order to test our approach, we develop a new discrete-event
simulator and use double deep Q-learning to train our decision agents.
Numerical experiments are carried out in realistic scenarios using data from
New York City. We compare the policies obtained through our approach with
heuristic policies often used in practice. Results show that our policies
exhibit better average waiting times, cancellation rates and total service
times, with reduction in average waiting times of up to 50% relative to the
other tested heuristic policies.
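The two ingredients of the abstract above — decision epochs that coincide with stochastic discrete events, and a double deep Q-learning backup — can be combined in a minimal sketch. This is a hypothetical toy, not the authors' simulator or networks: the exponential inter-arrival times, the three candidate actions, and the random stand-in Q-values are illustrative assumptions. The key points are that the agent decides only when an event occurs, and that the discount `gamma ** tau` spans the random inter-event time `tau`.

```python
import heapq
import numpy as np

rng = np.random.default_rng(42)

def double_dqn_target(q_online_next, q_target_next, reward, tau, gamma=0.9):
    """Double DQN backup adapted to a semi-MDP: the online network selects
    the greedy next action, the target network evaluates it, and the
    discount gamma**tau covers the random inter-event interval tau."""
    a_star = int(np.argmax(q_online_next))          # selection: online net
    return reward + (gamma ** tau) * q_target_next[a_star]  # evaluation: target net

# Minimal event loop: decision epochs coincide with request arrivals,
# whose inter-arrival times are random (exponential, as an assumption).
events = []  # priority queue of (time, request_id)
t = 0.0
for i in range(5):
    t += rng.exponential(1.0)        # random time until the next event
    heapq.heappush(events, (t, i))

prev_time = 0.0
while events:
    now, request = heapq.heappop(events)
    tau = now - prev_time            # elapsed time since the last decision epoch
    # In a real agent the Q-values over candidate vehicle-request
    # assignments would come from neural networks evaluated on the
    # current system state; random values stand in here.
    y = double_dqn_target(rng.normal(size=3), rng.normal(size=3),
                          reward=-tau, tau=tau)
    prev_time = now
```

Because the interval `tau` is continuous, no fixed time step has to be chosen, which is one way the event-based view sidesteps the discretization issues of discrete-time models.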
Related papers
- Contextual Stochastic Vehicle Routing with Time Windows [47.91283991228738]
We study the vehicle routing problem with time windows (VRPTW) and stochastic travel times.
We introduce the contextual VRPTW, which minimizes the total transportation cost and expected late arrival penalties conditioned on the observed features.
We present novel data-driven prescriptive models that use historical data to provide an approximate solution to the problem.
arXiv Detail & Related papers (2024-02-10T14:56:36Z)
- A Reinforcement Learning Approach for Dynamic Rebalancing in Bike-Sharing System [11.237099288412558]
Bike-Sharing Systems provide eco-friendly urban mobility, contributing to the alleviation of traffic congestion and healthier lifestyles.
Devising effective rebalancing strategies using vehicles to redistribute bikes among stations is therefore of utmost importance for operators.
This paper introduces a temporal reinforcement learning algorithm for the dynamic rebalancing problem with multiple vehicles.
arXiv Detail & Related papers (2024-02-05T23:46:42Z)
- AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- Exploring the Multi-modal Demand Dynamics During Transport System Disruptions [0.47267770920095536]
This study takes a data-driven approach to explore multi-modal demand dynamics under disruptions.
We first develop a methodology to automatically detect anomalous instances through historical hourly travel demand data.
Then we apply clustering to these anomalous hours to distinguish various forms of multi-modal demand dynamics occurring during disruptions.
arXiv Detail & Related papers (2023-07-03T09:15:28Z)
- Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations [84.42837346400151]
Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare.
Existing causal inference approaches consider regular, discrete-time intervals between observations and treatment decisions.
We propose a controllable simulation environment based on a model of tumor growth for a range of scenarios.
arXiv Detail & Related papers (2022-06-16T17:15:15Z)
- H-TD2: Hybrid Temporal Difference Learning for Adaptive Urban Taxi Dispatch [9.35511513240868]
H-TD2 is a model-free, adaptive decision-making algorithm to coordinate a large fleet of automated taxis in a dynamic urban environment.
We derive a regret bound and design the trigger condition between the two behaviors to explicitly control the trade-off between computational complexity and the individual taxi policy's bounded sub-optimality.
Unlike recent reinforcement learning dispatch methods, this policy estimation is adaptive and robust to out-of-training domain events.
arXiv Detail & Related papers (2021-05-05T15:42:31Z)
- Congestion-aware Multi-agent Trajectory Prediction for Collision Avoidance [110.63037190641414]
We propose to learn congestion patterns explicitly and devise a novel "Sense--Learn--Reason--Predict" framework.
By decomposing the learning phases into two stages, a "student" can learn contextual cues from a "teacher" while generating collision-free trajectories.
In experiments, we demonstrate that the proposed model is able to generate collision-free trajectory predictions in a synthetic dataset.
arXiv Detail & Related papers (2021-03-26T02:42:33Z)
- Equilibrium Inverse Reinforcement Learning for Ride-hailing Vehicle Network [1.599072005190786]
We formulate the problem of passenger-vehicle matching in a sparsely connected graph.
We propose an algorithm to derive an equilibrium policy in a multi-agent environment.
arXiv Detail & Related papers (2021-02-13T03:18:44Z)
- SMART: Simultaneous Multi-Agent Recurrent Trajectory Prediction [72.37440317774556]
We propose advances that address two key challenges in future trajectory prediction:
multimodality in both training data and predictions, and constant-time inference regardless of the number of agents.
arXiv Detail & Related papers (2020-07-26T08:17:10Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.