Real-Time Integrated Dispatching and Idle Fleet Steering with Deep Reinforcement Learning for A Meal Delivery Platform
- URL: http://arxiv.org/abs/2501.05808v1
- Date: Fri, 10 Jan 2025 09:15:40 GMT
- Title: Real-Time Integrated Dispatching and Idle Fleet Steering with Deep Reinforcement Learning for A Meal Delivery Platform
- Authors: Jingyi Cheng, Shadi Sharif Azadeh
- Abstract summary: This study sets out to solve the real-time order dispatching and idle courier steering problems for a meal delivery platform.
We propose a reinforcement learning (RL)-based strategic dual-control framework.
We find that delivery efficiency and fairness of workload distribution among couriers are improved.
- Abstract: To achieve high service quality and profitability, meal delivery platforms like Uber Eats and Grubhub must strategically operate their fleets to ensure timely deliveries for current orders while mitigating the consequential impacts of suboptimal decisions that lead to courier understaffing in the future. This study sets out to solve the real-time order dispatching and idle courier steering problems for a meal delivery platform by proposing a reinforcement learning (RL)-based strategic dual-control framework. To address the inherent sequential nature of these problems, we model both order dispatching and courier steering as Markov Decision Processes. Using a deep reinforcement learning (DRL) framework, we train strategic policies that leverage explicitly predicted demand as part of their inputs. In our dual-control framework, the dispatching and steering policies are iteratively trained in an integrated manner. These forward-looking policies can be executed in real time and provide decisions that jointly consider impacts at the local and network levels. To enhance dispatching fairness, we propose convolutional deep Q networks to construct fair courier embeddings. To simultaneously rebalance supply and demand within the service network, we propose to utilize mean-field approximated supply-demand knowledge to reallocate idle couriers at the local level. Using the policies generated by the RL-based strategic dual-control framework, we find that delivery efficiency and the fairness of workload distribution among couriers are improved, and that under-supplied conditions within the service network are alleviated. Our study sheds light on designing an RL-based framework to enable forward-looking real-time operations for meal delivery platforms and other on-demand services.
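The paper itself provides no code; purely as a rough, hypothetical sketch of how one decision epoch of such a dual-control loop might be organized, consider the following. The names (dispatch_score, decision_epoch, the 1-D zone layout) are invented, and the scoring function and steering rule are crude stand-ins for the paper's convolutional deep Q networks and mean-field supply-demand approximation.

```python
# Illustrative sketch only (not the authors' code): one simplified decision
# epoch of a dual-control loop. dispatch_score stands in for the learned
# convolutional deep Q network over courier embeddings, and the steering
# rule stands in for the mean-field approximated supply-demand knowledge.
import numpy as np

rng = np.random.default_rng(0)
N_ZONES = 9  # hypothetical 1-D zone layout for brevity

def dispatch_score(courier, order_zone):
    # Stand-in Q-value: prefer nearby couriers, and lightly loaded ones to
    # keep the workload distribution fair.
    return -abs(courier["zone"] - order_zone) - 0.1 * courier["workload"]

def decision_epoch(couriers, orders, predicted_demand):
    # 1) Order dispatching: greedily assign each order to the best-scoring
    #    idle courier.
    for order_zone in orders:
        idle = [c for c in couriers if c["idle"]]
        if not idle:
            break
        best = max(idle, key=lambda c: dispatch_score(c, order_zone))
        best["idle"] = False
        best["workload"] += 1
    # 2) Idle fleet steering: nudge remaining idle couriers toward the zone
    #    with the largest predicted demand-minus-supply gap.
    supply = np.zeros(N_ZONES)
    for c in couriers:
        if c["idle"]:
            supply[c["zone"]] += 1
    target = int(np.argmax(predicted_demand - supply))
    for c in couriers:
        if c["idle"]:
            c["zone"] += int(np.sign(target - c["zone"]))  # one-step move

couriers = [{"zone": int(rng.integers(N_ZONES)), "idle": True, "workload": 0}
            for _ in range(12)]
decision_epoch(couriers, orders=[2, 2, 7], predicted_demand=rng.poisson(1.5, N_ZONES))
```

The epoch first commits dispatch decisions and only then repositions leftover idle capacity, mirroring the integrated structure described in the abstract.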
Related papers
- Self-Regulation and Requesting Interventions [63.5863047447313]
We propose an offline framework that trains a "helper" policy to request interventions.
We score optimal intervention timing with PRMs and train the helper model on these labeled trajectories.
This offline approach significantly reduces costly intervention calls during training.
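As a purely hypothetical illustration of the intervention-timing idea summarized above (the PRM scores and threshold below are invented):

```python
# Hypothetical sketch: label the first step whose process-reward-model (PRM)
# score drops below a threshold as the point to request an intervention;
# such labeled trajectories could then train the "helper" policy offline.
def label_intervention(prm_scores, threshold=0.5):
    for t, score in enumerate(prm_scores):
        if score < threshold:
            return t  # request help at this step
    return None  # trajectory never needs intervention

print(label_intervention([0.9, 0.8, 0.4, 0.7]))  # -> 2
```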
arXiv Detail & Related papers (2025-02-07T00:06:17Z)
- Learning to Sail Dynamic Networks: The MARLIN Reinforcement Learning Framework for Congestion Control in Tactical Environments [53.08686495706487]
This paper proposes an RL framework that leverages an accurate and parallelizable emulation environment to reenact the conditions of a tactical network.
We evaluate our RL framework by training a MARLIN agent in conditions replicating a bottleneck link transition between a Satellite Communication (SATCOM) and a UHF Wide Band (UHF) radio link.
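A minimal, invented stub of what such an emulated link transition could look like (all bandwidth numbers and the switch point are assumptions, not taken from the paper):

```python
# Invented stub: an emulated bottleneck whose bandwidth switches from a
# SATCOM-like profile to a UHF-like profile partway through an episode.
def bottleneck_bandwidth_mbps(step, switch_at=50, satcom=10.0, uhf=1.5):
    return satcom if step < switch_at else uhf

print(bottleneck_bandwidth_mbps(10), bottleneck_bandwidth_mbps(60))  # 10.0 1.5
```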
arXiv Detail & Related papers (2023-06-27T16:15:15Z)
- MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
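For orientation only, a generic sketch of the entropy-regularized objective that Soft Actor-Critic maximizes; the temperature alpha and toy numbers are assumptions, not MARLIN's actual settings:

```python
# Generic SAC-style objective: return plus alpha-weighted policy entropy,
# evaluated here for a toy discrete-action trajectory.
import math

def soft_return(rewards, action_dists, alpha=0.2):
    entropies = [-sum(p * math.log(p) for p in dist if p > 0)
                 for dist in action_dists]
    return sum(r + alpha * h for r, h in zip(rewards, entropies))

print(soft_return(rewards=[1.0, 0.5], action_dists=[[0.7, 0.3], [0.5, 0.5]]))
```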
arXiv Detail & Related papers (2023-02-02T18:27:20Z)
- No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution [48.27759561064771]
We consider the two-echelon supply chain model introduced in [Cachon and Zipkin, 1999] under two different settings.
We design algorithms that achieve favorable guarantees for both regret and convergence to the optimal inventory decision in both settings.
Our algorithms are based on Online Gradient Descent and Online Newton Step, together with several new ingredients specifically designed for our problem.
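A generic Online Gradient Descent step on a newsvendor-style inventory cost, shown only to illustrate the algorithmic family named above (the step size and cost parameters are invented):

```python
# Generic OGD sketch for a repeated inventory decision: after observing each
# demand, step the order quantity against the subgradient of the realized
# newsvendor cost (overage if we over-ordered, underage otherwise).
def ogd_inventory(demands, q0=0.0, eta=0.5, overage=1.0, underage=2.0):
    q = q0
    for d in demands:
        grad = overage if q > d else -underage
        q = max(0.0, q - eta * grad)
    return q

print(ogd_inventory([3, 5, 4, 6, 5]))  # steps toward the demand range
```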
arXiv Detail & Related papers (2022-10-23T08:45:39Z)
- Off-line approximate dynamic programming for the vehicle routing problem with stochastic customers and demands via decentralized decision-making [0.0]
This paper studies a variant of the vehicle routing problem (VRP) where both customer locations and demands are uncertain.
The objective is to maximize the served demands while fulfilling vehicle capacities and time restrictions.
We develop a Q-learning algorithm featuring state-of-the-art acceleration techniques such as Replay Memory and Double Q Network.
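A minimal, generic Double Q-learning target computation in the spirit of the acceleration techniques listed above (the Q-values here are toy numbers):

```python
# Generic Double-Q target: the online network selects the next action, the
# target network evaluates it, which curbs overestimation bias.
def double_q_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    if done:
        return reward
    best = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best]

print(double_q_target(1.0, next_q_online=[0.2, 0.8], next_q_target=[0.3, 0.6]))
```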
arXiv Detail & Related papers (2021-09-21T14:28:09Z)
- A Deep Reinforcement Learning Approach for Constrained Online Logistics Route Assignment [4.367543599338385]
It is crucial for the logistics industry to properly assign a candidate logistics route to each shipping parcel.
This online route-assignment problem can be viewed as a constrained online decision-making problem.
We develop a model-free DRL approach named PPO-RA, in which Proximal Policy Optimization (PPO) is improved with dedicated techniques to address the challenges of route assignment (RA).
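For reference, the standard PPO clipped-surrogate term that PPO-RA presumably builds on; the paper's dedicated route-assignment techniques are not reproduced here, and the numbers are toy values:

```python
# Standard PPO clipped surrogate for a single sample: the probability ratio
# is clipped so the updated policy stays near the behavior policy.
def ppo_clip_loss(ratio, advantage, eps=0.2):
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return -min(ratio * advantage, clipped * advantage)

print(ppo_clip_loss(ratio=1.3, advantage=0.5))  # ratio clipped at 1.2
```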
arXiv Detail & Related papers (2021-09-08T07:27:39Z)
- A Modular and Transferable Reinforcement Learning Framework for the Fleet Rebalancing Problem [2.299872239734834]
We propose a modular framework for fleet rebalancing based on model-free reinforcement learning (RL).
We formulate RL state and action spaces as distributions over a grid of the operating area, making the framework scalable.
Numerical experiments, using real-world trip and network data, demonstrate that this approach has several distinct advantages over baseline methods.
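A small sketch of representing fleet positions as a distribution over a grid of the operating area, as this entry describes; the grid size and coordinates are made up:

```python
# Represent vehicle positions as a normalized occupancy distribution over a
# coarse grid, so the state size is independent of fleet size.
import numpy as np

def grid_state(positions, bins=(3, 3), extent=1.0):
    hist, _, _ = np.histogram2d(
        [p[0] for p in positions], [p[1] for p in positions],
        bins=bins, range=[[0.0, extent], [0.0, extent]],
    )
    return hist / max(hist.sum(), 1.0)

print(grid_state([(0.1, 0.2), (0.8, 0.9), (0.15, 0.25)]))
```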
arXiv Detail & Related papers (2021-05-27T16:32:28Z)
- MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning [108.79676336281211]
Continuous deployment of new policies for data collection and online learning is either cost-ineffective or impractical.
We propose a new algorithmic learning framework called Model-based Uncertainty regularized and Sample Efficient Batch Optimization.
Our framework discovers novel and high-quality samples for each deployment to enable efficient data collection.
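As a loose, generic illustration of uncertainty-regularized objectives (the ensemble values and penalty weight beta are invented and not from MUSBO):

```python
# Generic uncertainty regularization: penalize a model-predicted value by the
# disagreement (standard deviation) across an ensemble of learned models.
import statistics

def regularized_value(ensemble_predictions, beta=1.0):
    return (statistics.mean(ensemble_predictions)
            - beta * statistics.stdev(ensemble_predictions))

print(regularized_value([1.0, 0.8, 1.2]))  # -> 0.8
```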
arXiv Detail & Related papers (2021-02-23T01:30:55Z)
- Vehicular Cooperative Perception Through Action Branching and Federated Reinforcement Learning [101.64598586454571]
A novel framework is proposed to allow reinforcement learning-based vehicular association, resource block (RB) allocation, and content selection of cooperative perception messages (CPMs).
A federated RL approach is introduced in order to speed up the training process across vehicles.
Results show that federated RL improves the training process, where better policies can be achieved within the same amount of time compared to the non-federated approach.
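A generic federated-averaging step over per-vehicle policy weights, illustrating the federated RL idea in this entry (the weight vectors are toy values):

```python
# Generic FedAvg step: each vehicle trains locally, then parameters are
# averaged into a shared policy to speed up training across vehicles.
def fed_avg(local_weights):
    n = len(local_weights)
    return [sum(ws) / n for ws in zip(*local_weights)]

print(fed_avg([[0.2, 0.4], [0.4, 0.8], [0.6, 0.0]]))  # -> [0.4, 0.4]
```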
arXiv Detail & Related papers (2020-12-07T02:09:15Z)
- Deep Reinforcement Learning for Crowdsourced Urban Delivery: System States Characterization, Heuristics-guided Action Choice, and Rule-Interposing Integration [0.8099700053397277]
This paper investigates the problem of assigning shipping requests to ad hoc couriers in the context of crowdsourced urban delivery.
We propose a new deep reinforcement learning (DRL)-based approach to tackling this assignment problem.
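A generic sketch of rule-interposing integration: hard rules veto infeasible assignments before the learned policy chooses among the rest (the rule and Q-values here are invented):

```python
# Generic rule-interposing sketch: hard rules veto infeasible couriers before
# the learned policy picks the highest-valued remaining assignment.
def choose_courier(q_values, feasible):
    candidates = [i for i in range(len(q_values)) if feasible[i]]
    return max(candidates, key=q_values.__getitem__)

print(choose_courier(q_values=[0.9, 0.5, 0.7], feasible=[False, True, True]))  # -> 2
```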
arXiv Detail & Related papers (2020-11-29T19:50:34Z)