DRL4Route: A Deep Reinforcement Learning Framework for Pick-up and
Delivery Route Prediction
- URL: http://arxiv.org/abs/2307.16246v1
- Date: Sun, 30 Jul 2023 14:50:31 GMT
- Title: DRL4Route: A Deep Reinforcement Learning Framework for Pick-up and
Delivery Route Prediction
- Authors: Xiaowei Mao, Haomin Wen, Hengrui Zhang, Huaiyu Wan, Lixia Wu, Jianbin
Zheng, Haoyuan Hu, Youfang Lin
- Abstract summary: We present the first attempt to generalize Reinforcement Learning (RL) to the route prediction task, leading to a novel RL-based framework called DRL4Route.
DRL4Route can serve as a plug-and-play component to boost the existing deep learning models.
It follows an actor-critic architecture equipped with a Generalized Advantage Estimator.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pick-up and Delivery Route Prediction (PDRP), which aims to estimate the
future service route of a worker given his current task pool, has received
rising attention in recent years. Deep neural networks based on supervised
learning have emerged as the dominant model for the task because of their
powerful ability to capture workers' behavior patterns from massive historical
data. Though promising, they fail to introduce the non-differentiable test
criteria into the training process, leading to a mismatch between training and
test criteria, which considerably degrades their performance when applied in
practical systems. To tackle the above issue, we present the first attempt to
generalize Reinforcement Learning (RL) to the route prediction task, leading to
a novel RL-based framework called DRL4Route. It combines the behavior-learning
abilities of previous deep learning models with the non-differentiable
objective optimization ability of reinforcement learning. DRL4Route can serve
as a plug-and-play component to boost the existing deep learning models. Based
on the framework, we further implement a model named DRL4Route-GAE for PDRP in
logistics services. It follows an actor-critic architecture equipped
with a Generalized Advantage Estimator that can balance the bias and variance
of the policy gradient estimates, thus achieving a better policy.
Extensive offline experiments and the online deployment show that DRL4Route-GAE
improves Location Square Deviation (LSD) by 0.9%-2.7%, and Accuracy@3 (ACC@3)
by 2.4%-3.2% over existing methods on the real-world dataset.
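The abstract names the Generalized Advantage Estimator but does not spell out how it trades bias against variance. The following is a minimal sketch of the standard GAE recursion, not the authors' implementation. The function name `gae_advantages`, the default values of `gamma` and `lam`, and the convention that `values` carries one extra bootstrap entry are all assumptions made for illustration.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,  # discount factor (assumed default)
    lam: float = 0.95,    # GAE lambda (assumed default)
) -> List[float]:
    """Compute GAE advantages for one episode.

    rewards has length T; values has length T + 1, where the last entry
    is the critic's bootstrap value for the state after the final step
    (use 0.0 if the episode terminated).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error from the critic.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + (gamma * lam) * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam=0` the estimate reduces to the one-step TD error (lower variance, more bias from the critic). With `lam=1` it becomes the discounted return minus the value baseline (less bias, higher variance). Intermediate values interpolate between the two.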
Related papers
- S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning [51.84977135926156]
We introduce S$^2$R, an efficient framework that enhances LLM reasoning by teaching models to self-verify and self-correct during inference.
Our results demonstrate that Qwen2.5-math-7B achieves an accuracy improvement from 51.0% to 81.6%, outperforming models trained on an equivalent amount of long-CoT distilled data.
arXiv Detail & Related papers (2025-02-18T13:40:22Z) - End-to-End Autonomous Driving without Costly Modularization and 3D Manual Annotation [34.070813293944944]
We propose UAD, a method for vision-based end-to-end autonomous driving (E2EAD).
Our motivation stems from the observation that current E2EAD models still mimic the modular architecture in typical driving stacks.
UAD achieves a 38.7% relative improvement over UniAD on the average collision rate in nuScenes and surpasses VAD by 41.32 points on the driving score in CARLA's Town05 Long benchmark.
arXiv Detail & Related papers (2024-06-25T16:12:52Z) - Robust and Explainable Fine-Grained Visual Classification with Transfer Learning: A Dual-Carriageway Framework [0.799543372823325]
We present an automatic best-suit training solution searching framework, the Dual-Carriageway Framework (DCF).
We validated DCF's effectiveness through experiments with three convolutional neural networks (ResNet18, ResNet34 and Inception-v3).
Results showed fine-tuning pathways outperformed training-from-scratch ones by up to 2.13% and 1.23% on the pre-existing and new datasets, respectively.
arXiv Detail & Related papers (2024-05-09T15:41:10Z) - NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning
Disentangled Reasoning [101.56342075720588]
Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions.
Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability.
This paper introduces a novel strategy called Navigational Chain-of-Thought (NavCoT), where we fulfill parameter-efficient in-domain training to enable self-guided navigational decision.
arXiv Detail & Related papers (2024-03-12T07:27:02Z) - Data-efficient Deep Reinforcement Learning for Vehicle Trajectory
Control [6.144517901919656]
Reinforcement learning (RL) promises to achieve control performance superior to classical approaches.
Standard RL approaches like soft actor-critic (SAC) require extensive amounts of training data to be collected.
We apply recently developed data-efficient deep RL methods to vehicle trajectory control.
arXiv Detail & Related papers (2023-11-30T09:38:59Z) - Pre-training on Synthetic Driving Data for Trajectory Prediction [61.520225216107306]
We propose a pipeline-level solution to mitigate the issue of data scarcity in trajectory forecasting.
We adopt HD map augmentation and trajectory synthesis for generating driving data, and then we learn representations by pre-training on them.
We conduct extensive experiments to demonstrate the effectiveness of our data expansion and pre-training strategies.
arXiv Detail & Related papers (2023-09-18T19:49:22Z) - Avoidance Navigation Based on Offline Pre-Training Reinforcement
Learning [0.0]
This paper presents a pre-training Deep Reinforcement Learning (DRL) approach for map-free avoidance navigation for mobile robots.
An efficient offline training strategy is proposed to speed up the inefficient random exploration in the early stage.
It was demonstrated that our DRL model generalizes across different environments.
arXiv Detail & Related papers (2023-08-03T06:19:46Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z) - Reinforcement Learning in the Wild: Scalable RL Dispatching Algorithm
Deployed in Ridehailing Marketplace [12.298997392937876]
This study proposes a real-time dispatching algorithm based on reinforcement learning.
It is deployed online in multiple cities under DiDi's operation for A/B testing and is launched in one of the major international markets.
The deployed algorithm shows over 1.3% improvement in total driver income from A/B testing.
arXiv Detail & Related papers (2022-02-10T16:07:17Z) - Behavioral Priors and Dynamics Models: Improving Performance and Domain
Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.