An End-to-End Reinforcement Learning Based Approach for Micro-View Order-Dispatching in Ride-Hailing
- URL: http://arxiv.org/abs/2408.10479v1
- Date: Tue, 20 Aug 2024 01:30:53 GMT
- Title: An End-to-End Reinforcement Learning Based Approach for Micro-View Order-Dispatching in Ride-Hailing
- Authors: Xinlang Yue, Yiran Liu, Fangzhou Shi, Sihong Luo, Chen Zhong, Min Lu, Zhe Xu,
- Abstract summary: We propose an end-to-end reinforcement learning based order-dispatching approach in Didi.
We employ a two-layer Decision Process framework to model this problem, and present underlineDeep underlineDouble underlineScalable underlineNetwork (DSN2), an encoder-decoder structure network to generate order assignments.
By leveraging contextual dynamics, our approach can adapt to the behavioral patterns for better performance.
- Score: 8.892147201091726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Assigning orders to drivers under localized spatiotemporal context (micro-view order-dispatching) is a major task in Didi, as it influences ride-hailing service experience. Existing industrial solutions mainly follow a two-stage pattern that incorporate heuristic or learning-based algorithms with naive combinatorial methods, tackling the uncertainty of both sides' behaviors, including emerging timings, spatial relationships, and travel duration, etc. In this paper, we propose a one-stage end-to-end reinforcement learning based order-dispatching approach that solves behavior prediction and combinatorial optimization uniformly in a sequential decision-making manner. Specifically, we employ a two-layer Markov Decision Process framework to model this problem, and present \underline{D}eep \underline{D}ouble \underline{S}calable \underline{N}etwork (D2SN), an encoder-decoder structure network to generate order-driver assignments directly and stop assignments accordingly. Besides, by leveraging contextual dynamics, our approach can adapt to the behavioral patterns for better performance. Extensive experiments on Didi's real-world benchmarks justify that the proposed approach significantly outperforms competitive baselines in optimizing matching efficiency and user experience tasks. In addition, we evaluate the deployment outline and discuss the gains and experiences obtained during the deployment tests from the view of large-scale engineering implementation.
Related papers
- Causality-Aware Transformer Networks for Robotic Navigation [13.719643934968367]
Current research in Visual Navigation reveals opportunities for improvement.
Direct adoption of RNNs and Transformers often overlooks the specific differences between Embodied AI and traditional sequential data modelling.
We propose Causality-Aware Transformer (CAT) Networks for Navigation, featuring a Causal Understanding Module.
arXiv Detail & Related papers (2024-09-04T12:53:26Z) - Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z) - Hierarchical Neural Constructive Solver for Real-world TSP Scenarios [27.986011761759567]
We introduce realistic Traveling Salesman Problem (TSP) scenarios relevant to industrial settings.
Our hierarchical approach yields superior performance compared to both classical and recent transformer models.
arXiv Detail & Related papers (2024-08-07T06:44:47Z) - Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation [7.005068872406135]
Mean Teacher- and co-training-based approaches are employed to mitigate confirmation bias and coupling problems.
These approaches frequently involve complex training pipelines and a substantial computational burden.
We propose a PrevMatch framework that effectively mitigates the limitations by maximizing the utilization of the temporal knowledge obtained during the training process.
arXiv Detail & Related papers (2024-05-31T03:54:59Z) - An Efficient Learning-based Solver Comparable to Metaheuristics for the
Capacitated Arc Routing Problem [67.92544792239086]
We introduce an NN-based solver to significantly narrow the gap with advanced metaheuristics.
First, we propose direction-aware facilitating attention model (DaAM) to incorporate directionality into the embedding process.
Second, we design a supervised reinforcement learning scheme that involves supervised pre-training to establish a robust initial policy.
arXiv Detail & Related papers (2024-03-11T02:17:42Z) - Hierarchical Decomposition of Prompt-Based Continual Learning:
Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimize the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior:
From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and empirical case study under which conditions and to what extent these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z) - Contrastive Self-supervised Sequential Recommendation with Robust
Augmentation [101.25762166231904]
Sequential Recommendationdescribes a set of techniques to model dynamic user behavior in order to predict future interactions in sequential user data.
Old and new issues remain, including data-sparsity and noisy data.
We propose Contrastive Self-Supervised Learning for sequential Recommendation (CoSeRec)
arXiv Detail & Related papers (2021-08-14T07:15:25Z) - Improving Online Performance Prediction for Semantic Segmentation [29.726236358091295]
We address the task of observing the performance of a semantic segmentation deep neural network (DNN) during online operation.
Many high-level decisions rely on such DNNs, which are usually evaluated offline, while their performance in online operation remains unknown.
We propose an improved online performance prediction scheme, building on a recently proposed concept of predicting the primary semantic segmentation task's performance.
arXiv Detail & Related papers (2021-04-12T07:44:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.