An End-to-End Reinforcement Learning Based Approach for Micro-View Order-Dispatching in Ride-Hailing
- URL: http://arxiv.org/abs/2408.10479v1
- Date: Tue, 20 Aug 2024 01:30:53 GMT
- Title: An End-to-End Reinforcement Learning Based Approach for Micro-View Order-Dispatching in Ride-Hailing
- Authors: Xinlang Yue, Yiran Liu, Fangzhou Shi, Sihong Luo, Chen Zhong, Min Lu, Zhe Xu,
- Abstract summary: We propose an end-to-end reinforcement learning based order-dispatching approach in Didi.
We employ a two-layer Markov Decision Process framework to model this problem, and present Deep Double Scalable Network (D2SN), an encoder-decoder structure network to generate order-driver assignments.
By leveraging contextual dynamics, our approach can adapt to the behavioral patterns for better performance.
- Score: 8.892147201091726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Assigning orders to drivers under localized spatiotemporal context (micro-view order-dispatching) is a major task in Didi, as it influences ride-hailing service experience. Existing industrial solutions mainly follow a two-stage pattern that incorporates heuristic or learning-based algorithms with naive combinatorial methods, tackling the uncertainty of both sides' behaviors, including emerging timings, spatial relationships, and travel duration. In this paper, we propose a one-stage end-to-end reinforcement learning based order-dispatching approach that solves behavior prediction and combinatorial optimization uniformly in a sequential decision-making manner. Specifically, we employ a two-layer Markov Decision Process framework to model this problem, and present Deep Double Scalable Network (D2SN), an encoder-decoder structure network to generate order-driver assignments directly and stop assignments accordingly. Besides, by leveraging contextual dynamics, our approach can adapt to the behavioral patterns for better performance. Extensive experiments on Didi's real-world benchmarks justify that the proposed approach significantly outperforms competitive baselines in optimizing matching efficiency and user experience tasks. In addition, we evaluate the deployment outline and discuss the gains and experiences obtained during the deployment tests from the view of large-scale engineering implementation.
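The abstract describes generating order-driver assignments one at a time and then stopping. The sketch below only illustrates that sequential "assign or stop" loop; it is not D2SN. A fixed score table stands in for the learned encoder-decoder network, and the names `greedy_dispatch`, `scores`, and `stop_score` are hypothetical, introduced here for illustration.

```python
from typing import List, Tuple


def greedy_dispatch(scores: List[List[float]], stop_score: float) -> List[Tuple[int, int]]:
    """Pick (order, driver) pairs one at a time until stopping looks better.

    scores[i][j] is an assumed value for assigning order i to driver j.
    stop_score is an assumed value for ending the round, which leaves the
    remaining orders and drivers unmatched for a later round.
    In a learned system both would come from a model; here they are inputs.
    """
    free_orders = set(range(len(scores)))
    free_drivers = set(range(len(scores[0]))) if scores else set()
    assignments: List[Tuple[int, int]] = []

    while free_orders and free_drivers:
        # Enumerate the remaining pairs in a fixed order so ties are deterministic.
        pairs = [(i, j) for i in sorted(free_orders) for j in sorted(free_drivers)]
        order, driver = max(pairs, key=lambda p: scores[p[0]][p[1]])

        # The stop action wins when no remaining pair scores above it.
        if scores[order][driver] <= stop_score:
            break

        assignments.append((order, driver))
        free_orders.remove(order)
        free_drivers.remove(driver)

    return assignments


if __name__ == "__main__":
    example_scores = [[0.9, 0.2], [0.8, 0.1]]
    print(greedy_dispatch(example_scores, stop_score=0.5))
```

With the example table, order 0 is matched to driver 0 first; the only remaining pair scores 0.1, which is below the stop value, so order 1 and driver 1 are left for a later round. Greedy selection is used here purely for brevity and makes no claim about how the paper decodes or trains its policy.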
Related papers
- Warmup Generations: A Task-Agnostic Approach for Guiding Sequence-to-Sequence Learning with Unsupervised Initial State Generation [34.55224347308013]
Traditional supervised fine-tuning (SFT) strategies for sequence-to-sequence tasks often train models to directly generate the target output.
We introduce a task-agnostic framework that enables models to generate intermediate "warmup" sequences.
We show that our approach outperforms traditional SFT methods, and offers a scalable and flexible solution for sequence-to-sequence tasks.
arXiv Detail & Related papers (2025-02-17T20:23:42Z) - Optimal Task Order for Continual Learning of Multiple Tasks [3.591122855617648]
Continual learning of multiple tasks remains a major challenge for neural networks.
Here, we investigate how task order influences continual learning and propose a strategy for optimizing it.
Our work thus presents a generalizable framework for task-order optimization in task-incremental continual learning.
arXiv Detail & Related papers (2025-02-05T16:43:58Z) - A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts.
With our synthetic prompts, we use two preference dataset curation methods: rejection sampling (RS) and Monte Carlo Tree Search (MCTS).
Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements.
High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z) - Causality-Aware Transformer Networks for Robotic Navigation [13.719643934968367]
Current research in Visual Navigation reveals opportunities for improvement.
Direct adoption of RNNs and Transformers often overlooks the specific differences between Embodied AI and traditional sequential data modelling.
We propose Causality-Aware Transformer (CAT) Networks for Navigation, featuring a Causal Understanding Module.
arXiv Detail & Related papers (2024-09-04T12:53:26Z) - Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z) - Hierarchical Neural Constructive Solver for Real-world TSP Scenarios [27.986011761759567]
We introduce realistic Traveling Salesman Problem (TSP) scenarios relevant to industrial settings.
Our hierarchical approach yields superior performance compared to both classical and recent transformer models.
arXiv Detail & Related papers (2024-08-07T06:44:47Z) - An Efficient Learning-based Solver Comparable to Metaheuristics for the Capacitated Arc Routing Problem [67.92544792239086]
We introduce an NN-based solver to significantly narrow the gap with advanced metaheuristics.
First, we propose direction-aware facilitating attention model (DaAM) to incorporate directionality into the embedding process.
Second, we design a supervised reinforcement learning scheme that involves supervised pre-training to establish a robust initial policy.
arXiv Detail & Related papers (2024-03-11T02:17:42Z) - Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and an empirical case study of the conditions under which, and the extent to which, these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z) - Improving Online Performance Prediction for Semantic Segmentation [29.726236358091295]
We address the task of observing the performance of a semantic segmentation deep neural network (DNN) during online operation.
Many high-level decisions rely on such DNNs, which are usually evaluated offline, while their performance in online operation remains unknown.
We propose an improved online performance prediction scheme, building on a recently proposed concept of predicting the primary semantic segmentation task's performance.
arXiv Detail & Related papers (2021-04-12T07:44:40Z)