Related papers: An End-to-End Reinforcement Learning Based Approach for Micro-View Order-Dispatching in Ride-Hailing

URL: http://arxiv.org/abs/2408.10479v1
Date: Tue, 20 Aug 2024 01:30:53 GMT
Title: An End-to-End Reinforcement Learning Based Approach for Micro-View Order-Dispatching in Ride-Hailing
Authors: Xinlang Yue, Yiran Liu, Fangzhou Shi, Sihong Luo, Chen Zhong, Min Lu, Zhe Xu
Abstract summary: We propose an end-to-end reinforcement learning based order-dispatching approach in Didi. We employ a two-layer Markov Decision Process framework to model this problem, and present Deep Double Scalable Network (D2SN), an encoder-decoder structure network to generate order-driver assignments. By leveraging contextual dynamics, our approach can adapt to the behavioral patterns for better performance.
Score: 8.892147201091726
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Assigning orders to drivers under localized spatiotemporal context (micro-view order-dispatching) is a major task in Didi, as it influences ride-hailing service experience. Existing industrial solutions mainly follow a two-stage pattern that incorporates heuristic or learning-based algorithms with naive combinatorial methods, tackling the uncertainty of both sides' behaviors, including emerging timings, spatial relationships, and travel duration. In this paper, we propose a one-stage end-to-end reinforcement learning based order-dispatching approach that solves behavior prediction and combinatorial optimization uniformly in a sequential decision-making manner. Specifically, we employ a two-layer Markov Decision Process framework to model this problem, and present Deep Double Scalable Network (D2SN), an encoder-decoder structure network to generate order-driver assignments directly and stop assignments accordingly. Besides, by leveraging contextual dynamics, our approach can adapt to the behavioral patterns for better performance. Extensive experiments on Didi's real-world benchmarks show that the proposed approach significantly outperforms competitive baselines in optimizing matching efficiency and user experience tasks. In addition, we evaluate the deployment outline and discuss the gains and experiences obtained during the deployment tests from the view of large-scale engineering implementation.
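
Illustrative sketch (not the paper's D2SN): the abstract describes generating order-driver assignments one at a time and then stopping. The Python below is a minimal greedy stand-in for that decoding loop, assuming a hand-written score matrix in place of a learned encoder-decoder. The function name `sequential_dispatch`, the `scores` matrix, and the `stop_score` threshold are all hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def sequential_dispatch(
    scores: List[List[float]], stop_score: float
) -> List[Tuple[int, int]]:
    """Pick (order, driver) pairs one at a time until a stop condition.

    scores[o][d] is a hypothetical preference for giving order o to driver d;
    in a learned system a network would produce it. stop_score plays the role
    of a "stop assigning" action: decoding ends once no remaining pair beats it.
    """
    free_orders = set(range(len(scores)))
    free_drivers = set(range(len(scores[0]))) if scores else set()
    assignments: List[Tuple[int, int]] = []

    while free_orders and free_drivers:
        # Choose the best pair among orders and drivers not yet assigned.
        best_score, order, driver = max(
            (scores[o][d], o, d) for o in free_orders for d in free_drivers
        )
        if best_score <= stop_score:
            break  # stop action: leave the remaining orders for a later round
        assignments.append((order, driver))
        # Mask the chosen order and driver so they cannot be picked again.
        free_orders.remove(order)
        free_drivers.remove(driver)

    return assignments


# Three orders, two drivers (made-up numbers).
scores = [
    [0.9, 0.2],
    [0.8, 0.1],
    [0.3, 0.4],
]
result = sequential_dispatch(scores, stop_score=0.35)
print(result)
```

Raising `stop_score` makes the loop stop earlier and defer more orders; a learned policy would instead decide when to stop from context rather than from a fixed threshold.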

Related papers

Warmup Generations: A Task-Agnostic Approach for Guiding Sequence-to-Sequence Learning with Unsupervised Initial State Generation [34.55224347308013]
Traditional supervised fine-tuning (SFT) strategies for sequence-to-sequence tasks often train models to directly generate the target output. We introduce a task-agnostic framework that enables models to generate intermediate "warmup" sequences. We show that our approach outperforms traditional SFT methods, and offers a scalable and flexible solution for sequence-to-sequence tasks.
arXiv Detail & Related papers (2025-02-17T20:23:42Z)
Optimal Task Order for Continual Learning of Multiple Tasks [3.591122855617648]
Continual learning of multiple tasks remains a major challenge for neural networks. Here, we investigate how task order influences continual learning and propose a strategy for optimizing it. Our work thus presents a generalizable framework for task-order optimization in task-incremental continual learning.
arXiv Detail & Related papers (2025-02-05T16:43:58Z)
A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts. With our synthetic prompts, we use two preference dataset curation methods: rejection sampling (RS) and Monte Carlo Tree Search (MCTS). Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements. High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z)
Long-Sequence Recommendation Models Need Decoupled Embeddings [49.410906935283585]
We identify and characterize a neglected deficiency in existing long-sequence recommendation models. A single set of embeddings struggles with learning both attention and representation, leading to interference between these two processes. We propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are learned separately to fully decouple attention and representation.
arXiv Detail & Related papers (2024-10-03T15:45:15Z)
Causality-Aware Transformer Networks for Robotic Navigation [13.719643934968367]
Current research in Visual Navigation reveals opportunities for improvement. Direct adoption of RNNs and Transformers often overlooks the specific differences between Embodied AI and traditional sequential data modelling. We propose Causality-Aware Transformer (CAT) Networks for Navigation, featuring a Causal Understanding Module.
arXiv Detail & Related papers (2024-09-04T12:53:26Z)
Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation. In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales. Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z)
Hierarchical Neural Constructive Solver for Real-world TSP Scenarios [27.986011761759567]
We introduce realistic Traveling Salesman Problem (TSP) scenarios relevant to industrial settings. Our hierarchical approach yields superior performance compared to both classical and recent transformer models.
arXiv Detail & Related papers (2024-08-07T06:44:47Z)
Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation [7.005068872406135]
Mean Teacher- and co-training-based approaches are employed to mitigate confirmation bias and coupling problems. These approaches frequently involve complex training pipelines and a substantial computational burden. We propose a PrevMatch framework that effectively mitigates the limitations by maximizing the utilization of the temporal knowledge obtained during the training process.
arXiv Detail & Related papers (2024-05-31T03:54:59Z)
An Efficient Learning-based Solver Comparable to Metaheuristics for the Capacitated Arc Routing Problem [67.92544792239086]
We introduce an NN-based solver to significantly narrow the gap with advanced metaheuristics. First, we propose a direction-aware facilitating attention model (DaAM) to incorporate directionality into the embedding process. Second, we design a supervised reinforcement learning scheme that involves supervised pre-training to establish a robust initial policy.
arXiv Detail & Related papers (2024-03-11T02:17:42Z)
Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice. HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics. Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z)
End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures. We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks. We present a generalization bound for meta-learning, which was first derived by Rothfuss et al. We provide a theoretical analysis and empirical case study under which conditions and to what extent these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z)
Contrastive Self-supervised Sequential Recommendation with Robust Augmentation [101.25762166231904]
Sequential Recommendation describes a set of techniques to model dynamic user behavior in order to predict future interactions in sequential user data. Old and new issues remain, including data-sparsity and noisy data. We propose Contrastive Self-Supervised Learning for sequential Recommendation (CoSeRec).
arXiv Detail & Related papers (2021-08-14T07:15:25Z)
Improving Online Performance Prediction for Semantic Segmentation [29.726236358091295]
We address the task of observing the performance of a semantic segmentation deep neural network (DNN) during online operation. Many high-level decisions rely on such DNNs, which are usually evaluated offline, while their performance in online operation remains unknown. We propose an improved online performance prediction scheme, building on a recently proposed concept of predicting the primary semantic segmentation task's performance.
arXiv Detail & Related papers (2021-04-12T07:44:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.