A Deep Value-network Based Approach for Multi-Driver Order Dispatching
- URL: http://arxiv.org/abs/2106.04493v1
- Date: Tue, 8 Jun 2021 16:27:04 GMT
- Title: A Deep Value-network Based Approach for Multi-Driver Order Dispatching
- Authors: Xiaocheng Tang, Zhiwei Qin, Fan Zhang, Zhaodong Wang, Zhe Xu, Yintai Ma, Hongtu Zhu, Jieping Ye
- Abstract summary: We propose a deep reinforcement learning based solution for order dispatching.
We conduct large scale online A/B tests on DiDi's ride-dispatching platform.
Results show that CVNet consistently outperforms other recently proposed dispatching methods.
- Score: 55.36656442934531
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent works on ride-sharing order dispatching have highlighted the
importance of taking into account both the spatial and temporal dynamics in the
dispatching process for improving the transportation system efficiency. At the
same time, deep reinforcement learning has advanced to the point where it
achieves superhuman performance in a number of fields. In this work, we propose
a deep reinforcement learning based solution for order dispatching and we
conduct large scale online A/B tests on DiDi's ride-dispatching platform to
show that the proposed method achieves significant improvement on both total
driver income and user experience related metrics. In particular, we model the
ride dispatching problem as a Semi Markov Decision Process to account for the
temporal aspect of the dispatching actions. To improve the stability of the
value iteration with nonlinear function approximators like neural networks, we
propose Cerebellar Value Networks (CVNet) with a novel distributed state
representation layer. We further derive a regularized policy evaluation scheme
for CVNet that penalizes a large Lipschitz constant of the value network for
additional robustness against adversarial perturbations and noise. Finally, we
adapt various transfer learning methods to CVNet for increased learning
adaptability and efficiency across multiple cities. We conduct extensive
offline simulations based on real dispatching data as well as online A/B tests
on DiDi's platform. Results show that CVNet consistently outperforms
other recently proposed dispatching methods. We finally show that the
performance can be further improved through the efficient use of transfer
learning.
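As a rough illustration of the Semi Markov Decision Process view described in the abstract, the sketch below computes a temporal-difference target for a dispatching action that lasts several time steps. It assumes, purely for illustration, that the trip reward is spread evenly over the trip duration and that the value of the destination state is discounted by the full duration. The function names, arguments, and the uniform-spread assumption are hypothetical and are not taken from this page.

```python
def discounted_trip_reward(total_reward: float, duration: int, gamma: float) -> float:
    """Discounted sum of a reward spread evenly over `duration` time steps.

    Assumption for this sketch: each step earns total_reward / duration,
    so the sum is a finite geometric series.
    """
    if duration <= 0:
        raise ValueError("duration must be a positive number of time steps")
    if gamma == 1.0:
        # No discounting: the geometric series collapses to the raw reward.
        return total_reward
    per_step = total_reward / duration
    return per_step * (1.0 - gamma ** duration) / (1.0 - gamma)


def smdp_td_target(total_reward: float, duration: int,
                   next_value: float, gamma: float) -> float:
    """One-transition TD target for a temporally extended action.

    The bootstrap term is discounted by gamma ** duration because the
    driver only reaches the next state after `duration` steps.
    """
    reward = discounted_trip_reward(total_reward, duration, gamma)
    return reward + (gamma ** duration) * next_value


if __name__ == "__main__":
    # A hypothetical trip: fare 10, lasting 2 steps, destination value 4.
    print(smdp_td_target(10.0, 2, 4.0, 0.5))
```

Under this reading, a longer trip with the same fare yields a smaller target than a shorter one, which is how the temporal aspect of an action enters the value estimate.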
Related papers
- Adaptive Anomaly Detection in Network Flows with Low-Rank Tensor Decompositions and Deep Unrolling [9.20186865054847]
Anomaly detection (AD) is increasingly recognized as a key component for ensuring the resilience of future communication systems.
This work considers AD in network flows using incomplete measurements.
We propose a novel block-successive convex approximation algorithm based on a regularized model-fitting objective.
Inspired by Bayesian approaches, we extend the model architecture to perform online adaptation to per-flow and per-time-step statistics.
arXiv Detail & Related papers (2024-09-17T19:59:57Z)
- Rapid Network Adaptation: Learning to Adapt Neural Networks Using Test-Time Feedback [12.946419909506883]
We create a closed-loop system that makes use of a test-time feedback signal to adapt a network on the fly.
We show that this loop can be effectively implemented using a learning-based function, which realizes an amortized optimizer for the network.
This leads to an adaptation method, named Rapid Network Adaptation (RNA), that is notably more flexible and orders of magnitude faster than the baselines.
arXiv Detail & Related papers (2023-09-27T16:20:39Z)
- Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs.
We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
- Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm to learn online the optimal source placement in large-scale networks.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z)
- MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z)
- Rewarded meta-pruning: Meta Learning with Rewards for Channel Pruning [19.978542231976636]
This paper proposes a novel method to reduce the parameters and FLOPs for computational efficiency in deep learning models.
We introduce accuracy and efficiency coefficients to control the trade-off between the accuracy of the network and its computing efficiency.
arXiv Detail & Related papers (2023-01-26T12:32:01Z)
- Reinforcement Learning in the Wild: Scalable RL Dispatching Algorithm Deployed in Ridehailing Marketplace [12.298997392937876]
This study proposes a real-time dispatching algorithm based on reinforcement learning.
It is deployed online in multiple cities under DiDi's operation for A/B testing and is launched in one of the major international markets.
The deployed algorithm shows over 1.3% improvement in total driver income from A/B testing.
arXiv Detail & Related papers (2022-02-10T16:07:17Z)
- CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization [61.71504948770445]
We propose a novel channel pruning method via Class-Aware Trace Ratio Optimization (CATRO) to reduce the computational burden and accelerate the model inference.
We show that CATRO achieves higher accuracy with similar cost or lower cost with similar accuracy than other state-of-the-art channel pruning algorithms.
Because of its class-aware property, CATRO is suitable to prune efficient networks adaptively for various classification subtasks, enhancing handy deployment and usage of deep networks in real-world applications.
arXiv Detail & Related papers (2021-10-21T06:26:31Z)
- Real-world Ride-hailing Vehicle Repositioning using Deep Reinforcement Learning [52.2663102239029]
We present a new practical framework based on deep reinforcement learning and decision-time planning for real-world vehicle repositioning on ride-hailing platforms.
Our approach learns a ride-based state-value function using a batch training algorithm with deep value networks.
We benchmark our algorithm with baselines in a ride-hailing simulation environment to demonstrate its superiority in improving income efficiency.
arXiv Detail & Related papers (2021-03-08T05:34:05Z)
- ES-Net: An Efficient Stereo Matching Network [4.8986598953553555]
Existing stereo matching networks typically use slow and computationally expensive 3D convolutions to improve the performance.
We propose the Efficient Stereo Network (ESNet), which achieves high performance and efficient inference at the same time.
arXiv Detail & Related papers (2021-03-05T20:11:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.