Scalable Deep Reinforcement Learning for Ride-Hailing
- URL: http://arxiv.org/abs/2009.14679v1
- Date: Sun, 27 Sep 2020 20:07:12 GMT
- Title: Scalable Deep Reinforcement Learning for Ride-Hailing
- Authors: Jiekun Feng, Mark Gluzman, J. G. Dai
- Abstract summary: Ride-hailing services such as Didi Chuxing, Lyft, and Uber arrange thousands of cars to meet ride requests throughout the day.
We consider a Markov decision process (MDP) model of a ride-hailing service system, framing it as a reinforcement learning (RL) problem.
We propose a special decomposition for the MDP actions by sequentially assigning tasks to the drivers.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ride-hailing services, such as Didi Chuxing, Lyft, and Uber, arrange
thousands of cars to meet ride requests throughout the day. We consider a
Markov decision process (MDP) model of a ride-hailing service system, framing
it as a reinforcement learning (RL) problem. The simultaneous control of many
agents (cars) presents a challenge for the MDP optimization because the action
space grows exponentially with the number of cars. We propose a special
decomposition for the MDP actions by sequentially assigning tasks to the
drivers. The new action structure resolves the scalability problem and enables
the use of deep RL algorithms for control policy optimization. We demonstrate
the benefit of our proposed decomposition with a numerical experiment based on
real data from Didi Chuxing.
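The core idea in the abstract, assigning tasks to the drivers one at a time so that each sub-decision chooses from a small, fixed task set rather than a joint action over the whole fleet, can be sketched in a few lines of Python. This is a minimal toy illustration under assumed names (TASKS, toy_policy, sequential_dispatch, joint_action_space_size), not the authors' implementation.

```python
import random

# Toy illustration of the joint action-space blow-up and of a sequential
# task-assignment decomposition. All names here are illustrative; they do
# not come from the paper.

TASKS = ["serve_request_A", "serve_request_B", "reposition", "stay_idle"]

def joint_action_space_size(num_drivers, num_tasks=len(TASKS)):
    # Controlling all drivers at once gives num_tasks ** num_drivers joint
    # actions, which grows exponentially with the fleet size.
    return num_tasks ** num_drivers

def toy_policy(state, driver_id):
    # Stand-in for a learned policy network: given the (partially updated)
    # system state, score the tasks for one driver and pick one.
    # Here: uniform random choice.
    return random.choice(TASKS)

def sequential_dispatch(state, num_drivers):
    # Sequential decomposition: assign a task to one driver at a time. Each
    # sub-decision picks from only len(TASKS) options, so the per-step action
    # space stays constant regardless of how many cars are controlled.
    assignments = {}
    for driver_id in range(num_drivers):
        task = toy_policy(state, driver_id)
        assignments[driver_id] = task
        # The state is updated between sub-decisions so later assignments can
        # react to earlier ones.
        state = dict(state, last_assigned=(driver_id, task))
    return assignments

if __name__ == "__main__":
    print(joint_action_space_size(num_drivers=1000))       # astronomically large joint space
    print(sequential_dispatch({"epoch": 0}, num_drivers=5))
```

Because each sub-decision has a small, fixed output space, a standard deep RL policy (for example, a softmax over the task set) can be trained with off-the-shelf algorithms, which is what makes the decomposition scale with fleet size.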
Related papers
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Implicit Sensing in Traffic Optimization: Advanced Deep Reinforcement Learning Techniques [4.042717292629285]
We present an integrated car-following and lane-changing decision-control system based on Deep Reinforcement Learning (DRL).
We employ the well-known DQN algorithm to train the RL agent to make appropriate decisions.
We evaluate the performance of the proposed model under two policies: the epsilon-greedy policy and the Boltzmann policy.
arXiv Detail & Related papers (2023-09-25T15:33:08Z)
- Action and Trajectory Planning for Urban Autonomous Driving with Hierarchical Reinforcement Learning [1.3397650653650457]
We propose an action and trajectory planner using a Hierarchical Reinforcement Learning (atHRL) method.
We empirically verify the efficacy of atHRL through extensive experiments in complex urban driving scenarios.
arXiv Detail & Related papers (2023-06-28T07:11:02Z)
- Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
- Markov Decision Process modeled with Bandits for Sequential Decision Making in Linear-flow [73.1896399783641]
In membership/subscriber acquisition and retention, we sometimes need to recommend marketing content for multiple pages in sequence.
We propose to formulate the problem as an MDP with Bandits where Bandits are employed to model the transition probability matrix.
We observe that the proposed MDP with Bandits algorithm outperforms Q-learning with $\epsilon$-greedy and decreasing $\epsilon$, independent Bandits, and interaction Bandits.
arXiv Detail & Related papers (2021-07-01T03:54:36Z)
- A Deep Value-network Based Approach for Multi-Driver Order Dispatching [55.36656442934531]
We propose a deep reinforcement learning based solution for order dispatching.
We conduct large-scale online A/B tests on DiDi's ride-dispatching platform.
Results show that CVNet consistently outperforms other recently proposed dispatching methods.
arXiv Detail & Related papers (2021-06-08T16:27:04Z)
- Real-world Ride-hailing Vehicle Repositioning using Deep Reinforcement Learning [52.2663102239029]
We present a new practical framework based on deep reinforcement learning and decision-time planning for real-world vehicle repositioning on ride-hailing platforms.
Our approach learns a ride-based state-value function using a batch training algorithm with deep value networks.
We benchmark our algorithm with baselines in a ride-hailing simulation environment to demonstrate its superiority in improving income efficiency.
arXiv Detail & Related papers (2021-03-08T05:34:05Z)
- Combining Reinforcement Learning with Model Predictive Control for On-Ramp Merging [10.480121529429631]
Two broad classes of techniques have been proposed to solve motion planning problems in autonomous driving: Model Predictive Control (MPC) and Reinforcement Learning (RL).
We first establish the strengths and weaknesses of state-of-the-art MPC and RL-based techniques through simulations.
We subsequently present an algorithm that blends the model-free RL agent with the MPC solution and show that it provides better trade-offs across all metrics: passenger comfort, efficiency, crash rate, and robustness.
arXiv Detail & Related papers (2020-11-17T07:42:11Z)
- Deep Surrogate Q-Learning for Autonomous Driving [17.30342128504405]
We propose Surrogate Q-learning for learning lane-change behavior for autonomous driving.
We show that the architecture leads to a novel replay sampling technique we call Scene-centric Experience Replay.
We also show that our methods enhance real-world applicability of RL systems by learning policies on the real highD dataset.
arXiv Detail & Related papers (2020-10-21T19:49:06Z)
- Efficient Ridesharing Dispatch Using Multi-Agent Reinforcement Learning [0.0]
Ride-sharing services such as Uber and Lyft let passengers order a car to pick them up.
Traditional Reinforcement Learning (RL) based methods attempting to solve the ridesharing problem are unable to accurately model the complex environment in which taxis operate.
We show that our model performs better than the IDQN baseline on a fixed grid size and is able to generalize well to smaller or larger grid sizes.
Our algorithm also outperforms the IDQN baseline in the scenario with a variable number of passengers and cars in each episode.
arXiv Detail & Related papers (2020-06-18T23:37:53Z)
- Reinforcement Learning Based Vehicle-cell Association Algorithm for Highly Mobile Millimeter Wave Communication [53.47785498477648]
This paper investigates the problem of vehicle-cell association in millimeter wave (mmWave) communication networks.
We first formulate the association problem for vehicular users (VUs) as a discrete non-convex optimization problem.
The proposed solution achieves up to 15% gains in terms of sum rate and a 20% reduction in VUE outages compared to several baseline designs.
arXiv Detail & Related papers (2020-01-22T08:51:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.