H-TD2: Hybrid Temporal Difference Learning for Adaptive Urban Taxi
Dispatch
- URL: http://arxiv.org/abs/2105.02138v1
- Date: Wed, 5 May 2021 15:42:31 GMT
- Title: H-TD2: Hybrid Temporal Difference Learning for Adaptive Urban Taxi
Dispatch
- Authors: Benjamin Rivière and Soon-Jo Chung
- Abstract summary: H-TD2 is a model-free, adaptive decision-making algorithm to coordinate a large fleet of automated taxis in a dynamic urban environment.
We derive a regret bound and design the trigger condition between the two behaviors to explicitly control the trade-off between computational complexity and the individual taxi policy's bounded sub-optimality.
Unlike recent reinforcement learning dispatch methods, this policy estimation is adaptive and robust to out-of-training domain events.
- Score: 9.35511513240868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present H-TD2: Hybrid Temporal Difference Learning for Taxi Dispatch, a
model-free, adaptive decision-making algorithm to coordinate a large fleet of
automated taxis in a dynamic urban environment to minimize expected customer
waiting times. Our scalable algorithm exploits the natural transportation
network company topology by switching between two behaviors: distributed
temporal-difference learning computed locally at each taxi and infrequent
centralized Bellman updates computed at the dispatch center. We derive a regret
bound and design the trigger condition between the two behaviors to explicitly
control the trade-off between computational complexity and the individual taxi
policy's bounded sub-optimality; this advances the state of the art by enabling
distributed operation with bounded sub-optimality. Additionally, unlike recent
reinforcement learning dispatch methods, this policy estimation is adaptive and
robust to out-of-training domain events. This result is enabled by a two-step
modelling approach: the policy is learned on an agent-agnostic, cell-based
Markov Decision Process and individual taxis are coordinated using the learned
policy in a distributed game-theoretic task assignment. We validate our
algorithm against a receding horizon control baseline in a Gridworld
environment with a simulated customer dataset, where the proposed solution
decreases average customer waiting time by 50% over a wide range of parameters.
We also validate in a Chicago city environment with real customer requests from
the Chicago taxi public dataset where the proposed solution decreases average
customer waiting time by 26% over irregular customer distributions during a
2016 Major League Baseball World Series game.
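The hybrid behavior described in the abstract can be summarized in a short sketch. This is a minimal illustration under assumptions made here for exposition, not the paper's implementation: an infinity-norm drift test stands in for the regret-based trigger, a greedy cell choice stands in for the game-theoretic task assignment, and the names (`V_local`, `V_central`, `trigger_threshold`) and the 5x5 gridworld model are hypothetical.
```python
import numpy as np

rng = np.random.default_rng(0)

n_cells = 25                    # 5x5 gridworld of dispatch cells (assumed)
gamma, alpha = 0.95, 0.1
trigger_threshold = 1.0         # assumed stand-in for the paper's regret-based trigger

# Dispatch-center model of the cell-based MDP: transition matrix P and
# expected reward R per cell (synthetic placeholders).
P = rng.dirichlet(np.ones(n_cells), size=n_cells)
R = rng.uniform(0.0, 1.0, size=n_cells)

V_central = np.zeros(n_cells)   # last value estimate broadcast by the dispatch center
V_local = V_central.copy()      # value estimate maintained on board each taxi

def local_td_step(cell, reward, next_cell):
    """Distributed TD(0) update computed locally from one observed transition."""
    V_local[cell] += alpha * (reward + gamma * V_local[next_cell] - V_local[cell])

def centralized_bellman_update(n_sweeps=50):
    """Infrequent synchronous Bellman backups computed at the dispatch center."""
    V = V_central.copy()
    for _ in range(n_sweeps):
        V = R + gamma * P @ V
    return V

for t in range(500):
    # Each taxi observes one transition in the cell-based MDP and updates locally.
    cell = rng.integers(n_cells)
    next_cell = rng.choice(n_cells, p=P[cell])
    reward = R[cell] + rng.normal(scale=0.1)
    local_td_step(cell, reward, next_cell)

    # Trigger condition (illustrative): fall back to a centralized update when the
    # local estimate has drifted too far from the last broadcast values.
    if np.linalg.norm(V_local - V_central, ord=np.inf) > trigger_threshold:
        V_central = centralized_bellman_update()
        V_local = V_central.copy()

# Coordination stand-in: send a free taxi toward the cell with the highest
# estimated value (the paper instead uses a game-theoretic task assignment).
best_cell = int(np.argmax(V_local))
```
The point of the switch is that cheap local TD updates run continuously on each taxi, while the expensive centralized backup is invoked only when the trigger indicates the local estimates can no longer guarantee the desired sub-optimality bound.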
Related papers
- Approximate Multiagent Reinforcement Learning for On-Demand Urban
Mobility Problem on a Large Map (extended version) [9.32626005183482]
We study the autonomous multiagent taxi routing problem for a large urban environment.
Recent theory has shown that a rollout algorithm with a stable base policy produces a near-optimal stable policy.
We propose an approximate multiagent rollout-based two-phase algorithm that reduces computational costs, while still achieving a stable near-optimal policy.
arXiv Detail & Related papers (2023-11-02T18:33:32Z) - Deep reinforcement learning for the dynamic vehicle dispatching problem:
An event-based approach [0.0]
We model the problem as a semi-Markov decision process, which allows us to treat time as continuous.
We argue that an event-based approach substantially reduces the complexity of the decision space and overcomes other limitations of discrete-time models.
Results show that our policies exhibit better average waiting times, cancellation rates and total service times, with reductions of up to 50% relative to the other tested policies.
arXiv Detail & Related papers (2023-07-13T16:29:25Z) - Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z) - Off-line approximate dynamic programming for the vehicle routing problem
with stochastic customers and demands via decentralized decision-making [0.0]
This paper studies a variant of the vehicle routing problem (VRP) where both customer locations and demands are uncertain.
The objective is to maximize the served demands while fulfilling vehicle capacities and time restrictions.
We develop a Q-learning algorithm featuring state-of-the-art acceleration techniques such as Replay Memory and Double Q Network (a minimal sketch of these two components appears after this list).
arXiv Detail & Related papers (2021-09-21T14:28:09Z) - A Deep Value-network Based Approach for Multi-Driver Order Dispatching [55.36656442934531]
We propose a deep reinforcement learning based solution for order dispatching.
We conduct large scale online A/B tests on DiDi's ride-dispatching platform.
Results show that CVNet consistently outperforms other recently proposed dispatching methods.
arXiv Detail & Related papers (2021-06-08T16:27:04Z) - AdaPool: A Diurnal-Adaptive Fleet Management Framework using Model-Free
Deep Reinforcement Learning and Change Point Detection [34.77250498401055]
This paper introduces an adaptive model-free deep reinforcement learning approach that can recognize and adapt to the diurnal patterns in the ride-sharing environment with car-pooling.
In addition to the adaptation logic in dispatching, this paper also proposes a dynamic, demand-aware vehicle-passenger matching and route planning framework.
arXiv Detail & Related papers (2021-04-01T02:14:01Z) - Equilibrium Inverse Reinforcement Learning for Ride-hailing Vehicle
Network [1.599072005190786]
We formulate the problem of passenger-vehicle matching in a sparsely connected graph.
We propose an algorithm to derive an equilibrium policy in a multi-agent environment.
arXiv Detail & Related papers (2021-02-13T03:18:44Z) - Combining Deep Learning and Optimization for Security-Constrained
Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z) - Multi-Armed Bandit Based Client Scheduling for Federated Learning [91.91224642616882]
Federated learning (FL) offers desirable properties such as reduced communication overhead and preservation of data privacy.
In each communication round of FL, the clients update local models based on their own data and upload their local updates via wireless channels.
This work provides a multi-armed bandit-based framework for online client scheduling (CS) in FL without knowing wireless channel state information and statistical characteristics of clients.
arXiv Detail & Related papers (2020-07-05T12:32:32Z) - Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z) - Reinforcement Learning Based Vehicle-cell Association Algorithm for
Highly Mobile Millimeter Wave Communication [53.47785498477648]
This paper investigates the problem of vehicle-cell association in millimeter wave (mmWave) communication networks.
We first formulate the vehicular user (VU) association problem as a discrete non-convex optimization problem.
The proposed solution achieves up to 15% gains in sum rate and a 20% reduction in VUE handovers compared to several baseline designs.
arXiv Detail & Related papers (2020-01-22T08:51:05Z)
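One entry above, the off-line approximate dynamic programming paper for vehicle routing, builds on Q-learning accelerated with a replay memory and a double Q-network. The sketch below shows those two components in a minimal tabular form, assuming made-up names (`q_a`, `q_b`, `store`, `train_step`) and hyperparameters; the paper itself uses neural-network function approximation rather than tables.
```python
import random
from collections import deque, defaultdict

gamma, alpha, buffer_size, batch_size = 0.95, 0.1, 10_000, 32
replay = deque(maxlen=buffer_size)              # replay memory of past transitions
q_a = defaultdict(lambda: defaultdict(float))   # two value tables standing in for two networks
q_b = defaultdict(lambda: defaultdict(float))

def store(state, action, reward, next_state, next_actions):
    """Append an observed transition (plus the actions available next) to the replay memory."""
    replay.append((state, action, reward, next_state, tuple(next_actions)))

def train_step():
    """Sample a minibatch from the replay memory and apply double Q-learning updates."""
    if len(replay) < batch_size:
        return
    for s, a, r, s2, acts in random.sample(list(replay), batch_size):
        # One table selects the greedy next action, the other evaluates it; this
        # decoupling reduces the overestimation bias of plain Q-learning.
        select, evaluate = (q_a, q_b) if random.random() < 0.5 else (q_b, q_a)
        best_next = max(acts, key=lambda a2: select[s2][a2])
        target = r + gamma * evaluate[s2][best_next]
        select[s][a] += alpha * (target - select[s][a])
```
The design point illustrated here is that the estimator used to pick the greedy next action is separate from the one used to evaluate it, and that updates are drawn from a buffer of past transitions rather than only the most recent one.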