Equilibrium Inverse Reinforcement Learning for Ride-hailing Vehicle
Network
- URL: http://arxiv.org/abs/2102.06854v1
- Date: Sat, 13 Feb 2021 03:18:44 GMT
- Title: Equilibrium Inverse Reinforcement Learning for Ride-hailing Vehicle
Network
- Authors: Takuma Oda
- Abstract summary: We formulate the problem of passenger-vehicle matching in a sparsely connected graph.
We propose an algorithm to derive an equilibrium policy in a multi-agent environment.
- Score: 1.599072005190786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ubiquitous mobile computing has enabled ride-hailing services to collect
vast amounts of behavioral data of riders and drivers and optimize supply and
demand matching in real time. While these mobility service providers have some
degree of control over the market by assigning vehicles to requests, they need
to deal with the uncertainty arising from self-interested driver behavior since
workers are usually free to drive when they are not assigned tasks. In this
work, we formulate the problem of passenger-vehicle matching in a sparsely
connected graph and propose an algorithm to derive an equilibrium policy in a
multi-agent environment. Our framework combines value iteration methods to
estimate the optimal policy given expected state visitation and policy
propagation to compute multi-agent state visitation frequencies. Furthermore,
we develop a method to learn a driver reward function that transfers to an
environment with significantly different dynamics from the training data. We
evaluated the robustness to changes in spatio-temporal supply-demand
distributions and deterioration in data quality using a real-world taxi
trajectory dataset; our approach significantly outperforms several baselines in
terms of imitation accuracy. The computational time required to obtain an
equilibrium policy shared by all vehicles does not depend on the number of
agents, and even at the scale of real-world services, it takes only a few
seconds on a single CPU.
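The alternation described in the abstract (value iteration against an expected state-visitation distribution, then policy propagation to recompute the multi-agent visitation frequencies) can be sketched on a toy model. Everything below is an illustrative assumption rather than the paper's exact formulation: the congestion-shaped reward, the damping scheme, and all function names are made up for the sketch.

```python
import numpy as np

def equilibrium_policy(transitions, base_reward, horizon=20, n_iters=50, damping=0.5):
    """Toy equilibrium sketch (assumed form, not the paper's algorithm).

    transitions: (S, A, S) array of P(s' | s, a); base_reward: (S,) array.
    Alternates (1) finite-horizon value iteration against the current
    expected visitation and (2) forward propagation of visitation
    frequencies under the resulting (damped) policy.
    """
    n_states, n_actions, _ = transitions.shape
    visitation = np.full(n_states, 1.0 / n_states)          # expected agent density
    policy = np.full((n_states, n_actions), 1.0 / n_actions)
    for _ in range(n_iters):
        # Congestion-shaped reward (assumption): crowded states pay less.
        reward = base_reward - visitation
        # (1) Best response via finite-horizon value iteration.
        V = np.zeros(n_states)
        for _ in range(horizon):
            Q = reward[:, None] + transitions @ V           # (S, A)
            V = Q.max(axis=1)
        greedy = np.eye(n_actions)[Q.argmax(axis=1)]        # one-hot greedy policy
        policy = (1.0 - damping) * policy + damping * greedy
        # (2) Propagate state-visitation frequencies under the mixed policy.
        d = np.full(n_states, 1.0 / n_states)
        acc = np.zeros(n_states)
        for _ in range(horizon):
            acc += d
            # next density: sum_{s,a} d(s) * pi(a|s) * P(s'|s,a)
            d = np.einsum('s,sa,sat->t', d, policy, transitions)
        visitation = acc / horizon
    return policy, visitation
```

Note that, as the abstract points out, the per-iteration cost here depends only on the number of states and actions, not on the number of agents: all agents share one policy and are summarized by the density vector.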
Related papers
- A Reinforcement Learning Approach for Dynamic Rebalancing in
Bike-Sharing System [11.237099288412558]
Bike-Sharing Systems provide eco-friendly urban mobility, contributing to the alleviation of traffic congestion and healthier lifestyles.
Devising effective rebalancing strategies using vehicles to redistribute bikes among stations is therefore of utmost importance for operators.
This paper introduces a temporal reinforcement learning algorithm for the dynamic rebalancing problem with multiple vehicles.
arXiv Detail & Related papers (2024-02-05T23:46:42Z) - Coalitional Bargaining via Reinforcement Learning: An Application to
Collaborative Vehicle Routing [49.00137468773683]
Collaborative Vehicle Routing is where delivery companies cooperate by sharing their delivery information and performing delivery requests on behalf of each other.
This achieves economies of scale and thus reduces cost, greenhouse gas emissions, and road congestion.
But which company should partner with whom, and how much should each company be compensated?
Traditional game theoretic solution concepts, such as the Shapley value or nucleolus, are difficult to calculate for the real-world problem of Collaborative Vehicle Routing.
arXiv Detail & Related papers (2023-10-26T15:04:23Z) - Deep reinforcement learning for the dynamic vehicle dispatching problem:
An event-based approach [0.0]
We model the problem as a semi-Markov decision process, which allows us to treat time as continuous.
We argue that an event-based approach substantially reduces the complexity of the decision space and overcomes other limitations of discrete-time models.
Results show that our policies exhibit better average waiting times, cancellation rates and total service times, with reduction of up to 50% relative to the other tested policies.
arXiv Detail & Related papers (2023-07-13T16:29:25Z) - Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning [48.667697255912614]
Mean-field reinforcement learning addresses the policy of a representative agent interacting with an infinite population of identical agents.
We propose Safe-M$3$-UCRL, the first model-based mean-field reinforcement learning algorithm that attains safe policies even in the case of unknown transitions.
Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.
arXiv Detail & Related papers (2023-06-29T15:57:07Z) - Embedding Synthetic Off-Policy Experience for Autonomous Driving via
Zero-Shot Curricula [48.58973705935691]
We show that an agent trained using only a 10% subset of the data performs just as well as an agent trained on the entire dataset.
We then demonstrate that this difficulty score can be used in a zero-shot transfer to generate curricula for an imitation-learning based planning agent.
arXiv Detail & Related papers (2022-12-02T18:57:21Z) - Scalable Vehicle Re-Identification via Self-Supervision [66.2562538902156]
Vehicle Re-Identification is one of the key elements in city-scale vehicle analytics systems.
Many state-of-the-art vehicle re-id solutions focus on improving accuracy on existing re-id benchmarks and often ignore computational complexity.
We propose a simple yet effective hybrid solution empowered by self-supervised training which only uses a single network during inference time.
arXiv Detail & Related papers (2022-05-16T12:14:42Z) - A Queueing-Theoretic Framework for Vehicle Dispatching in Dynamic
Car-Hailing [technical report] [36.31694973019143]
We consider an important dynamic car-hailing problem, namely maximum revenue vehicle dispatching (MRVD).
We use existing machine learning algorithms to predict the future vehicle demand of each region, then estimate the idle time periods of drivers through a queueing model for each region.
With the information of the predicted vehicle demands and estimated idle time periods of drivers, we propose two batch-based vehicle dispatching algorithms to efficiently assign suitable drivers to riders.
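One way to read the idle-time step above is to treat each region as an M/M/c queue whose servers are the drivers. The following is a minimal sketch under that assumption; the function name and the steady-state simplification (idle fraction = 1 - utilization) are illustrative, not the paper's actual queueing model.

```python
def expected_idle_fraction(arrival_rate, service_rate, n_drivers):
    """Idle-time sketch assuming an M/M/c queue per region (assumption):
    the region's drivers are the c servers, ride requests arrive at rate
    `arrival_rate` (lambda), and each ride takes 1/`service_rate` (1/mu)
    time on average. Long-run driver utilization is rho = lambda / (c * mu),
    so each driver is idle a 1 - rho fraction of the time."""
    rho = arrival_rate / (n_drivers * service_rate)
    if rho >= 1.0:
        return 0.0  # overloaded region: no idle time in steady state
    return 1.0 - rho
```

A batch dispatcher could then use such per-region estimates, alongside predicted demand, to decide which idle drivers to assign where.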
arXiv Detail & Related papers (2021-07-19T07:51:31Z) - Value Function is All You Need: A Unified Learning Framework for Ride
Hailing Platforms [57.21078336887961]
Large ride-hailing platforms, such as DiDi, Uber and Lyft, connect tens of thousands of vehicles in a city to millions of ride demands throughout the day.
We propose a unified value-based dynamic learning framework (V1D3) for tackling both tasks.
arXiv Detail & Related papers (2021-05-18T19:22:24Z) - H-TD2: Hybrid Temporal Difference Learning for Adaptive Urban Taxi
Dispatch [9.35511513240868]
H-TD2 is a model-free, adaptive decision-making algorithm to coordinate a large fleet of automated taxis in a dynamic urban environment.
We derive a regret bound and design the trigger condition between the two behaviors to explicitly control the trade-off between computational complexity and the individual taxi policy's bounded sub-optimality.
Unlike recent reinforcement learning dispatch methods, this policy estimation is adaptive and robust to out-of-training domain events.
arXiv Detail & Related papers (2021-05-05T15:42:31Z) - Calibration of Human Driving Behavior and Preference Using Naturalistic
Traffic Data [5.926030548326619]
We show how the model can be inverted to estimate driver preferences from naturalistic traffic data.
One distinct advantage of our approach is the drastically reduced computational burden.
arXiv Detail & Related papers (2021-05-05T01:20:03Z) - Real-world Ride-hailing Vehicle Repositioning using Deep Reinforcement
Learning [52.2663102239029]
We present a new practical framework based on deep reinforcement learning and decision-time planning for real-world vehicle repositioning on ride-hailing platforms.
Our approach learns the state-value function using a batch training algorithm with deep value networks.
We benchmark our algorithm with baselines in a ride-hailing simulation environment to demonstrate its superiority in improving income efficiency.
arXiv Detail & Related papers (2021-03-08T05:34:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.