Approximate Multiagent Reinforcement Learning for On-Demand Urban
Mobility Problem on a Large Map (extended version)
- URL: http://arxiv.org/abs/2311.01534v3
- Date: Fri, 8 Mar 2024 14:36:50 GMT
- Title: Approximate Multiagent Reinforcement Learning for On-Demand Urban
Mobility Problem on a Large Map (extended version)
- Authors: Daniel Garces, Sushmita Bhattacharya, Dimitri Bertsekas, Stephanie Gil
- Abstract summary: We study the autonomous multiagent taxi routing problem for a large urban environment.
Recent theory has shown that a rollout algorithm with a stable base policy produces a near-optimal stable policy.
We propose an approximate multiagent rollout-based two phase algorithm that reduces computational costs, while still achieving a stable near-optimal policy.
- Score: 9.32626005183482
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we focus on the autonomous multiagent taxi routing problem for
a large urban environment where the location and number of future ride requests
are unknown a priori, but can be estimated by an empirical distribution. Recent
theory has shown that a rollout algorithm with a stable base policy produces a
near-optimal stable policy. In the routing setting, a policy is stable if its
execution keeps the number of outstanding requests uniformly bounded over time.
Although rollout-based approaches are well-suited for learning cooperative
multiagent policies with considerations for future demand, applying such
methods to a large urban environment can be computationally expensive due to
the large number of taxis required for stability. In this paper, we aim to
address the computational bottleneck of multiagent rollout by proposing an
approximate multiagent rollout-based two phase algorithm that reduces
computational costs, while still achieving a stable near-optimal policy. Our
approach partitions the graph into sectors based on the predicted demand and
the maximum number of taxis that can run sequentially given the user's
computational resources. The algorithm then applies instantaneous assignment
(IA) for re-balancing taxis across sectors and a sector-wide multiagent rollout
algorithm that is executed in parallel for each sector. We provide two main
theoretical results: 1) characterize the number of taxis $m$ that is sufficient
for IA to be stable; 2) derive a necessary condition on $m$ to maintain
stability for IA as time goes to infinity. Our numerical results show that our
approach achieves stability for an $m$ that satisfies the theoretical
conditions. We also empirically demonstrate that our proposed two phase
algorithm has equivalent performance to the one-at-a-time rollout over the
entire map, but with significantly lower runtimes.
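To make the two-phase structure concrete, below is a minimal, illustrative Python sketch of one decision step: instantaneous assignment (IA) acts as the base policy, and a one-agent-at-a-time rollout is run independently inside each sector so that sectors can be processed concurrently. The grid abstraction, the pickup-time cost model, and all names (Node, Request, sector_rollout, two_phase_step) are assumptions made for this sketch, not the authors' implementation, which also handles sampled future demand and the cross-sector re-balancing phase.

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # grid intersection, a stand-in for a road-network node


@dataclass
class Request:
    pickup: Node  # outstanding ride request waiting at this node


def travel_time(a: Node, b: Node) -> int:
    """Manhattan distance as a placeholder for shortest-path travel time."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def instantaneous_assignment(taxis: List[Node], requests: List[Request]) -> Dict[int, int]:
    """Phase 1 base policy (IA): greedily match each outstanding request
    to the closest still-unassigned taxi."""
    assignment: Dict[int, int] = {}
    free = set(range(len(taxis)))
    for r_idx, req in enumerate(requests):
        if not free:
            break
        best = min(free, key=lambda t: travel_time(taxis[t], req.pickup))
        assignment[r_idx] = best
        free.remove(best)
    return assignment


def ia_cost(taxis: List[Node], requests: List[Request]) -> int:
    """Cost-to-go approximation: total pickup time under the IA base policy."""
    assign = instantaneous_assignment(taxis, requests)
    return sum(travel_time(taxis[t], requests[r].pickup) for r, t in assign.items())


def sector_rollout(taxis: List[Node], requests: List[Request]) -> List[Node]:
    """Phase 2: one-agent-at-a-time rollout restricted to a single sector.
    Each taxi in turn picks the move that minimizes the IA cost-to-go,
    with earlier taxis frozen at their already-chosen moves."""
    current = list(taxis)
    steps = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # staying put is allowed
    for t_idx, pos in enumerate(current):
        best_move, best_cost = pos, float("inf")
        for dr, dc in steps:
            cand = (pos[0] + dr, pos[1] + dc)
            trial = current[:t_idx] + [cand] + current[t_idx + 1:]
            cost = ia_cost(trial, requests)
            if cost < best_cost:
                best_move, best_cost = cand, cost
        current[t_idx] = best_move
    return current


def two_phase_step(sectors: List[Tuple[List[Node], List[Request]]]) -> List[List[Node]]:
    """Run the per-sector rollouts concurrently; cross-sector IA
    re-balancing of idle taxis would be applied before this call."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda s: sector_rollout(*s), sectors))


if __name__ == "__main__":
    sectors = [
        ([(0, 0), (2, 3)], [Request((1, 1)), Request((3, 3))]),  # sector 1
        ([(5, 5)], [Request((6, 4))]),                           # sector 2
    ]
    print(two_phase_step(sectors))

The design point this sketch tries to capture is the one made in the abstract: a one-at-a-time rollout over the entire map must cycle through every taxi sequentially, so restricting the rollout to demand-based sectors and running the sectors in parallel is what yields the reported runtime savings while keeping performance comparable.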
Related papers
- Online Fair Allocation of Perishable Resources [1.4952056744888913]
We consider a practically motivated variant of the canonical online fair allocation problem.
A decision-maker has a budget of perishable resources to allocate over a fixed number of rounds.
The goal is to construct a sequence of allocations that is envy-free and efficient.
arXiv Detail & Related papers (2024-06-04T15:14:10Z)
- Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning [48.667697255912614]
Mean-field reinforcement learning addresses the policy of a representative agent interacting with an infinite population of identical agents.
We propose Safe-M$^3$-UCRL, the first model-based mean-field reinforcement learning algorithm that attains safe policies even in the case of unknown transitions.
Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.
arXiv Detail & Related papers (2023-06-29T15:57:07Z)
- Planning and Learning with Adaptive Lookahead [74.39132848733847]
The Policy Iteration (PI) algorithm alternates between greedy one-step policy improvement and policy evaluation.
Recent literature shows that multi-step lookahead policy improvement leads to a better convergence rate at the expense of increased complexity per iteration.
We propose for the first time to dynamically adapt the multi-step lookahead horizon as a function of the state and of the value estimate.
arXiv Detail & Related papers (2022-01-28T20:26:55Z)
- H-TD2: Hybrid Temporal Difference Learning for Adaptive Urban Taxi Dispatch [9.35511513240868]
H-TD2 is a model-free, adaptive decision-making algorithm to coordinate a large fleet of automated taxis in a dynamic urban environment.
We derive a regret bound and design the trigger condition between the two behaviors to explicitly control the trade-off between computational complexity and the individual taxi policy's bounded sub-optimality.
Unlike recent reinforcement learning dispatch methods, this policy estimation is adaptive and robust to out-of-training domain events.
arXiv Detail & Related papers (2021-05-05T15:42:31Z)
- Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality [131.45028999325797]
We develop a doubly robust off-policy AC (DR-Off-PAC) for discounted MDP.
DR-Off-PAC adopts a single timescale structure, in which both actor and critics are updated simultaneously with constant stepsize.
We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $\epsilon$-accurate optimal policy.
arXiv Detail & Related papers (2021-02-23T18:56:13Z)
- Equilibrium Inverse Reinforcement Learning for Ride-hailing Vehicle Network [1.599072005190786]
We formulate the problem of passenger-vehicle matching in a sparsely connected graph.
We propose an algorithm to derive an equilibrium policy in a multi-agent environment.
arXiv Detail & Related papers (2021-02-13T03:18:44Z)
- Deep Reinforcement Learning for Stock Portfolio Optimization [0.0]
We will formulate the problem such that we can apply Reinforcement Learning for the task properly.
To maintain a realistic assumption about the market, we will incorporate transaction cost and risk factor into the state as well.
We will present the end-to-end solution for the task with Minimum Variance Portfolio for stock subset selection, and Wavelet Transform for extracting multi-frequency data pattern.
arXiv Detail & Related papers (2020-12-09T10:19:12Z)
- Multiagent Rollout and Policy Iteration for POMDP with Application to Multi-Robot Repair Problems [1.6939372704265414]
We consider infinite horizon discounted dynamic programming problems with finite state and control spaces, partial state observations, and a multiagent structure.
Our methods specifically address the computational challenges of partially observable multiagent problems.
arXiv Detail & Related papers (2020-11-09T06:51:50Z)
- Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal Constraints [52.58352707495122]
We present a multi-robot allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination.
We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z)
- Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDP with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z)
- Public Bayesian Persuasion: Being Almost Optimal and Almost Persuasive [57.47546090379434]
We study the public persuasion problem in the general setting with: (i) arbitrary state spaces; (ii) arbitrary action spaces; (iii) arbitrary sender's utility functions.
We provide a quasi-polynomial time bi-criteria approximation algorithm for arbitrary public persuasion problems that, in specific settings, yields a QPTAS.
arXiv Detail & Related papers (2020-02-12T18:59:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.