Approximate Multiagent Reinforcement Learning for On-Demand Urban Mobility Problem on a Large Map (extended version)
- URL: http://arxiv.org/abs/2311.01534v4
- Date: Tue, 18 Feb 2025 16:08:53 GMT
- Title: Approximate Multiagent Reinforcement Learning for On-Demand Urban Mobility Problem on a Large Map (extended version)
- Authors: Daniel Garces, Sushmita Bhattacharya, Dimitri Bertsekas, Stephanie Gil,
- Abstract summary: We study the autonomous multiagent taxi routing problem for a large urban environment.
Recent theory has shown that a rollout algorithm with a stable base policy produces a near-optimal stable policy.
We propose an approximate multiagent rollout-based two phase algorithm that reduces computational costs, while still achieving a stable near-optimal policy.
- Score: 8.537183852577686
- License:
- Abstract: In this paper, we focus on the autonomous multiagent taxi routing problem for a large urban environment where the location and number of future ride requests are unknown a-priori, but can be estimated by an empirical distribution. Recent theory has shown that a rollout algorithm with a stable base policy produces a near-optimal stable policy. In the routing setting, a policy is stable if its execution keeps the number of outstanding requests uniformly bounded over time. Although, rollout-based approaches are well-suited for learning cooperative multiagent policies with considerations for future demand, applying such methods to a large urban environment can be computationally expensive due to the large number of taxis required for stability. In this paper, we aim to address the computational bottleneck of multiagent rollout by proposing an approximate multiagent rollout-based two phase algorithm that reduces computational costs, while still achieving a stable near-optimal policy. Our approach partitions the graph into sectors based on the predicted demand and the maximum number of taxis that can run sequentially given the user's computational resources. The algorithm then applies instantaneous assignment (IA) for re-balancing taxis across sectors and a sector-wide multiagent rollout algorithm that is executed in parallel for each sector. We provide two main theoretical results: 1) characterize the number of taxis $m$ that is sufficient for IA to be stable; 2) derive a necessary condition on $m$ to maintain stability for IA as time goes to infinity. Our numerical results show that our approach achieves stability for an $m$ that satisfies the theoretical conditions. We also empirically demonstrate that our proposed two phase algorithm has equivalent performance to the one-at-a-time rollout over the entire map, but with significantly lower runtimes.
Related papers
- A Multiagent Path Search Algorithm for Large-Scale Coalition Structure Generation [61.08720171136229]
Coalition structure generation is a fundamental computational problem in multiagent systems.
We develop SALDAE, a multiagent path finding algorithm for CSG that operates on a graph of coalition structures.
arXiv Detail & Related papers (2025-02-14T15:21:27Z) - Online Fair Allocation of Perishable Resources [1.4952056744888913]
We consider a practically motivated variant of the canonical online fair allocation problem.
A decision-maker has a budget of perishable resources to allocate over a fixed number of rounds.
The goal is to construct a sequence of allocations that is envy-free and efficient.
arXiv Detail & Related papers (2024-06-04T15:14:10Z) - Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning [48.667697255912614]
Mean-field reinforcement learning addresses the policy of a representative agent interacting with the infinite population of identical agents.
We propose Safe-M$3$-UCRL, the first model-based mean-field reinforcement learning algorithm that attains safe policies even in the case of unknown transitions.
Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.
arXiv Detail & Related papers (2023-06-29T15:57:07Z) - H-TD2: Hybrid Temporal Difference Learning for Adaptive Urban Taxi
Dispatch [9.35511513240868]
H-TD2 is a model-free, adaptive decision-making algorithm to coordinate a large fleet of automated taxis in a dynamic urban environment.
We derive a regret bound and design the trigger condition between the two behaviors to explicitly control the trade-off between computational complexity and the individual taxi policy's bounded sub-optimality.
Unlike recent reinforcement learning dispatch methods, this policy estimation is adaptive and robust to out-of-training domain events.
arXiv Detail & Related papers (2021-05-05T15:42:31Z) - Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality [131.45028999325797]
We develop a doubly robust off-policy AC (DR-Off-PAC) for discounted MDP.
DR-Off-PAC adopts a single timescale structure, in which both actor and critics are updated simultaneously with constant stepsize.
We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $epsilon$-accurate optimal policy.
arXiv Detail & Related papers (2021-02-23T18:56:13Z) - Equilibrium Inverse Reinforcement Learning for Ride-hailing Vehicle
Network [1.599072005190786]
We formulate the problem of passenger-vehicle matching in a sparsely connected graph.
We propose an algorithm to derive an equilibrium policy in a multi-agent environment.
arXiv Detail & Related papers (2021-02-13T03:18:44Z) - Deep Reinforcement Learning for Stock Portfolio Optimization [0.0]
We will formulate the problem such that we can apply Reinforcement Learning for the task properly.
To maintain a realistic assumption about the market, we will incorporate transaction cost and risk factor into the state as well.
We will present the end-to-end solution for the task with Minimum Variance Portfolio for stock subset selection, and Wavelet Transform for extracting multi-frequency data pattern.
arXiv Detail & Related papers (2020-12-09T10:19:12Z) - Multiagent Rollout and Policy Iteration for POMDP with Application to
Multi-Robot Repair Problems [1.6939372704265414]
We consider infinite horizon discounted dynamic programming problems with finite state and control spaces, partial state observations, and a multiagent structure.
Our methods specifically address the computational challenges of partially observable multiagent problems.
arXiv Detail & Related papers (2020-11-09T06:51:50Z) - Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal
Constraints [52.58352707495122]
We present a multi-robot allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination.
We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z) - Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present an provably efficient online policy optimization algorithm for CMDP with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z) - Public Bayesian Persuasion: Being Almost Optimal and Almost Persuasive [57.47546090379434]
We study the public persuasion problem in the general setting with: (i) arbitrary state spaces; (ii) arbitrary action spaces; (iii) arbitrary sender's utility functions.
We provide a quasi-polynomial time bi-criteria approximation algorithm for arbitrary public persuasion problems that, in specific settings, yields a QPTAS.
arXiv Detail & Related papers (2020-02-12T18:59:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.