Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration
for Mean-Field Reinforcement Learning
- URL: http://arxiv.org/abs/2006.11917v1
- Date: Sun, 21 Jun 2020 21:45:50 GMT
- Title: Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration
for Mean-Field Reinforcement Learning
- Authors: Lingxiao Wang, Zhuoran Yang, Zhaoran Wang
- Abstract summary: We exploit the symmetry of agents in multi-agent reinforcement learning (MARL).
We propose the MF-FQI algorithm, which solves mean-field MARL, and establish a non-asymptotic analysis for it.
We highlight that the MF-FQI algorithm enjoys a "blessing of many agents" property: a larger number of observed agents improves its performance.
- Score: 135.64775986546505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent reinforcement learning (MARL) has achieved significant
empirical successes. However, MARL suffers from the curse of many agents. In
this paper, we exploit the symmetry of agents in MARL. In its most generic
form, we study a mean-field MARL problem. Such mean-field MARL is defined on
mean-field states, which are distributions supported on a continuous space.
Based on the mean embedding of these distributions, we propose the MF-FQI
algorithm, which solves the mean-field MARL problem, and we establish a
non-asymptotic analysis for it. We highlight that the MF-FQI algorithm enjoys a
"blessing of many agents" property in the sense that a larger number of
observed agents improves the performance of the algorithm.
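To make the mean-embedding idea concrete, here is a minimal sketch (not the authors' implementation) of how the empirical distribution of observed agent states can be embedded with random Fourier features and how one fitted Q-iteration step could be run on the embedded mean-field state. The feature dimension, kernel bandwidth, finite action space, linear Q-function, and ridge regularizer are all illustrative assumptions.

```python
# Hedged sketch: kernel mean embedding of the empirical agent-state distribution
# via random Fourier features, plus one fitted Q-iteration step on the embedding.
# Hyperparameters below are illustrative, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, FEAT_DIM, N_ACTIONS, GAMMA = 2, 128, 4, 0.99
W = rng.normal(size=(STATE_DIM, FEAT_DIM))       # random Fourier frequencies (RBF kernel, unit bandwidth)
b = rng.uniform(0.0, 2 * np.pi, size=FEAT_DIM)   # random phases

def mean_embedding(agent_states):
    """Embed the empirical distribution of agent states (n_agents x STATE_DIM)."""
    phi = np.sqrt(2.0 / FEAT_DIM) * np.cos(agent_states @ W + b)  # n_agents x FEAT_DIM
    return phi.mean(axis=0)  # average over agents -> finite-dimensional mean-field state

def fqi_step(data, q_weights, reg=1e-3):
    """One fitted Q-iteration step with a linear Q-function on (embedding, action)."""
    feats, targets = [], []
    for states, action, reward, next_states in data:
        mu, mu_next = mean_embedding(states), mean_embedding(next_states)
        next_q = q_weights @ mu_next                   # value of each action at the next mean-field state
        targets.append(reward + GAMMA * next_q.max())  # Bellman target
        feats.append((action, mu))
    new_w = q_weights.copy()
    for a in range(N_ACTIONS):                         # per-action ridge regression
        X = np.array([mu for act, mu in feats if act == a])
        y = np.array([t for (act, _), t in zip(feats, targets) if act == a])
        if len(X) > 0:
            new_w[a] = np.linalg.solve(X.T @ X + reg * np.eye(FEAT_DIM), X.T @ y)
    return new_w

# toy usage on synthetic transitions (50 observed agents per transition)
data = [(rng.normal(size=(50, STATE_DIM)), int(rng.integers(N_ACTIONS)),
         float(rng.normal()), rng.normal(size=(50, STATE_DIM))) for _ in range(200)]
q = np.zeros((N_ACTIONS, FEAT_DIM))
for _ in range(10):
    q = fqi_step(data, q)
```

Averaging the features over the observed agents is where the "blessing of many agents" enters: more observed agents yield a lower-variance estimate of the true mean embedding of the mean-field state.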
Related papers
- Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning [37.80275600302316]
Distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL.
A notorious yet open challenge is whether RMGs can escape the curse of multiagency.
The proposed algorithm is the first to break the curse of multiagency for RMGs.
arXiv Detail & Related papers (2024-09-30T08:09:41Z)
- Decentralized Monte Carlo Tree Search for Partially Observable Multi-agent Pathfinding [49.730902939565986]
The Multi-Agent Pathfinding (MAPF) problem involves finding a set of conflict-free paths for a group of agents confined to a graph.
In this study, we focus on the decentralized MAPF setting, where the agents may observe the other agents only locally.
We propose a decentralized multi-agent Monte Carlo Tree Search (MCTS) method for MAPF tasks.
arXiv Detail & Related papers (2023-12-26T06:57:22Z)
- Robust Multi-Agent Reinforcement Learning with State Uncertainty [17.916400875478377]
We study the problem of MARL with state uncertainty in this work.
We propose a robust multi-agent Q-learning algorithm to find such an equilibrium.
Our experiments show that the proposed RMAQ algorithm converges to the optimal value function.
arXiv Detail & Related papers (2023-07-30T12:31:42Z)
- Major-Minor Mean Field Multi-Agent Reinforcement Learning [29.296206774925388]
Multi-agent reinforcement learning (MARL) remains difficult to scale to many agents.
Recent MARL using Mean Field Control (MFC) provides a tractable and rigorous approach to otherwise difficult cooperative MARL.
We generalize MFC to instead simultaneously model many similar and few complex agents.
arXiv Detail & Related papers (2023-03-19T14:12:57Z)
- Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL [154.13105285663656]
Cooperative Multi-Agent Reinforcement Learning (MARL) with permutation-invariant agents has achieved tremendous empirical successes in real-world applications.
Unfortunately, the theoretical understanding of this MARL problem is lacking due to the curse of many agents and the limited exploration of relational reasoning in existing works.
We prove that the suboptimality gaps of the model-free and model-based algorithms are independent of and logarithmic in the number of agents, respectively, which mitigates the curse of many agents.
arXiv Detail & Related papers (2022-09-20T16:42:59Z)
- Unitary Approximate Message Passing for Matrix Factorization [90.84906091118084]
We consider matrix factorization (MF) with certain constraints, which finds wide applications in various areas.
We develop a Bayesian approach to MF with an efficient message passing implementation, called UAMPMF.
We show that UAMPMF significantly outperforms state-of-the-art algorithms in terms of recovery accuracy, robustness and computational complexity.
arXiv Detail & Related papers (2022-07-31T12:09:32Z)
- The Multi-Agent Pickup and Delivery Problem: MAPF, MARL and Its Warehouse Applications [2.969705152497174]
We study two state-of-the-art solutions to the multi-agent pickup and delivery problem based on different principles.
Specifically, a recent MAPF algorithm called conflict-based search (CBS) and a current MARL algorithm called shared experience actor-critic (SEAC) are studied.
arXiv Detail & Related papers (2022-03-14T13:23:35Z)
- Settling the Variance of Multi-Agent Policy Gradients [14.558011059649543]
Policy gradient (PG) methods are popular reinforcement learning (RL) methods.
In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG methods degrades as the variance of gradient estimates increases rapidly with the number of agents.
We offer a rigorous analysis of MAPG methods by quantifying the contributions of the number of agents and agents' explorations to the variance of MAPG estimators.
We propose a surrogate version of the optimal baseline (OB), which can be seamlessly plugged into any existing PG method in MARL.
arXiv Detail & Related papers (2021-08-19T10:49:10Z)
- Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors (a minimal sketch of such an architecture appears after this list).
arXiv Detail & Related papers (2021-05-18T04:35:41Z)
- Positive Semidefinite Matrix Factorization: A Connection with Phase Retrieval and Affine Rank Minimization [71.57324258813674]
We show that PSDMF algorithms can be designed based on phase retrieval (PR) and affine rank minimization (ARM) algorithms.
Motivated by this idea, we introduce a new family of PSDMF algorithms based on iterative hard thresholding (IHT).
arXiv Detail & Related papers (2020-07-24T06:10:19Z)
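Several entries above (the set-transformer work and MF-PPO) rely on architectures that are invariant to the ordering of agents. The following is a minimal, hypothetical sketch of a DeepSets-style mean-pooling critic with that property; the layer sizes and the choice of mean pooling are illustrative assumptions, not the architecture of any specific paper listed here.

```python
# Hedged sketch of a permutation-invariant (DeepSets-style) critic:
# an identical per-agent encoder followed by mean pooling over agents,
# so the output is unchanged under any reordering of the agents.
import torch
import torch.nn as nn

class PermutationInvariantCritic(nn.Module):
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        # per-agent encoder applied identically to every agent's state
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # value head acting on the pooled (order-independent) representation
        self.value = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))

    def forward(self, agent_states: torch.Tensor) -> torch.Tensor:
        # agent_states: (batch, n_agents, state_dim); mean pooling over agents
        pooled = self.encoder(agent_states).mean(dim=1)
        return self.value(pooled).squeeze(-1)

# permuting the agents leaves the predicted value unchanged
critic = PermutationInvariantCritic(state_dim=3)
x = torch.randn(8, 10, 3)
perm = x[:, torch.randperm(10), :]
assert torch.allclose(critic(x), critic(perm), atol=1e-5)
```

Because the pooled representation depends only on the set (or empirical distribution) of agent states, such a critic scales to many agents without growing its input dimension, which is the same structural idea exploited by the mean-embedding approach above.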
This list is automatically generated from the titles and abstracts of the papers in this site.