Major-Minor Mean Field Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2303.10665v2
- Date: Tue, 7 May 2024 20:14:12 GMT
- Title: Major-Minor Mean Field Multi-Agent Reinforcement Learning
- Authors: Kai Cui, Christian Fabian, Anam Tahir, Heinz Koeppl
- Abstract summary: Multi-agent reinforcement learning (MARL) remains difficult to scale to many agents.
Recent MARL using Mean Field Control (MFC) provides a tractable and rigorous approach to otherwise difficult cooperative MARL.
We generalize MFC to instead simultaneously model many similar and few complex agents.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent reinforcement learning (MARL) remains difficult to scale to many agents. Recent MARL using Mean Field Control (MFC) provides a tractable and rigorous approach to otherwise difficult cooperative MARL. However, the strict MFC assumption of many independent, weakly-interacting agents is too inflexible in practice. We generalize MFC to instead simultaneously model many similar and few complex agents -- as Major-Minor Mean Field Control (M3FC). Theoretically, we give approximation results for finite agent control, and verify the sufficiency of stationary policies for optimality together with a dynamic programming principle. Algorithmically, we propose Major-Minor Mean Field MARL (M3FMARL) for finite agent systems instead of the limiting system. The algorithm is shown to approximate the policy gradient of the underlying M3FC MDP. Finally, we demonstrate its capabilities experimentally in various scenarios. We observe strong performance in comparison with state-of-the-art policy gradient MARL methods.
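To make the M3FC state concrete, here is a minimal simulation sketch with hypothetical 1-D dynamics (not the paper's model): the MDP state is the pair of the major agent's state and the empirical distribution of the minor agents, and policies condition on that pair.

```python
import numpy as np

# Minimal sketch of a Major-Minor mean field rollout (hypothetical 1-D
# dynamics, not the paper's model). The M3FC MDP state is the pair
# (major agent state, empirical distribution of the minor agents).
rng = np.random.default_rng(0)
N = 1000
x_minor = rng.uniform(-1.0, 1.0, size=N)   # many similar minor agents
x_major = 0.0                              # one complex major agent

def mean_field(x, bins=10):
    """Empirical distribution of minor states; M3FC policies condition on it."""
    hist, _ = np.histogram(x, bins=bins, range=(-1.0, 1.0))
    return hist / len(x)

total_reward = 0.0
for t in range(50):
    mu = mean_field(x_minor)   # a learned policy would take (x_major, mu)
    # Hypothetical stationary policies: minors track the major agent,
    # the major agent tracks the minors' average position.
    x_minor = np.clip(x_minor + 0.1 * (x_major - x_minor)
                      + 0.05 * rng.normal(size=N), -1.0, 1.0)
    x_major += 0.1 * (x_minor.mean() - x_major)
    total_reward += -np.abs(x_minor - x_major).mean()  # cooperative objective

print("return:", round(total_reward, 3))
print("final mean field:", mean_field(x_minor).round(2))
```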
Related papers
- Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms
Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains.
MARL policies often lack robustness and are sensitive to small changes in their environment.
We show that we can gain robustness by controlling a policy's Lipschitz constant.
We propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies.
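A minimal sketch of one way to promote policy smoothness through an adversarial perturbation penalty; the FGSM-style inner step and network sizes are illustrative assumptions, not ERNIE's exact procedure.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))

def smoothness_penalty(obs, eps=0.05):
    """Penalty on how far the policy logits move under a small worst-case
    observation perturbation (one FGSM-style inner maximization step)."""
    delta = (1e-3 * torch.randn_like(obs)).requires_grad_(True)
    gap = (policy(obs + delta) - policy(obs).detach()).pow(2).sum()
    grad, = torch.autograd.grad(gap, delta)
    delta_adv = eps * grad.sign()          # approximate worst-case direction
    return (policy(obs + delta_adv) - policy(obs)).pow(2).sum(-1).mean()

obs = torch.randn(64, 4)
penalty = smoothness_penalty(obs)  # add lambda * penalty to the usual RL loss
print(float(penalty))
```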
arXiv Detail & Related papers (2023-10-16T20:14:06Z)
- Learning Decentralized Partially Observable Mean Field Control for Artificial Collective Behavior
We propose novel models for decentralized partially observable MFC (Dec-POMFC).
We provide rigorous theoretical results, including a dynamic programming principle.
Overall, our framework takes a step towards RL-based engineering of artificial collective behavior via MFC.
arXiv Detail & Related papers (2023-07-12T14:02:03Z)
- Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning
Trust region methods rigorously enabled reinforcement learning (RL) agents to learn monotonically improving policies, leading to superior performance on a variety of tasks.
Unfortunately, when it comes to multi-agent reinforcement learning (MARL), the property of monotonic improvement does not carry over straightforwardly.
In this paper, we extend the theory of trust region learning to MARL. Central to our findings are the multi-agent advantage decomposition lemma and the sequential policy update scheme.
Based on these, we develop Heterogeneous-Agent Trust Region Policy Optimisation (HATRPO) and Heterogeneous-Agent Proximal Policy Optimisation (HAPPO).
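A schematic sketch of the sequential policy update scheme: agents update one at a time, each scaling the joint advantage by the probability ratios of previously updated agents. The snapshot handling and clipping constant are simplified assumptions, not the exact HATRPO/HAPPO procedure.

```python
import torch
import torch.nn as nn

n_agents, obs_dim, n_actions = 3, 8, 4
agents = [nn.Linear(obs_dim, n_actions) for _ in range(n_agents)]
opts = [torch.optim.Adam(a.parameters(), lr=3e-4) for a in agents]

def update_sequentially(obs, actions, joint_adv, clip=0.2):
    """Agents update one at a time; each scales the joint advantage by the
    probability ratios of agents already updated in this round."""
    m = torch.ones_like(joint_adv)                # running ratio correction
    for i in torch.randperm(n_agents).tolist():   # random update order
        logp = torch.log_softmax(agents[i](obs), -1).gather(
            1, actions[:, i:i + 1]).squeeze(1)
        old_logp = logp.detach()  # sketch: snapshot taken just before the step
        ratio = torch.exp(logp - old_logp)
        adv = m * joint_adv
        loss = -torch.min(ratio * adv,
                          torch.clamp(ratio, 1 - clip, 1 + clip) * adv).mean()
        opts[i].zero_grad()
        loss.backward()
        opts[i].step()
        with torch.no_grad():     # fold agent i's post-update ratio into m
            new_logp = torch.log_softmax(agents[i](obs), -1).gather(
                1, actions[:, i:i + 1]).squeeze(1)
            m = m * torch.exp(new_logp - old_logp)

obs = torch.randn(32, obs_dim)
actions = torch.randint(0, n_actions, (32, n_agents))
update_sequentially(obs, actions, joint_adv=torch.randn(32))
```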
arXiv Detail & Related papers (2021-09-23T09:44:35Z)
- Settling the Variance of Multi-Agent Policy Gradients
Policy gradient (PG) methods are popular reinforcement learning (RL) methods.
In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG methods degrades as the variance of gradient estimates increases rapidly with the number of agents.
We offer a rigorous analysis of MAPG methods by quantifying the contributions of the number of agents and agents' explorations to the variance of MAPG estimators.
We propose a surrogate version of the optimal baseline (OB), which can be seamlessly plugged into any existing PG method in MARL.
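A toy numerical illustration of the variance issue and of how a baseline tames it. The baseline below is the generic expected return, not the paper's optimal baseline formula.

```python
import numpy as np

# Toy multi-agent policy gradient: n independent Bernoulli(theta) 'agents'
# share the team reward R = number of agents that chose action 1. Compare
# the variance of the score-function estimator with and without a baseline.
rng = np.random.default_rng(1)
n_agents, theta, trials = 10, 0.6, 20000

grads_plain, grads_base = [], []
for _ in range(trials):
    a = rng.random(n_agents) < theta            # joint action, one per agent
    R = a.sum()                                 # shared team reward
    score = np.where(a, 1 / theta, -1 / (1 - theta))  # d/dtheta log pi(a_i)
    grads_plain.append(score * R)
    grads_base.append(score * (R - n_agents * theta))  # baseline b = E[R]

print("variance, no baseline:", np.var(grads_plain, axis=0).mean().round(2))
print("variance, baseline   :", np.var(grads_base, axis=0).mean().round(2))
```

Both estimators are unbiased; the baseline only removes the reward contribution of the other agents' randomness, which is exactly the part that grows with the number of agents.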
arXiv Detail & Related papers (2021-08-19T10:49:10Z)
- Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
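A minimal sketch of permutation invariance via mean pooling (Deep-Sets style); layer sizes are illustrative and this is not the exact MF-PPO architecture.

```python
import torch
import torch.nn as nn

class PermInvariantCritic(nn.Module):
    """Deep-Sets-style critic: encode each agent's state, mean-pool, decode.
    Mean pooling makes the output invariant to agent ordering."""
    def __init__(self, obs_dim=4, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, obs):                 # obs: (batch, n_agents, obs_dim)
        return self.rho(self.phi(obs).mean(dim=1)).squeeze(-1)

critic = PermInvariantCritic()
obs = torch.randn(32, 100, 4)
perm = obs[:, torch.randperm(100), :]       # shuffle the agents
print(torch.allclose(critic(obs), critic(perm), atol=1e-5))  # True
```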
arXiv Detail & Related papers (2021-05-18T04:35:41Z)
- Discrete-Time Mean Field Control with Environment States
Mean field control and mean field games have been established as a tractable solution for large-scale multi-agent problems.
We rigorously establish approximate optimality as the number of agents grows in the finite agent case.
We find that a dynamic programming principle holds, resulting in the existence of an optimal stationary policy.
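A tiny value-iteration sketch of what the dynamic programming principle buys: the MFC problem becomes an MDP over the pair (mean field, environment state), so a stationary optimal policy can be read off the converged value function. Dynamics, rewards, and the discretization are made up for illustration.

```python
import numpy as np

# Value iteration on a toy MFC MDP whose state is (mean field, env state).
mus = np.linspace(0, 1, 11)     # discretized mean field (fraction in state 1)
envs = [0, 1]                   # environment states
acts = [-0.1, 0.0, 0.1]         # shift of population mass (the MFC action)
gamma = 0.9

def step(mu, e, a):
    mu2 = float(np.clip(mu + a, 0, 1))
    e2 = 1 - e if abs(mu2 - 0.5) > 0.3 else e    # env reacts to the field
    r = -abs(mu2 - (0.8 if e == 0 else 0.2))     # env-dependent target field
    return mu2, e2, r

V = np.zeros((len(mus), len(envs)))
for _ in range(200):                             # Bellman backups
    V_new = np.zeros_like(V)
    for i, mu in enumerate(mus):
        for e in envs:
            q = [r + gamma * V[np.abs(mus - m2).argmin(), e2]
                 for m2, e2, r in (step(mu, e, a) for a in acts)]
            V_new[i, e] = max(q)
    V = V_new
print(V.round(2))
```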
arXiv Detail & Related papers (2021-04-30T10:58:01Z)
- Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time
We study the policy gradient method for linear-quadratic mean-field control and games.
We show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation.
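A toy sketch of policy gradient over a linear feedback gain: a scalar, discrete-time LQ problem stands in for the paper's continuous-time mean-field setting, and a finite-difference gradient replaces the exact one. All parameters are illustrative assumptions.

```python
import numpy as np

# Gradient descent on the feedback gain K of the linear policy u = -K * x
# for a scalar discrete-time LQ problem (toy stand-in for mean-field LQ).
a, b, q, r, gamma = 0.9, 0.5, 1.0, 0.1, 0.9

def cost(K, x0=1.0, T=150):
    """Discounted quadratic cost of the linear policy u = -K * x."""
    x, c = x0, 0.0
    for t in range(T):
        u = -K * x
        c += (gamma ** t) * (q * x * x + r * u * u)
        x = a * x + b * u
    return c

K, lr, eps = 0.0, 0.02, 1e-4
for _ in range(2000):          # finite-difference policy gradient steps
    g = (cost(K + eps) - cost(K - eps)) / (2 * eps)
    K -= lr * g
print("learned gain K =", round(K, 3), "| cost =", round(cost(K), 3))
```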
arXiv Detail & Related papers (2020-08-16T06:34:11Z)
- Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning
We exploit the symmetry of agents in multi-agent reinforcement learning (MARL).
We propose the MF-FQI algorithm, which solves mean-field MARL, and establish a non-asymptotic analysis for it.
We highlight that MF-FQI enjoys a "blessing of many agents": a larger number of observed agents improves its performance.
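A sketch of the mean-embedding idea: summarize the population by the mean of per-agent feature vectors, so the Q-function input stays fixed-size as the number of agents grows. The features, dynamics, and linear regression below are illustrative, not the paper's construction.

```python
import numpy as np

# Fitted Q-iteration on a mean-embedding summary of the population.
rng = np.random.default_rng(2)

def mean_embedding(states):
    """Mean of per-agent features phi(s) = (s, s^2): fixed-size summary."""
    return np.stack([states, states ** 2], axis=-1).mean(axis=-2)

# toy transitions: N agents, 3 population-level actions (mass shifts)
N, batch, gamma, acts = 50, 500, 0.9, np.array([-0.1, 0.0, 0.1])
S = rng.uniform(-1, 1, (batch, N))
A = rng.integers(0, 3, batch)
S2 = np.clip(S + acts[A][:, None] + 0.05 * rng.normal(size=(batch, N)), -1, 1)
R = -np.abs(S2.mean(axis=1))                 # reward: drive mean field to 0

F = np.hstack([np.ones((batch, 1)), mean_embedding(S)])    # (batch, 3)
F2 = np.hstack([np.ones((batch, 1)), mean_embedding(S2)])
W = np.zeros((3, 3))                         # linear Q: Q(x, a) = [1, x] @ W[:, a]
for _ in range(50):                          # fitted Q-iteration
    y = R + gamma * (F2 @ W).max(axis=1)     # Bellman targets
    for a in range(3):                       # regress Q(., a) on its targets
        idx = A == a
        W[:, a] = np.linalg.lstsq(F[idx], y[idx], rcond=None)[0]

# summary [1, 0, 1/3] = bias, mean, mean-square of the uniform population
print("greedy shift:", acts[(np.array([1.0, 0.0, 1 / 3]) @ W).argmax()])
```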
arXiv Detail & Related papers (2020-06-21T21:45:50Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning
Decentralized multi-agent reinforcement learning algorithms are often impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
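A sketch of a factored centralised critic in this spirit: per-agent utilities are mixed into one joint action-value, through which a centralised policy gradient can flow to the actors. The sizes and the simple MLP mixer are assumptions, not FACMAC's exact architecture.

```python
import torch
import torch.nn as nn

n_agents, obs_dim, act_dim = 3, 6, 2

per_agent_q = nn.ModuleList([
    nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.ReLU(), nn.Linear(32, 1))
    for _ in range(n_agents)])
mixer = nn.Sequential(nn.Linear(n_agents, 16), nn.ReLU(), nn.Linear(16, 1))

def q_total(obs, acts):
    """Mix per-agent utilities Q_i(o_i, a_i) into one joint action-value."""
    qs = torch.cat([per_agent_q[i](torch.cat([obs[:, i], acts[:, i]], -1))
                    for i in range(n_agents)], dim=-1)   # (batch, n_agents)
    return mixer(qs).squeeze(-1)                         # (batch,)

obs = torch.randn(8, n_agents, obs_dim)
acts = torch.randn(8, n_agents, act_dim)    # continuous actions from actors
print(q_total(obs, acts).shape)             # torch.Size([8])
# Actors would be updated by ascending q_total(obs, actor(obs)) -- the
# centralised policy gradient -- while q_total is trained on TD targets.
```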
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.