Individual-Level Inverse Reinforcement Learning for Mean Field Games
- URL: http://arxiv.org/abs/2202.06401v1
- Date: Sun, 13 Feb 2022 20:35:01 GMT
- Title: Individual-Level Inverse Reinforcement Learning for Mean Field Games
- Authors: Yang Chen, Libo Zhang, Jiamou Liu and Shuyue Hu
- Abstract summary: Mean Field IRL (MFIRL) is the first dedicated IRL framework for MFGs that can handle both cooperative and non-cooperative environments.
We develop a practical algorithm effective for MFGs with unknown dynamics.
- Score: 16.79251229846642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent mean field game (MFG) formalism has enabled the application of
inverse reinforcement learning (IRL) methods in large-scale multi-agent
systems, with the goal of inferring reward signals that can explain
demonstrated behaviours of large populations. The existing IRL methods for MFGs
are built upon reducing an MFG to a Markov decision process (MDP) defined on
the collective behaviours and average rewards of the population. However, this
paper reveals that the reduction from MFG to MDP holds only for the fully
cooperative setting. This limitation invalidates existing IRL methods on MFGs
with non-cooperative environments. To measure more general behaviours in large
populations, we study the use of individual behaviours to infer ground-truth
reward functions for MFGs. We propose Mean Field IRL (MFIRL), the first
dedicated IRL framework for MFGs that can handle both cooperative and
non-cooperative environments. Based on this theoretically justified framework,
we develop a practical algorithm effective for MFGs with unknown dynamics. We
evaluate MFIRL on both cooperative and mixed cooperative-competitive scenarios
with many agents. Results demonstrate that MFIRL excels in reward recovery,
sample efficiency and robustness in the face of changing dynamics.
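To make the individual-level idea concrete, the following is a minimal, hypothetical sketch of max-entropy IRL run on individual trajectories in a finite-horizon mean field game, treating the observed mean-field flow as fixed context. It is not the authors' MFIRL algorithm; the array shapes, the precomputed feature tensor phi (where phi[t] stands in for phi(s, a, mu_t)), and the linear reward parameterisation are assumptions introduced only for illustration.

```python
import numpy as np

# Hypothetical illustration, not the paper's MFIRL algorithm.
# phi: (horizon, n_states, n_actions, n_features), precomputed from the
#      observed mean-field flow, i.e. phi[t] ~ phi(s, a, mu_t)
# P:   (n_states, n_actions, n_states) transition kernel
# rho0: initial state distribution of a representative agent

def soft_policies(theta, phi, P):
    """Backward soft value iteration with reward r_t(s, a) = theta . phi[t, s, a]."""
    horizon, n_s, n_a, _ = phi.shape
    V = np.zeros(n_s)
    pis = np.zeros((horizon, n_s, n_a))
    for t in reversed(range(horizon)):
        r_t = phi[t] @ theta                     # (n_states, n_actions)
        Q = r_t + np.einsum('san,n->sa', P, V)   # soft Bellman backup
        m = Q.max(axis=1, keepdims=True)
        V = (m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True))).squeeze(1)
        pis[t] = np.exp(Q - V[:, None])          # per-state softmax policy
    return pis

def expected_features(pis, phi, P, rho0):
    """Feature expectations of the soft-optimal policy under the fixed flow."""
    horizon = phi.shape[0]
    d = rho0.copy()                              # state marginal at time t
    f = np.zeros(phi.shape[-1])
    for t in range(horizon):
        occ = d[:, None] * pis[t]                # state-action occupancy at t
        f += np.einsum('sa,saf->f', occ, phi[t])
        d = np.einsum('sa,san->n', occ, P)
    return f

def irl_step(theta, expert_f, phi, P, rho0, lr=0.1):
    """One max-entropy IRL gradient step: match expert vs. model feature counts."""
    model_f = expected_features(soft_policies(theta, phi, P), phi, P, rho0)
    return theta + lr * (expert_f - model_f)
```

The expert counts expert_f would be estimated by averaging phi[t, s_t, a_t] along the demonstrated individual trajectories; matching them against the model's expected counts is the standard max-entropy IRL gradient, here applied at the level of a single representative agent rather than to population-averaged rewards.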
Related papers
- A Single Online Agent Can Efficiently Learn Mean Field Games [16.00164239349632]
Mean field games (MFGs) are a promising framework for modeling the behavior of large-population systems.
This paper introduces a novel online single-agent model-free learning scheme, which enables a single agent to learn mean-field Nash equilibria (MFNE) using online samples.
arXiv Detail & Related papers (2024-05-05T16:38:04Z) - Learning Discrete-Time Major-Minor Mean Field Games [61.09249862334384]
We propose a novel discrete time version of major-minor MFGs (M3FGs) and a learning algorithm based on fictitious play and partitioning the probability simplex.
M3FGs generalize MFGs with common noise and can handle not only random exogenous environment states but also major players.
arXiv Detail & Related papers (2023-12-17T18:22:08Z) - Reinforcement Learning for SBM Graphon Games with Re-Sampling [4.6648272529750985]
We develop a novel learning framework based on a Graphon Game with Re-Sampling (GGR-S) model.
We analyze the GGR-S dynamics and establish their convergence to the dynamics of the multi-population mean-field game (MP-MFG).
arXiv Detail & Related papers (2023-10-25T03:14:48Z) - On Imitation in Mean-field Games [53.27734434016737]
We explore the problem of imitation learning (IL) in the context of mean-field games (MFGs).
We show that when only the reward depends on the population distribution, IL in MFGs can be reduced to single-agent IL with similar guarantees.
We propose a new adversarial formulation where the reinforcement learning problem is replaced by a mean-field control problem.
arXiv Detail & Related papers (2023-06-26T15:58:13Z) - Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution (CTDE) paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z) - Concave Utility Reinforcement Learning: the Mean-field Game viewpoint [42.403650997341806]
Concave Utility Reinforcement Learning (CURL) extends RL from linear to concave utilities in the occupancy measure induced by the agent's policy.
This more general paradigm invalidates the classical Bellman equations, and calls for new algorithms.
We show that CURL is a subclass of Mean-field Games (MFGs); a side-by-side view of the linear and concave objectives appears after this list.
arXiv Detail & Related papers (2021-06-07T16:51:07Z) - Adversarial Inverse Reinforcement Learning for Mean Field Games [17.392418397388823]
Mean field games (MFGs) provide a mathematically tractable framework for modelling large-scale multi-agent systems.
This paper proposes a novel framework, Mean-Field Adversarial IRL (MF-AIRL), which is capable of tackling uncertainties in demonstrations.
arXiv Detail & Related papers (2021-04-29T21:03:49Z) - Scaling up Mean Field Games with Online Mirror Descent [55.36153467919289]
We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD).
We show that continuous-time OMD provably converges to a Nash equilibrium under a natural and well-motivated set of monotonicity assumptions.
A thorough experimental investigation on various single and multi-population MFGs shows that OMD outperforms traditional algorithms such as Fictitious Play (FP); a schematic of the OMD update pattern is sketched after this list.
arXiv Detail & Related papers (2021-02-28T21:28:36Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z) - A General Framework for Learning Mean-Field Games [10.483303456655058]
This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision-making in games with a large population.
It then proposes value-based and policy-based reinforcement learning algorithms with smoothed policies; a schematic of the underlying fixed-point loop is sketched after this list.
Experiments on an equilibrium product pricing problem demonstrate that GMF-V-Q and GMF-P-TRPO, two specific instantiations of GMF-V and GMF-P, respectively, with Q-learning and TRPO, are both efficient and robust in the GMFG setting.
arXiv Detail & Related papers (2020-03-13T00:27:57Z)
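As a reading aid for the Concave Utility Reinforcement Learning entry above, the display below contrasts the standard RL objective, which is linear in the occupancy measure, with the concave CURL objective; the entropy utility is one common illustrative instance, not necessarily the one used in that paper.

```latex
% Illustrative comparison; \mu_\pi denotes the state-action occupancy
% measure induced by policy \pi. The entropy utility is one example only.
\begin{align*}
\text{standard RL:}\quad & \max_{\pi}\; \langle r, \mu_\pi \rangle
    \;=\; \max_{\pi} \sum_{s,a} \mu_\pi(s,a)\, r(s,a)
    && \text{(linear in } \mu_\pi\text{)}\\
\text{CURL:}\quad & \max_{\pi}\; F(\mu_\pi), \quad F \text{ concave},
    && \text{e.g. } F(\mu) = -\sum_{s,a} \mu(s,a)\log \mu(s,a).
\end{align*}
```

Because F is evaluated on the whole occupancy measure rather than accumulated step rewards, the objective no longer decomposes per time step, which is why that entry notes that the classical Bellman equations no longer apply.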
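The Scaling up Mean Field Games with Online Mirror Descent entry above relies on a simple update pattern: accumulate Q-values and take a softmax. The sketch below shows that pattern for a stationary policy; evaluate_q and propagate_mean_field are assumed, hypothetical helpers rather than functions from that paper.

```python
import numpy as np

def softmax(x, axis=-1):
    m = x.max(axis=axis, keepdims=True)
    e = np.exp(x - m)
    return e / e.sum(axis=axis, keepdims=True)

def omd_for_mfg(evaluate_q, propagate_mean_field, n_states, n_actions,
                iters=100, lr=1.0):
    """Schematic OMD loop for a stationary MFG policy (illustrative only).

    evaluate_q(pi, mu) -> Q table of shape (n_states, n_actions);
    propagate_mean_field(pi) -> mean field mu induced by policy pi.
    Both are assumed helpers supplied by the user.
    """
    y = np.zeros((n_states, n_actions))    # cumulative Q-values
    pi = softmax(y)                        # uniform initial policy
    for _ in range(iters):
        mu = propagate_mean_field(pi)      # population flow generated by pi
        q = evaluate_q(pi, mu)             # evaluate pi against that flow (no best response)
        y += lr * q                        # mirror-descent accumulation
        pi = softmax(y)                    # exponentiated-gradient / softmax step
    return pi
```

Accumulating evaluations of the current policy, rather than repeatedly computing best responses as in Fictitious Play, is the design choice that makes the scheme cheap to scale.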
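For the General Framework for Learning Mean-Field Games entry above, the loop below is a hedged, schematic rendering of the fixed-point iteration it builds on: best-respond to the current mean field, smooth the greedy policy, then refresh the mean field. The helpers q_learning and induced_mean_field are hypothetical placeholders, not the GMF-V-Q or GMF-P-TRPO implementations.

```python
import numpy as np

def gmfg_fixed_point(q_learning, induced_mean_field, mu0, tau=0.1, iters=50):
    """Schematic GMFG-style loop (illustrative only).

    q_learning(mu) -> Q table (n_states, n_actions) for the MDP induced by mu;
    induced_mean_field(pi) -> population distribution generated by policy pi.
    """
    mu = mu0
    pi = None
    for _ in range(iters):
        q = q_learning(mu)                        # solve the single-agent problem given mu
        z = np.exp((q - q.max(axis=1, keepdims=True)) / tau)
        pi = z / z.sum(axis=1, keepdims=True)     # Boltzmann-smoothed ("smoothed") policy
        mu = induced_mean_field(pi)               # refresh the mean field under pi
    return pi, mu
```

Unlike the OMD sketch, each iteration here fully re-solves the induced single-agent problem; the softmax temperature tau plays the role of the policy smoothing that the entry mentions.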