Adversarial Inverse Reinforcement Learning for Mean Field Games
- URL: http://arxiv.org/abs/2104.14654v5
- Date: Mon, 17 Apr 2023 23:06:52 GMT
- Title: Adversarial Inverse Reinforcement Learning for Mean Field Games
- Authors: Yang Chen, Libo Zhang, Jiamou Liu and Michael Witbrock
- Abstract summary: Mean field games (MFGs) provide a mathematically tractable framework for modelling large-scale multi-agent systems.
This paper proposes a novel framework, Mean-Field Adversarial IRL (MF-AIRL), which is capable of tackling uncertainties in demonstrations.
- Score: 17.392418397388823
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mean field games (MFGs) provide a mathematically tractable framework for
modelling large-scale multi-agent systems by leveraging mean field theory to
simplify interactions among agents. This tractability makes it possible to
apply inverse reinforcement learning (IRL) to predict the behaviour of large
populations by recovering reward signals from demonstrated behaviour. However,
existing IRL methods for MFGs cannot reason about uncertainties in the
demonstrated behaviour of individual agents. This paper proposes a novel
framework, Mean-Field
Adversarial IRL (MF-AIRL), which is capable of tackling uncertainties in
demonstrations. We build MF-AIRL upon maximum entropy IRL and a new equilibrium
concept. We evaluate our approach on simulated tasks with imperfect
demonstrations. Experimental results demonstrate the superiority of MF-AIRL
over existing methods in reward recovery.
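The abstract describes the approach only at a high level. As a concrete anchor, here is a minimal sketch of an AIRL-style discriminator conditioned on the mean field, in the spirit of MF-AIRL; it is not the authors' implementation, and the names `MFAIRLDiscriminator`, `f_net`, and `mf_dim` are illustrative assumptions (PyTorch assumed):
```python
# Minimal sketch of an AIRL-style discriminator that also sees the mean
# field. Class/attribute names are illustrative, not from the paper's code.
import torch
import torch.nn as nn

class MFAIRLDiscriminator(nn.Module):
    """D(s, a, mu) = exp(f) / (exp(f) + pi(a|s, mu)); f acts as the
    recovered reward, and mu is the empirical population distribution."""

    def __init__(self, state_dim, action_dim, mf_dim, hidden=64):
        super().__init__()
        self.f_net = nn.Sequential(
            nn.Linear(state_dim + action_dim + mf_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a, mu, log_pi):
        # The AIRL logit trick: log D - log(1 - D) = f - log pi.
        f = self.f_net(torch.cat([s, a, mu], dim=-1)).squeeze(-1)
        return f - log_pi

def discriminator_loss(disc, expert_batch, policy_batch):
    # Logistic loss of max-entropy adversarial IRL: expert transitions
    # are labelled 1, rollouts of the current policy 0.
    bce = nn.BCEWithLogitsLoss()
    e, p = disc(*expert_batch), disc(*policy_batch)
    return bce(e, torch.ones_like(e)) + bce(p, torch.zeros_like(p))
```
Training would then alternate this discriminator update with a policy update against the induced reward f - log pi, re-estimating the mean field from the current population between iterations.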
Related papers
- FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL [19.236153474365747]
Existing MARL approaches often rely on the restrictive assumption that the number of entities remains constant between training and inference.
In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization.
We propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods.
arXiv Detail & Related papers (2024-10-21T10:57:45Z)
- Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations.
We define the task as identifying Nash equilibrium from a preference-only offline dataset in general-sum games.
Our findings underscore the need for a multifaceted approach to MARLHF, paving the way for effective preference-based multi-agent systems.
arXiv Detail & Related papers (2024-09-01T13:14:41Z)
- Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment [62.05713042908654]
We introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome the challenges of LLM alignment.
We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals.
Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD.
arXiv Detail & Related papers (2024-05-24T15:13:53Z)
- A Single Online Agent Can Efficiently Learn Mean Field Games [16.00164239349632]
Mean field games (MFGs) are a promising framework for modeling the behavior of large-population systems.
This paper introduces a novel online single-agent model-free learning scheme, which enables a single agent to learn the mean-field Nash equilibrium (MFNE) from online samples.
arXiv Detail & Related papers (2024-05-05T16:38:04Z)
- Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL [57.745700271150454]
We study the sample complexity of reinforcement learning in Mean-Field Games (MFGs) with model-based function approximation.
We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity.
arXiv Detail & Related papers (2024-02-08T14:54:47Z)
- Mimicking Better by Matching the Approximate Action Distribution [48.95048003354255]
We introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations.
We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods.
arXiv Detail & Related papers (2023-06-16T12:43:47Z)
- Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning [19.788336796981685]
We propose a novel Distributional Reward Estimation framework for effective Multi-Agent Reinforcement Learning (DRE-MARL).
Our main idea is to design the multi-action-branch reward estimation and policy-weighted reward aggregation for stabilized training.
The superiority of DRE-MARL is demonstrated on benchmark multi-agent scenarios, where it outperforms SOTA baselines in both effectiveness and robustness.
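One plausible reading of the "policy-weighted reward aggregation" mentioned above is that per-action-branch reward estimates are averaged under the current policy distribution; the following is a sketch under that assumption, not the paper's exact formulation:
```python
import torch

def policy_weighted_reward(reward_branches: torch.Tensor,
                           policy_probs: torch.Tensor) -> torch.Tensor:
    # reward_branches: (batch, n_actions) estimated reward per action branch
    # policy_probs:    (batch, n_actions) current policy pi(.|s)
    # Returns the policy-expected reward, a lower-variance training signal
    # than any single sampled reward.
    return (reward_branches * policy_probs).sum(dim=-1)
```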
arXiv Detail & Related papers (2022-10-14T08:31:45Z)
- Individual-Level Inverse Reinforcement Learning for Mean Field Games [16.79251229846642]
Mean Field IRL (MFIRL) is the first dedicated IRL framework for MFGs that can handle both cooperative and non-cooperative environments.
We develop a practical algorithm effective for MFGs with unknown dynamics.
arXiv Detail & Related papers (2022-02-13T20:35:01Z)
- Multi-Agent Inverse Reinforcement Learning: Suboptimal Demonstrations and Alternative Solution Concepts [0.0]
Multi-agent inverse reinforcement learning (MIRL) can be used to learn reward functions from agents in social environments.
To model realistic social dynamics, MIRL methods must account for suboptimal human reasoning and behavior.
arXiv Detail & Related papers (2021-09-02T19:15:29Z)
- Residual Reinforcement Learning from Demonstrations [51.56457466788513]
Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting control actions from a conventional feedback controller to maximize a reward signal.
We extend the residual formulation to learn from visual inputs and sparse rewards using demonstrations.
Our experimental evaluation on simulated manipulation tasks on a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations is able to generalize to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning.
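The residual formulation summarized above composes a conventional feedback controller with a learned correction. A minimal sketch of that composition, with illustrative function and bound names:
```python
import numpy as np

def residual_action(obs, feedback_controller, residual_policy, low, high):
    # Residual RL: the learned policy outputs a correction that is added
    # to a hand-designed controller's action; the sum is clipped to the
    # actuator limits [low, high].
    base = feedback_controller(obs)   # conventional feedback controller
    delta = residual_policy(obs)      # learned residual (trained by RL)
    return np.clip(base + delta, low, high)
```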
arXiv Detail & Related papers (2021-06-15T11:16:49Z)
- Maximizing Information Gain in Partially Observable Environments via Prediction Reward [64.24528565312463]
This paper tackles the challenge of using belief-based rewards for a deep RL agent.
We derive the exact error between negative entropy and the expected prediction reward.
This insight provides theoretical motivation for several fields using prediction rewards.
arXiv Detail & Related papers (2020-05-11T08:13:49Z)