Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic
- URL: http://arxiv.org/abs/2002.10525v1
- Date: Mon, 24 Feb 2020 20:30:45 GMT
- Title: Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic
- Authors: Wonseok Jeon, Paul Barde, Derek Nowrouzezahrai, Joelle Pineau
- Abstract summary: Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
- Score: 54.2180984002807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent
approach that applies single-agent AIRL to multi-agent problems where we seek
to recover both policies for our agents and reward functions that promote
expert-like behavior. While MA-AIRL has promising results on cooperative and
competitive tasks, it is sample-inefficient and has only been validated
empirically for small numbers of agents -- its ability to scale to many agents
remains an open question. We propose a multi-agent inverse RL algorithm that is
more sample-efficient and scalable than previous works. Specifically, we employ
multi-agent actor-attention-critic (MAAC) -- an off-policy multi-agent RL
(MARL) method -- for the RL inner loop of the inverse RL procedure. In doing
so, we are able to increase sample efficiency compared to state-of-the-art
baselines, across both small- and large-scale tasks. Moreover, the RL agents
trained on the rewards recovered by our method better match the experts than
those trained on the rewards derived from the baselines. Finally, our method
requires far fewer agent-environment interactions, particularly as the number
of agents increases.
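
The abstract describes alternating an AIRL-style discriminator update with MAAC as the off-policy RL solver in the inner loop. Below is a minimal PyTorch sketch of that alternating structure; it is not the authors' implementation, and all module names, dimensions, and the toy batches are hypothetical stand-ins.

```python
# Sketch of MA-AIRL with a MAAC-style attention critic (not the paper's code).
# Names, dimensions, and the random toy batches below are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_AGENTS, OBS_DIM, ACT_DIM, HID = 4, 8, 2, 32

class AttentionCritic(nn.Module):
    """Per-agent Q-value computed with attention over all agents (MAAC-style)."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Linear(OBS_DIM + ACT_DIM, HID)  # per-agent (o, a) encoder
        self.attn = nn.MultiheadAttention(HID, num_heads=4, batch_first=True)
        self.q_head = nn.Linear(2 * HID, 1)

    def forward(self, obs, act):  # obs: (B, N, OBS_DIM), act: (B, N, ACT_DIM)
        e = F.relu(self.encode(torch.cat([obs, act], dim=-1)))  # (B, N, HID)
        ctx, _ = self.attn(e, e, e)  # each agent attends to the others
        return self.q_head(torch.cat([e, ctx], dim=-1))  # (B, N, 1)

class Discriminator(nn.Module):
    """AIRL-form discriminator for one agent: D = exp(f) / (exp(f) + pi)."""
    def __init__(self):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, HID), nn.ReLU(),
                               nn.Linear(HID, 1))

    def logits(self, obs, act, log_pi):
        # logit(D) = f(s, a) - log pi(a | s)
        return self.f(torch.cat([obs, act], dim=-1)).squeeze(-1) - log_pi

critic = AttentionCritic()
discs = [Discriminator() for _ in range(N_AGENTS)]

def disc_loss(i, expert, policy):
    """Logistic loss: expert transitions labeled 1, policy transitions 0."""
    e_logit = discs[i].logits(*expert)
    p_logit = discs[i].logits(*policy)
    return (F.binary_cross_entropy_with_logits(e_logit, torch.ones_like(e_logit))
            + F.binary_cross_entropy_with_logits(p_logit, torch.zeros_like(p_logit)))

# Toy batches standing in for expert demonstrations and replay-buffer samples.
B = 16
obs = torch.randn(B, N_AGENTS, OBS_DIM)
act = torch.randn(B, N_AGENTS, ACT_DIM)
log_pi = torch.randn(B, N_AGENTS)  # log pi_i(a_i | o_i) from the current policies

# 1) Discriminator step per agent (in practice, distinct expert/policy batches).
for i in range(N_AGENTS):
    batch = (obs[:, i], act[:, i], log_pi[:, i])
    loss_d = disc_loss(i, batch, batch)  # stand-in: same batch for both labels

# 2) RL inner loop: the recovered AIRL reward is the discriminator logit,
#    log D - log(1 - D) = f - log pi, which drives the MAAC critic update.
with torch.no_grad():
    reward = torch.stack([discs[i].logits(obs[:, i], act[:, i], log_pi[:, i])
                          for i in range(N_AGENTS)], dim=1)  # (B, N)
q = critic(obs, act).squeeze(-1)  # (B, N); regress toward reward + gamma * Q'
```

In the full method, step 2 would be a complete off-policy MAAC update (attention-critic and policy losses over replay-buffer samples), which is what gives the approach its sample-efficiency advantage over the on-policy inner loop used in prior MA-AIRL work.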