Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents
- URL: http://arxiv.org/abs/2308.09595v2
- Date: Wed, 3 Jan 2024 03:05:25 GMT
- Title: Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents
- Authors: Arrasy Rahman, Jiaxun Cui, Peter Stone
- Abstract summary: Existing Ad Hoc Teamwork (AHT) methods address the challenge of cooperating with unseen partners by training an agent with a population of diverse teammate policies.
We introduce the L-BRDiv algorithm, which generates a set of teammate policies that, when used for AHT training, encourages the agent to emulate policies from the minimum coverage set (MCS).
We empirically demonstrate that L-BRDiv produces more robust AHT agents than state-of-the-art methods in a broader range of two-player cooperative problems.
- Score: 39.19326531319873
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robustly cooperating with unseen agents and human partners presents
significant challenges due to the diverse cooperative conventions these
partners may adopt. Existing Ad Hoc Teamwork (AHT) methods address this
challenge by training an agent with a population of diverse teammate policies
obtained through maximizing specific diversity metrics. However, prior
heuristic-based diversity metrics do not always maximize the agent's robustness
in all cooperative problems. In this work, we first propose that maximizing an
AHT agent's robustness requires it to emulate policies in the minimum coverage
set (MCS), the set of best-response policies to any partner policies in the
environment. We then introduce the L-BRDiv algorithm that generates a set of
teammate policies that, when used for AHT training, encourage agents to emulate
policies from the MCS. L-BRDiv works by solving a constrained optimization
problem to jointly train teammate policies for AHT training and to approximate
AHT agent policies that are members of the MCS. We empirically demonstrate that
L-BRDiv produces more robust AHT agents than state-of-the-art methods in a
broader range of two-player cooperative problems without the need for extensive
hyperparameter tuning for its objectives. Our study shows that L-BRDiv
outperforms the baseline methods by prioritizing discovering distinct members
of the MCS instead of repeatedly finding redundant policies.
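To make the two ideas above concrete, here is a minimal, illustrative formalization. The notation (a candidate partner set $\Pi^{-i}$ and a best-response operator $\mathrm{BR}$) is ours and is only one way to write the definition given in the abstract.

```latex
% The MCS as described in the abstract: the set of best responses to any
% partner policy the environment admits (notation is ours, not the paper's).
\mathrm{BR}(\pi^{-i}) \;=\; \arg\max_{\pi^{i}} \; \mathbb{E}\!\left[\, R \mid \pi^{i}, \pi^{-i} \,\right],
\qquad
\mathrm{MCS}(\Pi^{-i}) \;=\; \bigcup_{\pi^{-i} \in \Pi^{-i}} \mathrm{BR}(\pi^{-i}).
```

Below is a rough Python sketch of a constrained, Lagrangian-style teammate-generation objective of the kind the abstract attributes to L-BRDiv. The payoff matrix, the assumed constraint form (each approximate best response should pair better with its own teammate than with any other teammate, by some margin), and the multiplier update are illustrative assumptions, not the paper's exact formulation.

```python
# Conceptual sketch of a constrained, Lagrangian-style diversity objective,
# in the spirit of what the abstract describes for L-BRDiv. The payoff
# estimates, constraint structure, and update rule are illustrative
# assumptions, not the paper's exact formulation.
import numpy as np

def lagrangian_update(returns, lam, margin=0.1, lr_lam=0.01):
    """One projected-gradient step on the Lagrange multipliers.

    returns[j, k]: estimated return when the approximate best-response (AHT)
                   policy trained against teammate j is paired with teammate k.
    Assumed constraint: returns[j, j] - returns[j, k] >= margin for all k != j,
    i.e. each best response should pair better with its own teammate than
    with any other teammate in the generated set.
    """
    n = returns.shape[0]
    violation = np.zeros_like(lam)
    for j in range(n):
        for k in range(n):
            if j != k:
                violation[j, k] = margin - (returns[j, j] - returns[j, k])
    # Multipliers grow wherever a constraint is violated, and are clipped at 0.
    lam = np.maximum(0.0, lam + lr_lam * violation)
    # Primal objective a policy-gradient step would maximize:
    # self-play returns minus multiplier-weighted constraint violations.
    primal = returns.trace() - float((lam * violation).sum())
    return lam, primal

# Toy usage with a random payoff estimate for three generated teammates.
rng = np.random.default_rng(0)
R = rng.uniform(size=(3, 3))
multipliers = np.zeros((3, 3))
multipliers, objective = lagrangian_update(R, multipliers)
print(multipliers.round(3), round(objective, 3))
```

In the actual algorithm the entries of such a payoff estimate would come from rolling out jointly trained policies, and the primal term would be maximized with reinforcement-learning updates rather than read off a fixed matrix.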
Related papers
- Policy Diversity for Cooperative Agents [8.689289576285095]
Multi-agent reinforcement learning aims to find the optimal team cooperative policy to complete a task.
There may exist multiple different ways of cooperating, and these alternative strategies are often exactly what domain experts need.
Unfortunately, there is a general lack of effective policy diversity approaches specifically designed for the multi-agent domain.
arXiv Detail & Related papers (2023-08-28T05:23:16Z)
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL) that combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
- RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning [90.43925357575543]
We propose ranked policy memory (RPM) to collect diverse multi-agent trajectories for training MARL policies with good generalizability.
RPM enables MARL agents to interact with unseen agents in multi-agent generalization evaluation scenarios and complete the given tasks, boosting performance by up to 402% on average.
arXiv Detail & Related papers (2022-10-18T07:32:43Z)
- Developing cooperative policies for multi-stage reinforcement learning tasks [0.0]
Many hierarchical reinforcement learning algorithms utilise a series of independent skills as a basis to solve tasks at a higher level of reasoning.
This paper proposes the Cooperative Consecutive Policies (CCP) method of enabling consecutive agents to cooperatively solve long time horizon multi-stage tasks.
arXiv Detail & Related papers (2022-05-11T01:31:04Z)
- Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming [0.0]
We show that reformulating an agent's policy to be conditional on the policies of its teammates inherently maximizes a lower bound on Mutual Information (MI) when optimizing under Policy Gradient (PG).
Our approach, InfoPG, outperforms baselines in learning emergent collaborative behaviors and sets the state-of-the-art in decentralized cooperative MARL tasks.
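As a reminder of the kind of bound such MI-regularized objectives rely on, below is the standard variational (Barber-Agakov) lower bound on mutual information; the summary does not state that InfoPG uses exactly this bound, so treat the notation as illustrative.

```latex
% Standard variational lower bound on mutual information (Barber-Agakov).
% In the cooperative setting, X and Y would stand for two teammates' actions;
% q is any variational approximation to the conditional p(x | y).
I(X;Y) \;=\; H(X) - H(X \mid Y)
\;\ge\; H(X) + \mathbb{E}_{p(x,y)}\!\left[\log q(x \mid y)\right].
```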
arXiv Detail & Related papers (2022-01-20T22:54:32Z)
- Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning [25.027143431992755]
Trust region methods rigorously enabled reinforcement learning (RL) agents to learn monotonically improving policies, leading to superior performance on a variety of tasks.
Unfortunately, when it comes to multi-agent reinforcement learning (MARL), the property of monotonic improvement may not simply apply.
In this paper, we extend the theory of trust region learning to MARL. Central to our findings are the multi-agent advantage decomposition lemma and the sequential policy update scheme.
Based on these, we develop Heterogeneous-Agent Trust Region Policy optimisation (HATRPO) and Heterogeneous-Agent Proximal Policy optimisation (HAPPO) algorithms.
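For reference, the multi-agent advantage decomposition mentioned above can be written as follows; the agent ordering i_1..i_m is arbitrary, and the notation is our paraphrase rather than the paper's exact statement.

```latex
% Multi-agent advantage decomposition: the joint advantage of agents
% i_1..i_m splits into a sum of per-agent advantages, each conditioned on
% the actions already chosen by the preceding agents in the ordering.
A^{i_{1:m}}_{\pi}\!\left(s, a^{i_{1:m}}\right)
\;=\; \sum_{j=1}^{m} A^{i_j}_{\pi}\!\left(s, a^{i_{1:j-1}}, a^{i_j}\right).
```

This is what motivates a sequential update scheme: each agent can improve its policy in turn while accounting for the updates already made by the agents before it.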
arXiv Detail & Related papers (2021-09-23T09:44:35Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
- Developing cooperative policies for multi-stage tasks [0.0]
This paper proposes the Cooperative Soft Actor Critic (CSAC) method of enabling consecutive reinforcement learning agents to cooperatively solve a long time horizon multi-stage task.
CSAC achieved a success rate at least 20% higher than the uncooperative policies and converged on a solution at least 4 times faster than a single agent.
arXiv Detail & Related papers (2020-07-01T03:32:14Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)