Generating Teammates for Training Robust Ad Hoc Teamwork Agents via
Best-Response Diversity
- URL: http://arxiv.org/abs/2207.14138v3
- Date: Wed, 24 May 2023 13:54:53 GMT
- Title: Generating Teammates for Training Robust Ad Hoc Teamwork Agents via
Best-Response Diversity
- Authors: Arrasy Rahman, Elliot Fosong, Ignacio Carlucho, Stefano V. Albrecht
- Abstract summary: Ad hoc teamwork (AHT) is the challenge of designing a robust learner agent that effectively collaborates with unknown teammates.
Early approaches address the AHT challenge by training the learner with a diverse set of handcrafted teammate policies.
Recent approaches attempted to improve the robustness of the learner by training it with teammate policies generated by optimising information-theoretic diversity metrics.
- Score: 6.940758395823777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ad hoc teamwork (AHT) is the challenge of designing a robust learner agent
that effectively collaborates with unknown teammates without prior coordination
mechanisms. Early approaches address the AHT challenge by training the learner
with a diverse set of handcrafted teammate policies, usually designed based on
an expert's domain knowledge about the policies the learner may encounter.
However, implementing teammate policies for training based on domain knowledge
is not always feasible. In such cases, recent approaches attempted to improve
the robustness of the learner by training it with teammate policies generated
by optimising information-theoretic diversity metrics. The problem with
optimising existing information-theoretic diversity metrics for teammate policy
generation is the emergence of superficially different teammates. When used for
AHT training, superficially different teammate behaviours may not improve a
learner's robustness during collaboration with unknown teammates. In this
paper, we present an automated teammate policy generation method optimising the
Best-Response Diversity (BRDiv) metric, which measures diversity based on the
compatibility of teammate policies in terms of returns. We evaluate our
approach in environments with multiple valid coordination strategies, comparing
against methods optimising information-theoretic diversity metrics and an
ablation not optimising any diversity metric. Our experiments indicate that
optimising BRDiv yields a diverse set of training teammate policies that
improve the learner's performance relative to previous teammate generation
approaches when collaborating with near-optimal previously unseen teammate
policies.
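The intuition behind a return-based diversity metric is that two teammate policies only count as meaningfully different if the best response to one yields a low return when paired with the other, i.e. their best responses are mutually incompatible. The sketch below illustrates this idea on a hypothetical cross-play return matrix; the function names, the `evaluate_return` routine, and the particular self-play-minus-cross-play score are illustrative assumptions rather than the exact BRDiv objective from the paper.

```python
import numpy as np

def cross_play_return_matrix(teammates, best_responses, evaluate_return):
    """Build R where R[i, j] is the expected return when the best response
    trained against teammate policy i is paired with teammate policy j.
    `evaluate_return` is a user-supplied rollout/evaluation routine."""
    n = len(teammates)
    R = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            R[i, j] = evaluate_return(best_responses[i], teammates[j])
    return R

def return_based_diversity(R):
    """Illustrative sketch only: reward high self-play returns (diagonal)
    and penalise high cross-play returns (off-diagonal), so a high score
    indicates teammates whose best responses do not transfer to each other.
    The exact BRDiv objective optimised in the paper may differ."""
    n = R.shape[0]
    self_play = np.trace(R) / n
    cross_play = (R.sum() - np.trace(R)) / (n * (n - 1)) if n > 1 else 0.0
    return self_play - cross_play
```

A teammate generator could then optimise such a score over a population of teammate-and-best-response pairs, so that superficially different behaviours which admit the same best response receive no diversity credit.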
Related papers
- Online Policy Distillation with Decision-Attention [23.807761525617384]
Policy Distillation (PD) has become an effective method to improve deep reinforcement learning tasks.
We study knowledge transfer between different policies that learn diverse knowledge from the same environment.
We propose Online Policy Distillation (OPD) with Decision-Attention (DA).
arXiv Detail & Related papers (2024-06-08T14:40:53Z) - Symmetry-Breaking Augmentations for Ad Hoc Teamwork [10.014956508924842]
In many collaborative settings, artificial intelligence (AI) agents must be able to adapt to new teammates that use unknown or previously unobserved strategies.
We introduce symmetry-breaking augmentations (SBA), which increase diversity in the behaviour of training teammates by applying a symmetry-flipping operation.
We demonstrate this experimentally in two settings, and show that our approach improves upon previous ad hoc teamwork results in the challenging card game Hanabi.
arXiv Detail & Related papers (2024-02-15T14:49:28Z) - Improving Generalization of Alignment with Human Preferences through
Group Invariant Learning [56.19242260613749]
Reinforcement Learning from Human Feedback (RLHF) enables the generation of responses more aligned with human preferences.
Previous work shows that Reinforcement Learning (RL) often exploits shortcuts to attain high rewards and overlooks challenging samples.
We propose a novel approach that can learn a consistent policy via RL across various data groups or domains.
arXiv Detail & Related papers (2023-10-18T13:54:15Z) - Policy Diversity for Cooperative Agents [8.689289576285095]
Multi-agent reinforcement learning aims to find the optimal team cooperative policy to complete a task.
There may exist multiple different ways of cooperating, and such diverse cooperative policies are often needed by domain experts.
Unfortunately, there is a general lack of effective policy diversity approaches specifically designed for the multi-agent domain.
arXiv Detail & Related papers (2023-08-28T05:23:16Z) - ProAgent: Building Proactive Cooperative Agents with Large Language
Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state, and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z) - Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents [39.19326531319873]
Existing Ad Hoc Teamwork (AHT) methods address the challenge of collaborating with unknown teammates by training an agent with a population of diverse teammate policies.
We introduce the L-BRDiv algorithm that generates a set of teammate policies that, when used for AHT training, encourage agents to emulate policies from the MCS.
We empirically demonstrate that L-BRDiv produces more robust AHT agents than state-of-the-art methods in a broader range of two-player cooperative problems.
arXiv Detail & Related papers (2023-08-18T14:45:22Z) - Learning Diverse Risk Preferences in Population-based Self-play [23.07952140353786]
Current self-play algorithms optimize the agent to maximize expected win-rates against its current or historical copies.
We introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty.
We show that our method achieves comparable or superior performance in competitive games.
arXiv Detail & Related papers (2023-05-19T06:56:02Z) - A Reinforcement Learning-assisted Genetic Programming Algorithm for Team
Formation Problem Considering Person-Job Matching [70.28786574064694]
A reinforcement learning-assisted genetic programming algorithm (RL-GP) is proposed to enhance the quality of solutions.
The hyper-heuristic rules obtained through efficient learning can be utilized as decision-making aids when forming project teams.
arXiv Detail & Related papers (2023-04-08T14:32:12Z) - Combating Exacerbated Heterogeneity for Robust Models in Federated
Learning [91.88122934924435]
The combination of adversarial training and federated learning can lead to undesired robustness deterioration.
We propose a novel framework called Slack Federated Adversarial Training (SFAT)
We verify the rationality and effectiveness of SFAT on various benchmarked and real-world datasets.
arXiv Detail & Related papers (2023-03-01T06:16:15Z) - Conditional Imitation Learning for Multi-Agent Games [89.897635970366]
We study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time.
We propose a novel approach to address the difficulties of scalability and data scarcity.
Our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace.
arXiv Detail & Related papers (2022-01-05T04:40:13Z) - Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model
Distillation Approach [55.83558520598304]
We propose a new solution to reuse experiences and transfer value functions among multiple students via model distillation.
We also describe how to design an efficient communication protocol to exploit heterogeneous knowledge.
Our proposed framework, namely Learning and Teaching Categorical Reinforcement, shows promising performance in stabilizing and accelerating learning progress.
arXiv Detail & Related papers (2020-02-06T11:31:04Z)