Related papers: Symmetry-Breaking Augmentations for Ad Hoc Teamwork

URL: http://arxiv.org/abs/2402.09984v2
Date: Sat, 19 Apr 2025 14:12:05 GMT
Title: Symmetry-Breaking Augmentations for Ad Hoc Teamwork
Authors: Ravi Hammond, Dustin Craggs, Mingyu Guo, Jakob Foerster, Ian Reid
Abstract summary: We introduce symmetry-breaking augmentations (SBA) as a novel approach to the challenge of adapting to novel teammates who use unforeseen strategies. By applying a symmetry-flipping operation to increase behavioural diversity among training teammates, SBA encourages agents to learn robust responses to unknown strategies. Our findings provide insights into how AI systems can better adapt to diverse human conventions and the core mechanics of alignment.
Score: 9.334943633357065
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In dynamic collaborative settings, for artificial intelligence (AI) agents to better align with humans, they must adapt to novel teammates who utilise unforeseen strategies. While adaptation is often simple for humans, it can be challenging for AI agents. Our work introduces symmetry-breaking augmentations (SBA) as a novel approach to this challenge. By applying a symmetry-flipping operation to increase behavioural diversity among training teammates, SBA encourages agents to learn robust responses to unknown strategies, highlighting how social conventions impact human-AI alignment. We demonstrate this experimentally in two settings, showing that our approach outperforms previous ad hoc teamwork results in the challenging card game Hanabi. In addition, we propose a general metric for estimating symmetry dependency amongst a given set of policies. Our findings provide insights into how AI systems can better adapt to diverse human conventions and the core mechanics of alignment.
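The abstract describes the symmetry-flipping operation only at a high level. The following is a minimal sketch of what flipping a training teammate could look like, assuming (as in other symmetry-based Hanabi work) that the symmetries are permutations of the five card colours. The observation and action encodings and the names `flip_policy`, `augment_teammates` and `red_convention` are hypothetical illustrations, not the authors' implementation.

```python
import random
from typing import Callable, Dict, List, Tuple

COLOURS = ["R", "Y", "G", "W", "B"]  # the five Hanabi colours

# Hypothetical encodings, chosen only for illustration:
Observation = Tuple[str, ...]  # colours of the cards visible in the partner's hand
Action = Tuple[str, str]       # ("hint", colour) or ("play", slot)
Policy = Callable[[Observation], Action]
Permutation = Dict[str, str]   # maps each original colour to its relabelled colour


def random_colour_permutation(rng: random.Random) -> Permutation:
    shuffled = COLOURS[:]
    rng.shuffle(shuffled)
    return dict(zip(COLOURS, shuffled))


def flip_policy(policy: Policy, perm: Permutation) -> Policy:
    """Return a colour-relabelled copy of `policy`: it behaves exactly like
    `policy` in a game whose colours have been renamed by `perm`."""
    inverse = {new: old for old, new in perm.items()}

    def flipped(obs: Observation) -> Action:
        # Translate the observation back into the original policy's colour names...
        kind, arg = policy(tuple(inverse[c] for c in obs))
        # ...then translate any colour in the chosen action forward again.
        if kind == "hint":
            arg = perm[arg]
        return (kind, arg)

    return flipped


def augment_teammates(teammates: List[Policy], flips_per_teammate: int,
                      seed: int = 0) -> List[Policy]:
    """Enlarge a training-partner pool with randomly colour-flipped copies."""
    rng = random.Random(seed)
    pool = list(teammates)
    for teammate in teammates:
        for _ in range(flips_per_teammate):
            pool.append(flip_policy(teammate, random_colour_permutation(rng)))
    return pool


def red_convention(obs: Observation) -> Action:
    # Hypothetical convention: hint red whenever the partner holds a red card.
    return ("hint", "R") if "R" in obs else ("play", "0")


swap_red_yellow = {"R": "Y", "Y": "R", "G": "G", "W": "W", "B": "B"}
yellow_convention = flip_policy(red_convention, swap_red_yellow)
print(yellow_convention(("Y", "G")))  # ('hint', 'Y')
```

Flipping turns a "hint red" convention into an otherwise identical "hint yellow" convention, so a learner trained against the augmented pool cannot rely on any single arbitrary colour convention.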

Related papers

Improving Human-AI Coordination through Adversarial Training and Generative Models [36.54154192505703]
Generalizing to novel humans requires training on data that captures the diversity of human behaviors. Adversarial training is one avenue for searching for such data and ensuring that agents are robust. We propose a novel strategy for overcoming self-sabotage that combines a pre-trained generative model, used to simulate valid cooperative agent policies, with adversarial training.
arXiv Detail & Related papers (2025-04-21T21:53:00Z)
Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent. We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents. Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z)
Optimizing Risk-averse Human-AI Hybrid Teams [1.433758865948252]
We propose a manager which learns, through a standard Reinforcement Learning scheme, how best to delegate. We demonstrate the optimality of our manager's performance in several grid environments. Our results show that our manager can successfully learn desirable delegations, resulting in team paths that are near-optimal or exactly optimal.
arXiv Detail & Related papers (2024-03-13T09:49:26Z)
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning [56.19242260613749]
Reinforcement Learning from Human Feedback (RLHF) enables the generation of responses more aligned with human preferences. Previous work shows that Reinforcement Learning (RL) often exploits shortcuts to attain high rewards and overlooks challenging samples. We propose a novel approach that can learn a consistent policy via RL across various data groups or domains.
arXiv Detail & Related papers (2023-10-18T13:54:15Z)
ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents. ProAgent can analyze the present state and infer the intentions of teammates from observations. ProAgent exhibits a high degree of modularity and interpretability, so it can be easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In the Game of Hanabi [15.917861586043813]
We show that state-of-the-art zero-shot coordination (ZSC) algorithms have poor performance when paired with agents trained with different learning methods. We create a framework based on the popular cooperative multi-agent game Hanabi to evaluate the adaptability of multi-agent reinforcement learning (MARL) methods.
arXiv Detail & Related papers (2023-08-20T14:44:50Z)
Mastering Asymmetrical Multiplayer Game with Multi-Agent Asymmetric-Evolution Reinforcement Learning [8.628547849796615]
Asymmetrical multiplayer (AMP) games are a popular game genre in which multiple types of agents compete or collaborate. It is difficult to train powerful agents that can defeat top human players in AMP games with typical self-play training methods because of the imbalanced characteristics of these asymmetrical environments. We propose asymmetric-evolution training (AET), a novel multi-agent reinforcement learning framework that can train multiple kinds of agents simultaneously in AMP games.
arXiv Detail & Related papers (2023-04-20T07:14:32Z)
On-the-fly Strategy Adaptation for ad-hoc Agent Coordination [21.029009561094725]
Training agents in cooperative settings offers the promise of AI agents able to interact effectively with humans (and other agents) in the real world. However, the vast majority of focus has been on the self-play paradigm, in which agents can learn arbitrary conventions that break down when they are paired with other agents. This paper proposes to solve this problem by adapting agent strategies on the fly, using a posterior belief over the other agents' strategy.
arXiv Detail & Related papers (2022-03-08T02:18:11Z)
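The summary only says that adaptation uses "a posterior belief over the other agents' strategy". A generic sketch of that idea, assuming a small finite set of candidate partner strategies with known action likelihoods and a known payoff table (`CANDIDATES`, `PAYOFF` and all strategy names are hypothetical): Bayes' rule updates the belief after each observed partner action, and the agent best-responds to the current belief. This illustrates the general technique, not the paper's algorithm.

```python
from typing import Dict

Belief = Dict[str, float]

# Hypothetical candidate partner strategies: probability each assigns to an action.
CANDIDATES: Dict[str, Dict[str, float]] = {
    "colour_hinter": {"hint_colour": 0.8, "hint_rank": 0.2},
    "rank_hinter": {"hint_colour": 0.3, "hint_rank": 0.7},
}

# Hypothetical value of each of our responses against each partner strategy.
PAYOFF: Dict[str, Dict[str, float]] = {
    "respond_to_colour": {"colour_hinter": 1.0, "rank_hinter": 0.2},
    "respond_to_rank": {"colour_hinter": 0.3, "rank_hinter": 0.9},
}


def update_belief(belief: Belief, action: str) -> Belief:
    """One step of Bayes' rule: P(s | a) is proportional to P(a | s) * P(s)."""
    unnormalised = {s: p * CANDIDATES[s][action] for s, p in belief.items()}
    total = sum(unnormalised.values())
    if total == 0.0:
        return dict(belief)  # action impossible under every candidate: keep old belief
    return {s: w / total for s, w in unnormalised.items()}


def best_response(belief: Belief) -> str:
    """Pick the response with the highest expected payoff under the belief."""
    return max(PAYOFF, key=lambda r: sum(p * PAYOFF[r][s] for s, p in belief.items()))


belief: Belief = {s: 1.0 / len(CANDIDATES) for s in CANDIDATES}  # uniform prior
for observed in ["hint_rank", "hint_rank", "hint_rank"]:
    belief = update_belief(belief, observed)
    print(best_response(belief), round(belief["rank_hinter"], 3))
```

Keeping the candidate set explicit makes the posterior exact and cheap; with a learned or continuous strategy space, the same update would need approximate inference.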
Conditional Imitation Learning for Multi-Agent Games [89.897635970366]
We study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time. We propose a novel approach to address the difficulties of scalability and data scarcity. Our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace.
arXiv Detail & Related papers (2022-01-05T04:40:13Z)
Policy Fusion for Adaptive and Customizable Reinforcement Learning Agents [137.86426963572214]
We show how to combine distinct behavioral policies to obtain a meaningful "fusion" policy. We propose four different policy fusion methods for combining pre-trained policies. We provide several practical examples and use cases showing how these methods are useful for video game production and game designers.
arXiv Detail & Related papers (2021-04-21T16:08:44Z)
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD). We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ), which achieves state-of-the-art performance in the StarCraft multi-agent challenge. CollaQ is evaluated on various StarCraft maps, where it outperforms existing state-of-the-art techniques.
arXiv Detail & Related papers (2020-10-16T17:42:11Z)
Moody Learners -- Explaining Competitive Behaviour of Reinforcement Learning Agents [65.2200847818153]
In a competitive scenario, the agent not only faces a dynamic environment but is also directly affected by its opponents' actions. Observing an agent's Q-values is a common way of explaining its behavior; however, Q-values do not show the temporal relation between the selected actions.
arXiv Detail & Related papers (2020-07-30T11:30:42Z)
Natural Emergence of Heterogeneous Strategies in Artificially Intelligent Competitive Teams [0.0]
We develop a competitive multi-agent environment called FortAttack in which two teams compete against each other. We observe a natural emergence of heterogeneous behavior amongst homogeneous agents when such behavior can lead to the team's success. We propose ensemble training, in which we utilize the evolved opponent strategies to train a single policy for friendly agents.
arXiv Detail & Related papers (2020-07-06T22:35:56Z)
Learning from Learners: Adapting Reinforcement Learning Agents to be Competitive in a Card Game [71.24825724518847]
We present a study on how popular reinforcement learning algorithms can be adapted to learn and to play a real-world implementation of a competitive multiplayer card game. We propose specific training and validation routines for the learning agents, in order to evaluate how the agents learn to be competitive and explain how they adapt to each other's playing styles.
arXiv Detail & Related papers (2020-04-08T14:11:05Z)
"Other-Play" for Zero-Shot Coordination [21.607428852157273]
The other-play (OP) learning algorithm enhances self-play by looking for more robust strategies, exploiting the presence of known symmetries in the underlying problem. We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents.
arXiv Detail & Related papers (2020-03-06T00:39:37Z)
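The other-play objective can be illustrated on a tiny coordination game. Written in our own notation, OP prefers a policy $\pi$ that maximises $\mathbb{E}_{\phi \sim \Phi}\, J(\pi, \phi(\pi))$, its expected return with a symmetry-relabelled copy of itself, rather than the self-play return $J(\pi, \pi)$. The sketch below evaluates both exactly on a lever game, assuming ten levers where nine pay 1.0 and one pays 0.9, and both players score only if they pick the same lever; the function names are illustrative, not from the OP paper's code.

```python
import itertools
from fractions import Fraction
from typing import Iterator, List, Tuple

# Levers 0-8 pay 1.0; lever 9 is the unique 0.9 lever.
PAYOFFS: List[Fraction] = [Fraction(1)] * 9 + [Fraction(9, 10)]


def symmetries() -> Iterator[Tuple[int, ...]]:
    """Every relabelling of levers that leaves PAYOFFS unchanged: any permutation
    of the nine 1.0 levers, with the 0.9 lever (index 9) held fixed."""
    for perm in itertools.permutations(range(9)):
        yield perm + (9,)


def self_play_return(choice: int) -> Fraction:
    # An identical partner always picks the same lever, so it always coordinates.
    return PAYOFFS[choice]


def other_play_returns() -> List[Fraction]:
    """Expected return of each deterministic choice i when the partner is phi(pi),
    i.e. picks lever phi[i], with phi drawn uniformly from the symmetry group."""
    fixed = [0] * len(PAYOFFS)  # how often each lever is mapped to itself
    total = 0
    for phi in symmetries():  # 9! = 362,880 relabellings
        total += 1
        for lever, image in enumerate(phi):
            fixed[lever] += int(lever == image)
    return [p * Fraction(f, total) for p, f in zip(PAYOFFS, fixed)]


op = other_play_returns()
sp_choice = max(range(10), key=self_play_return)  # any 1.0 lever; max returns the first
op_choice = max(range(10), key=lambda i: op[i])   # the unique 0.9 lever
print(sp_choice, op_choice)  # 0 9
```

Self-play is indifferent between the nine 1.0 levers, but a relabelled partner matches an arbitrary 1.0 lever only 1/9 of the time, whereas the 0.9 lever is fixed by every symmetry. Enumerating all symmetries is only feasible for a toy; a training loop would typically sample $\phi$ instead.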

This list is automatically generated from the titles and abstracts of the papers on this site.