Improving Human-AI Coordination through Online Adversarial Training and Generative Models
- URL: http://arxiv.org/abs/2504.15457v3
- Date: Wed, 25 Jun 2025 23:40:16 GMT
- Title: Improving Human-AI Coordination through Online Adversarial Training and Generative Models
- Authors: Paresh Chaudhary, Yancheng Liang, Daphne Chen, Simon S. Du, Natasha Jaques,
- Abstract summary: Generalizing to novel humans requires training on data that captures the diversity of human behaviors.<n>Adversarial training is a promising method that allows dynamic data generation and ensures that agents are robust.<n>We propose a novel strategy that combines a pretrained generative model to simulate valid cooperative agent policies with adversarial training to maximize regret.
- Score: 36.54154192505703
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Being able to cooperate with new people is an important component of many economically valuable AI tasks, from household robotics to autonomous driving. However, generalizing to novel humans requires training on data that captures the diversity of human behaviors. Adversarial training is a promising method that allows dynamic data generation and ensures that agents are robust. It creates a feedback loop where the agent's performance influences the generation of new adversarial data, which can be used immediately to train the agent. However, adversarial training is difficult to apply in a cooperative task; how can we train an adversarial cooperator? We propose a novel strategy that combines a pretrained generative model to simulate valid cooperative agent policies with adversarial training to maximize regret. We call our method GOAT: Generative Online Adversarial Training. In this framework, the GOAT dynamically searches the latent space of the generative model for coordination strategies where the learning policy, the Cooperator agent, underperforms. GOAT enables better generalization by exposing the Cooperator to various challenging interaction scenarios. We maintain realistic coordination strategies by keeping the generative model frozen, thus avoiding adversarial exploitation. We evaluate GOAT with real human partners, and the results demonstrate state of the art performance on the Overcooked benchmark, highlighting its effectiveness in generalizing to diverse human behaviors.
Related papers
- Learning from Active Human Involvement through Proxy Value Propagation [44.144964115275]
Learning from active human involvement enables the human subject to actively intervene and demonstrate to the AI agent during training.<n>We propose a new reward-free active human involvement method called Proxy Value propagation for policy optimization.<n>Our method can learn to solve continuous and discrete control tasks with various human control devices, including the challenging task of driving in Grand Theft Auto V.
arXiv Detail & Related papers (2025-02-05T17:07:37Z) - Learning to Cooperate with Humans using Generative Agents [40.605931138995714]
Training agents that can coordinate zero-shot with humans is a key mission in multi-agent reinforcement learning (MARL)
We show emphlearning a generative model of human partners can effectively address this issue.
By sampling from the latent space, we can use the generative model to produce different partners to train Cooperator agents.
arXiv Detail & Related papers (2024-11-21T08:36:17Z) - Multi-agent cooperation through learning-aware policy gradients [53.63948041506278]
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning.
We present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning.
We derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
arXiv Detail & Related papers (2024-10-24T10:48:42Z) - Efficient Human-AI Coordination via Preparatory Language-based
Convention [17.840956842806975]
Existing methods for human-AI coordination typically train an agent to coordinate with a diverse set of policies or with human models fitted from real human data.
We propose employing the large language model (LLM) to develop an action plan that effectively guides both human and AI.
Our method achieves better alignment with human preferences and an average performance improvement of 15% compared to the state-of-the-art.
arXiv Detail & Related papers (2023-11-01T10:18:23Z) - Outlier Robust Adversarial Training [57.06824365801612]
We introduce Outlier Robust Adversarial Training (ORAT) in this work.
ORAT is based on a bi-level optimization formulation of adversarial training with a robust rank-based loss function.
We show that the learning objective of ORAT satisfies the $mathcalH$-consistency in binary classification, which establishes it as a proper surrogate to adversarial 0/1 loss.
arXiv Detail & Related papers (2023-09-10T21:36:38Z) - ProAgent: Building Proactive Cooperative Agents with Large Language
Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state, and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z) - A Hierarchical Approach to Population Training for Human-AI
Collaboration [20.860808795671343]
We introduce a Hierarchical Reinforcement Learning (HRL) based method for Human-AI Collaboration.
We demonstrate that our method is able to dynamically adapt to novel partners of different play styles and skill levels in the 2-player collaborative Overcooked game environment.
arXiv Detail & Related papers (2023-05-26T07:53:12Z) - Learning to Influence Human Behavior with Offline Reinforcement Learning [70.7884839812069]
We focus on influence in settings where there is a need to capture human suboptimality.
Experiments online with humans is potentially unsafe, and creating a high-fidelity simulator of the environment is often impractical.
We show that offline reinforcement learning can learn to effectively influence suboptimal humans by extending and combining elements of observed human-human behavior.
arXiv Detail & Related papers (2023-03-03T23:41:55Z) - PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI
Coordination [52.991211077362586]
We propose a policy ensemble method to increase the diversity of partners in the population.
We then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives.
In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners.
arXiv Detail & Related papers (2023-01-16T12:14:58Z) - Coordination with Humans via Strategy Matching [5.072077366588174]
We present an algorithm for autonomously recognizing available task-completion strategies by observing human-human teams performing a collaborative task.
By transforming team actions into low dimensional representations using hidden Markov models, we can identify strategies without prior knowledge.
Robot policies are learned on each of the identified strategies to construct a Mixture-of-Experts model that adapts to the task strategies of unseen human partners.
arXiv Detail & Related papers (2022-10-27T01:00:50Z) - Conditional Imitation Learning for Multi-Agent Games [89.897635970366]
We study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time.
We propose a novel approach to address the difficulties of scalability and data scarcity.
Our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace.
arXiv Detail & Related papers (2022-01-05T04:40:13Z) - Behaviour-conditioned policies for cooperative reinforcement learning
tasks [41.74498230885008]
In various real-world tasks, an agent needs to cooperate with unknown partner agent types.
Deep reinforcement learning models can be trained to deliver the required functionality but are known to suffer from sample inefficiency and slow learning.
We suggest a method, where we synthetically produce populations of agents with different behavioural patterns together with ground truth data of their behaviour.
We additionally suggest an agent architecture, which can efficiently use the generated data and gain the meta-learning capability.
arXiv Detail & Related papers (2021-10-04T09:16:41Z) - Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration [51.268988527778276]
We present a method for learning a human-robot collaboration policy from human-human collaboration demonstrations.
Our method co-optimizes a human policy and a robot policy in an interactive learning process.
arXiv Detail & Related papers (2021-08-13T03:14:43Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.