Nested Training for Mutual Adaptation in Human-AI Teaming
- URL: http://arxiv.org/abs/2602.17737v1
- Date: Wed, 18 Feb 2026 23:07:48 GMT
- Title: Nested Training for Mutual Adaptation in Human-AI Teaming
- Authors: Upasana Biswas, Durgesh Kalwar, Subbarao Kambhampati, Sarath Sreedharan
- Abstract summary: Existing approaches aim to improve diversity in training partners to approximate human behavior, but these partners are static and fail to capture the adaptive behavior of humans. We model the human-robot teaming scenario as an Interactive Partially Observable Markov Decision Process (I-POMDP), explicitly modeling human adaptation as part of the state. We train our method in a multi-episode, required-cooperation setup in the Overcooked domain, comparing it against several baseline agents designed for human-robot teaming.
- Score: 30.247046563601202
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mutual adaptation is a central challenge in human-AI teaming, as humans naturally adjust their strategies in response to a robot's policy. Existing approaches aim to improve diversity in training partners to approximate human behavior, but these partners are static and fail to capture adaptive behavior of humans. Exposing robots to adaptive behaviors is critical, yet when both agents learn simultaneously in a multi-agent setting, they often converge to opaque implicit coordination strategies that only work with the agents they were co-trained with. Such agents fail to generalize when paired with new partners. In order to capture the adaptive behavior of humans, we model the human-robot teaming scenario as an Interactive Partially Observable Markov Decision Process (I-POMDP), explicitly modeling human adaptation as part of the state. We propose a nested training regime to approximately learn the solution to a finite-level I-POMDP. In this framework, agents at each level are trained against adaptive agents from the level below. This ensures that the ego agent is exposed to adaptive behavior during training while avoiding the emergence of implicit coordination strategies, since the training partners are not themselves learning. We train our method in a multi-episode, required cooperation setup in the Overcooked domain, comparing it against several baseline agents designed for human-robot teaming. We evaluate the performance of our agent when paired with adaptive partners that were not seen during training. Our results demonstrate that our agent not only achieves higher task performance with these adaptive partners but also exhibits significantly greater adaptability during team interactions.
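The nested regime described in the abstract can be sketched as follows. This is a minimal illustration only: the `train_level` callback, the level indexing, and the toy policy labels are assumptions for exposition, not the paper's actual implementation.

```python
def nested_training(num_levels, train_level):
    """Sketch of nested training toward a finite-level I-POMDP solution.

    The level-k ego agent is trained against a frozen partner from level
    k-1: the partner may adapt within an interaction according to its
    fixed adaptation rule, but its parameters never update, which avoids
    the co-adapted implicit conventions that can arise when both agents
    learn simultaneously.
    """
    policies = ["level0_base"]  # level 0: a fixed, non-learning base partner
    for k in range(1, num_levels + 1):
        frozen_partner = policies[k - 1]  # partner from the level below
        policies.append(train_level(ego_level=k, partner=frozen_partner))
    return policies[-1]

# Toy usage: each "trained policy" is just a label recording its lineage.
ego = nested_training(3, lambda ego_level, partner: f"level{ego_level}(vs {partner})")
```

The key design point illustrated here is that only the ego agent at the current level learns; every training partner is drawn, already fixed, from the level below.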
Related papers
- Modeling Distinct Human Interaction in Web Agents [59.600507469754575]
We introduce the task of modeling human intervention to support collaborative web task execution. We identify four distinct patterns of user interaction with agents: hands-off supervision, hands-on oversight, collaborative task-solving, and full user takeover. We deploy these intervention-aware models in live web navigation agents and evaluate them in a user study, finding a 26.5% increase in user-rated agent usefulness.
arXiv Detail & Related papers (2026-02-19T18:11:28Z) - Improving Human-AI Coordination through Online Adversarial Training and Generative Models [32.057874335805685]
Generalizing to novel humans requires training on data that captures the diversity of human behaviors. Adversarial training is a promising method that allows dynamic data generation and ensures that agents are robust. We propose a novel strategy that combines a pretrained generative model to simulate valid cooperative agent policies with adversarial training to maximize regret.
arXiv Detail & Related papers (2025-04-21T21:53:00Z) - ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easy to integrate into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z) - A Hierarchical Approach to Population Training for Human-AI Collaboration [20.860808795671343]
We introduce a Hierarchical Reinforcement Learning (HRL) based method for Human-AI Collaboration.
We demonstrate that our method is able to dynamically adapt to novel partners of different play styles and skill levels in the 2-player collaborative Overcooked game environment.
arXiv Detail & Related papers (2023-05-26T07:53:12Z) - Learning to Influence Human Behavior with Offline Reinforcement Learning [70.7884839812069]
We focus on influence in settings where there is a need to capture human suboptimality.
Experimenting online with humans is potentially unsafe, and creating a high-fidelity simulator of the environment is often impractical.
We show that offline reinforcement learning can learn to effectively influence suboptimal humans by extending and combining elements of observed human-human behavior.
arXiv Detail & Related papers (2023-03-03T23:41:55Z) - PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination [52.991211077362586]
We propose a policy ensemble method to increase the diversity of partners in the population.
We then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives.
In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners.
arXiv Detail & Related papers (2023-01-16T12:14:58Z) - Safe adaptation in multiagent competition [48.02377041620857]
In multiagent competitive scenarios, ego-agents may have to adapt to new opponents with previously unseen behaviors.
As the ego-agent updates its own behavior to exploit the opponent, its own behavior could become more exploitable.
We develop a safe adaptation approach in which the ego-agent is trained against a regularized opponent model.
arXiv Detail & Related papers (2022-03-14T23:53:59Z) - Conditional Imitation Learning for Multi-Agent Games [89.897635970366]
We study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time.
We propose a novel approach to address the difficulties of scalability and data scarcity.
Our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace.
arXiv Detail & Related papers (2022-01-05T04:40:13Z) - Behaviour-conditioned policies for cooperative reinforcement learning tasks [41.74498230885008]
In various real-world tasks, an agent needs to cooperate with unknown partner agent types.
Deep reinforcement learning models can be trained to deliver the required functionality but are known to suffer from sample inefficiency and slow learning.
We suggest a method in which we synthetically produce populations of agents with different behavioural patterns, together with ground-truth data of their behaviour.
We additionally suggest an agent architecture that can efficiently use the generated data and gain meta-learning capability.
arXiv Detail & Related papers (2021-10-04T09:16:41Z) - On the Critical Role of Conventions in Adaptive Human-AI Collaboration [73.21967490610142]
We propose a learning framework that teases apart rule-dependent representation from convention-dependent representation.
We experimentally validate our approach on three collaborative tasks varying in complexity.
arXiv Detail & Related papers (2021-04-07T02:46:19Z) - Adaptive Agent Architecture for Real-time Human-Agent Teaming [3.284216428330814]
It is critical that agents infer human intent and adapt their policies for smooth coordination.
Most literature in human-agent teaming builds agents that reference a learned human model.
We propose a novel adaptive agent architecture in a human-model-free setting on a two-player cooperative game.
arXiv Detail & Related papers (2021-03-07T20:08:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.