Related papers: Zero-Shot Coordination in Ad Hoc Teams with Generalized Policy Improvement and Difference Rewards

Zero-Shot Coordination in Ad Hoc Teams with Generalized Policy Improvement and Difference Rewards

URL: http://arxiv.org/abs/2510.16187v1
Date: Fri, 17 Oct 2025 19:55:25 GMT
Title: Zero-Shot Coordination in Ad Hoc Teams with Generalized Policy Improvement and Difference Rewards
Authors: Rupal Nigam, Niket Parikh, Hamid Osooli, Mikihisa Yuasa, Jacob Heglund, Huy T. Tran,
Abstract summary: Real-world multi-agent systems may require ad hoc teaming, where an agent must coordinate with other previously unseen teammates to solve a task in a zero-shot manner.<n>We propose to leverage all pretrained policies in a zero-shot transfer setting. We formalize this problem as an ad hoc multi-agent Markov decision process and present a solution that uses two key ideas, generalized policy improvement and difference rewards.<n>We empirically demonstrate that our algorithm, Generalized Policy improvement for Ad hoc Teaming (GPAT), successfully enables zero-shot transfer to new teams in three simulated environments: cooperative foraging, predator-prey, and
Score: 0.41562334038629595
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Real-world multi-agent systems may require ad hoc teaming, where an agent must coordinate with other previously unseen teammates to solve a task in a zero-shot manner. Prior work often either selects a pretrained policy based on an inferred model of the new teammates or pretrains a single policy that is robust to potential teammates. Instead, we propose to leverage all pretrained policies in a zero-shot transfer setting. We formalize this problem as an ad hoc multi-agent Markov decision process and present a solution that uses two key ideas, generalized policy improvement and difference rewards, for efficient and effective knowledge transfer between different teams. We empirically demonstrate that our algorithm, Generalized Policy improvement for Ad hoc Teaming (GPAT), successfully enables zero-shot transfer to new teams in three simulated environments: cooperative foraging, predator-prey, and Overcooked. We also demonstrate our algorithm in a real-world multi-robot setting.

Related papers

Generalizable Agent Modeling for Agent Collaboration-Competition Adaptation with Multi-Retrieval and Dynamic Generation [19.74776726500979]
Adapting a single agent to a new multi-agent system brings challenges, necessitating adjustments across various tasks, environments, and interactions with unknown teammates and opponents.<n>We propose a more comprehensive setting, Agent Collaborative-Competitive Adaptation, which evaluates an agent to generalize across diverse scenarios.<n>In ACCA, agents adjust to task and environmental changes, collaborate with unseen teammates, and compete against unknown opponents.
arXiv Detail & Related papers (2025-06-20T03:28:18Z)
Seldonian Reinforcement Learning for Ad Hoc Teamwork [47.100080234094065]
Most offline RL algorithms return optimal policies but do not provide statistical guarantees on desirable behaviors.<n>This could generate reliability issues in safety-critical applications.<n>We propose a novel offline RL approach, inspired by Seldonian optimization.
arXiv Detail & Related papers (2025-03-05T20:37:02Z)
N-Agent Ad Hoc Teamwork [36.10108537776956]
Current approaches to learning cooperative multi-agent behaviors assume relatively restrictive settings. This paper formalizes the problem, and proposes the Policy Optimization with Agent Modelling (POAM) algorithm. POAM is a policy gradient, multi-agent reinforcement learning approach to the NAHT problem, that enables adaptation to diverse teammate behaviors.
arXiv Detail & Related papers (2024-04-16T17:13:08Z)
ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents. ProAgent can analyze the present state, and infer the intentions of teammates from observations. ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
A General Learning Framework for Open Ad Hoc Teamwork Using Graph-based Policy Learning [11.998708550268978]
We develop a class of solutions for open ad hoc teamwork under full and partial observability. We show that our solution can learn efficient policies in open ad hoc teamwork in fully and partially observable cases.
arXiv Detail & Related papers (2022-10-11T13:44:44Z)
Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions. We prove tight $widetildemathcalO(sqrtK)$ regret bounds after $K$ episodes in a two-agent competitive game scenario. Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z)
Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition [88.26752130107259]
In real-world multiagent systems, agents with different capabilities may join or leave without altering the team's overarching goals. We propose COPA, a coach-player framework to tackle this problem. We 1) adopt the attention mechanism for both the coach and the players; 2) propose a variational objective to regularize learning; and 3) design an adaptive communication method to let the coach decide when to communicate with the players.
arXiv Detail & Related papers (2021-05-18T17:27:37Z)
On the Critical Role of Conventions in Adaptive Human-AI Collaboration [73.21967490610142]
We propose a learning framework that teases apart rule-dependent representation from convention-dependent representation. We experimentally validate our approach on three collaborative tasks varying in complexity.
arXiv Detail & Related papers (2021-04-07T02:46:19Z)
Competing Adaptive Networks [56.56653763124104]
We develop an algorithm for decentralized competition among teams of adaptive agents. We present an application in the decentralized training of generative adversarial neural networks.
arXiv Detail & Related papers (2021-03-29T14:42:15Z)
Multi-Agent Coordination in Adversarial Environments through Signal Mediated Strategies [37.00818384785628]
Team members can coordinate their strategies before the beginning of the game, but are unable to communicate during the playing phase of the game. In this setting, model-free RL methods are oftentimes unable to capture coordination because agents' policies are executed in a decentralized fashion. We show convergence to coordinated equilibria in cases where previous state-of-the-art multi-agent RL algorithms did not.
arXiv Detail & Related papers (2021-02-09T18:44:16Z)
Towards Open Ad Hoc Teamwork Using Graph-based Policy Learning [11.480994804659908]
We build on graph neural networks to learn agent models and joint-action value models under varying team compositions. We empirically demonstrate that our approach successfully models the effects other agents have on the learner, leading to policies that robustly adapt to dynamic team compositions.
arXiv Detail & Related papers (2020-06-18T10:39:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.