Related papers: PADiff: Predictive and Adaptive Diffusion Policies for Ad Hoc Teamwork

URL: http://arxiv.org/abs/2511.07260v1
Date: Mon, 10 Nov 2025 16:05:40 GMT
Title: PADiff: Predictive and Adaptive Diffusion Policies for Ad Hoc Teamwork
Authors: Hohei Chan, Xinzhi Zhang, Antao Xiang, Weinan Zhang, Mengchen Zhao
Abstract summary: Ad hoc teamwork (AHT) requires agents to collaborate with previously unseen teammates, which is crucial for many real-world applications. Conventional RL-based approaches optimize a single expected return, which often causes policies to collapse into a single dominant behavior. We introduce PADiff, a diffusion-based approach that captures an agent's multimodal behaviors, unlocking its diverse cooperation modes with teammates.
Score: 19.386340680474955
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Ad hoc teamwork (AHT) requires agents to collaborate with previously unseen teammates, which is crucial for many real-world applications. The core challenge of AHT is to develop an ego agent that can predict and adapt to unknown teammates on the fly. Conventional RL-based approaches optimize a single expected return, which often causes policies to collapse into a single dominant behavior, thus failing to capture the multimodal cooperation patterns inherent in AHT. In this work, we introduce PADiff, a diffusion-based approach that captures an agent's multimodal behaviors, unlocking its diverse cooperation modes with teammates. However, standard diffusion models lack the ability to predict and adapt in highly non-stationary AHT scenarios. To address this limitation, we propose a novel diffusion-based policy that integrates critical predictive information about teammates into the denoising process. Extensive experiments across three cooperation environments demonstrate that PADiff significantly outperforms existing AHT methods.

Related papers

Heterogeneous Agent Collaborative Reinforcement Learning [52.99813668995983]
Heterogeneous Agent Collaborative Reinforcement Learning (HACRL). Building on this paradigm, we propose HACPO, a collaborative RL algorithm that enables principled rollout sharing to maximize sample utilization and cross-agent knowledge transfer. Experiments across diverse heterogeneous model combinations and reasoning benchmarks show that HACPO consistently improves all participating agents, outperforming GSPO by an average of 3.3% while using only half the rollout cost.
arXiv Detail & Related papers (2026-03-03T05:09:49Z)
Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies [51.24079409973799]
Diffusion-based generative models are well-positioned to meet the needs of online Multi-Agent Reinforcement Learning (MARL). We propose one of the first Online off-policy MARL frameworks using Diffusion policies to orchestrate coordination. Our key innovation is a relaxed policy objective that maximizes scaled joint entropy, facilitating effective exploration without relying on tractable likelihood.
arXiv Detail & Related papers (2026-02-20T15:38:02Z)
Multi-Agent Conditional Diffusion Model with Mean Field Communication as Wireless Resource Allocation Planner [16.759740918605768]
In wireless communication systems, efficient and adaptive resource allocation plays a crucial role in enhancing Quality of Service (QoS). In contrast, the Distributed Training with Decentralized Execution (DTDE) paradigm enables distributed learning and decision-making. We propose the Multi-Agent Conditional Diffusion Model Planner (MACDMP) for decentralized communication resource management.
arXiv Detail & Related papers (2025-10-27T03:42:18Z)
ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork [24.374221820972707]
Learning to collaborate with previously unseen partners is a fundamental generalization challenge in multi-agent learning, known as Ad Hoc Teamwork (AHT). This paper presents a unified framework for AHT by reformulating the problem as an open-ended learning process between an AHT agent and an adversarial teammate generator. Experiments across diverse two-player environments demonstrate that ROTATE significantly outperforms baselines at generalizing to an unseen set of evaluation teammates.
arXiv Detail & Related papers (2025-05-29T17:24:54Z)
HyperMARL: Adaptive Hypernetworks for Multi-Agent RL [13.029350832809582]
Multi-agent reinforcement learning (MARL) requires policies to express homogeneous, specialised, or mixed behaviours. We propose a solution built on a key insight: an agent-conditioned hypernetwork can generate agent-specific parameters and decouple observation- and agent-conditioned gradients. Our resulting method, HyperMARL, avoids the complexities of prior work and empirically reduces policy gradient variance.
arXiv Detail & Related papers (2024-12-05T15:09:51Z)
MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models. We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z)
Learning Emergence of Interaction Patterns across Independent RL Agents in Multi-Agent Environments [3.0284592792243794]
Bottom Up Network (BUN) treats the collective of multi-agents as a unified entity. Our empirical evaluations across a variety of cooperative multi-agent scenarios, including tasks such as cooperative navigation and traffic control, consistently demonstrate BUN's superiority over baseline methods with substantially reduced computational costs.
arXiv Detail & Related papers (2024-10-03T14:25:02Z)
ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents. ProAgent can analyze the present state, and infer the intentions of teammates from observations. ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
Decentralized Adaptive Formation via Consensus-Oriented Multi-Agent Communication [9.216867817261493]
We propose a novel Consensus-based Decentralized Adaptive Formation (Cons-DecAF) framework. Specifically, we develop a novel multi-agent reinforcement learning method, Consensus-oriented Multi-Agent Communication (ConsMAC). Instead of pre-assigning specific positions of agents, we employ a displacement-based formation by Hausdorff distance to significantly improve the formation efficiency.
arXiv Detail & Related papers (2023-07-23T10:41:17Z)
MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
MADiff is a diffusion-based multi-agent learning framework. It works as both a decentralized policy and a centralized controller. Our experiments demonstrate that MADiff outperforms baseline algorithms across various multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z)
MA-Dreamer: Coordination and communication through shared imagination [5.253168177256072]
We present MA-Dreamer, a model-based method that uses both agent-centric and global differentiable models of the environment. Our experiments show that in long-term speaker-listener tasks and in cooperative games with strong partial-observability, MA-Dreamer finds a solution that makes effective use of coordination.
arXiv Detail & Related papers (2022-04-10T13:54:26Z)
Multi-Agent Interactions Modeling with Correlated Policies [53.38338964628494]
In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework. We develop a Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL). Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to the demonstrators.
arXiv Detail & Related papers (2020-01-04T17:31:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.