Related papers: InterAgent: Physics-based Multi-agent Command Execution via Diffusion on Interaction Graphs

InterAgent: Physics-based Multi-agent Command Execution via Diffusion on Interaction Graphs

URL: http://arxiv.org/abs/2512.07410v2
Date: Fri, 12 Dec 2025 09:00:52 GMT
Title: InterAgent: Physics-based Multi-agent Command Execution via Diffusion on Interaction Graphs
Authors: Bin Li, Ruichi Zhang, Han Liang, Jingyan Zhang, Juze Zhang, Xin Chen, Lan Xu, Jingyi Yu, Jingya Wang,
Abstract summary: InterAgent is an end-to-end framework for text-driven physics-based multi-agent humanoid control.<n>We introduce an autoregressive diffusion transformer equipped with multi-stream blocks, which decouples proprioception, exteroception, and action to cross-modal interference.<n>We also propose a novel interaction graph exteroception representation that explicitly captures fine-grained joint-to-joint spatial dependencies.
Score: 72.5651722107621
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Humanoid agents are expected to emulate the complex coordination inherent in human social behaviors. However, existing methods are largely confined to single-agent scenarios, overlooking the physically plausible interplay essential for multi-agent interactions. To bridge this gap, we propose InterAgent, the first end-to-end framework for text-driven physics-based multi-agent humanoid control. At its core, we introduce an autoregressive diffusion transformer equipped with multi-stream blocks, which decouples proprioception, exteroception, and action to mitigate cross-modal interference while enabling synergistic coordination. We further propose a novel interaction graph exteroception representation that explicitly captures fine-grained joint-to-joint spatial dependencies to facilitate network learning. Additionally, within it we devise a sparse edge-based attention mechanism that dynamically prunes redundant connections and emphasizes critical inter-agent spatial relations, thereby enhancing the robustness of interaction modeling. Extensive experiments demonstrate that InterAgent consistently outperforms multiple strong baselines, achieving state-of-the-art performance. It enables producing coherent, physically plausible, and semantically faithful multi-agent behaviors from only text prompts. Our code and data will be released to facilitate future research.

Related papers

Diffusion Forcing for Multi-Agent Interaction Sequence Modeling [52.769202433667125]
MAGNet is a unified autoregressive diffusion framework for multi-agent motion generation.<n>It supports a wide range of interaction tasks through flexible conditioning and sampling.<n>It captures both tightly synchronized activities and loosely structured social interactions.
arXiv Detail & Related papers (2025-12-19T18:59:02Z)
Text2Interact: High-Fidelity and Diverse Text-to-Two-Person Interaction Generation [39.67266918328847]
We propose Text2 framework designed to generate realistic text human-human interactions.<n>We present InterCompose, a synthesis-by-composition pipeline that aligns interaction descriptions with strong singleperson motion priors.<n>We also propose InterActor, a text-to-interaction model with word-level conditioning that preserves token-level cues.
arXiv Detail & Related papers (2025-10-07T22:41:23Z)
HeLoFusion: An Efficient and Scalable Encoder for Modeling Heterogeneous and Multi-Scale Interactions in Trajectory Prediction [11.30785902722196]
HeLoFusion is an efficient and scalable encoder for modeling heterogeneous and multi-scale agent interactions.<n>Our work demonstrates that a locality-grounded architecture, which explicitly models multi-scale and heterogeneous interactions, is a highly effective strategy for advancing motion forecasting.
arXiv Detail & Related papers (2025-09-15T09:19:41Z)
AnyMAC: Cascading Flexible Multi-Agent Collaboration via Next-Agent Prediction [77.62279834617475]
We propose a new framework that rethinks multi-agent coordination through a sequential structure rather than a graph structure.<n>Our method focuses on two key directions: (1) Next-Agent Prediction, which selects the most suitable agent role at each step, and (2) Next-Context Selection, which enables each agent to selectively access relevant information from any previous step.
arXiv Detail & Related papers (2025-06-21T18:34:43Z)
Relation Learning and Aggregate-attention for Multi-person Motion Prediction [13.052342503276936]
Multi-person motion prediction considers not just the skeleton structures or human trajectories but also the interactions between others. Previous methods often overlook that the joints relations within an individual (intra-relation) and interactions among groups (inter-relation) are distinct types of representations. We introduce a new collaborative framework for multi-person motion prediction that explicitly modeling these relations.
arXiv Detail & Related papers (2024-11-06T07:48:30Z)
Scaling Large Language Model-based Multi-Agent Collaboration [72.8998796426346]
Recent breakthroughs in large language model-driven autonomous agents have revealed that multi-agent collaboration often surpasses each individual through collective reasoning.<n>This study explores whether the continuous addition of collaborative agents can yield similar benefits.
arXiv Detail & Related papers (2024-06-11T11:02:04Z)
Rethinking Trajectory Prediction via "Team Game" [118.59480535826094]
We present a novel formulation for multi-agent trajectory prediction, which explicitly introduces the concept of interactive group consensus. On two multi-agent settings, i.e. team sports and pedestrians, the proposed framework consistently achieves superior performance compared to existing methods.
arXiv Detail & Related papers (2022-10-17T07:16:44Z)
Interaction Pattern Disentangling for Multi-Agent Reinforcement Learning [39.4394389642761]
We introduce a novel interactiOn Pattern disenTangling (OPT) method to disentangle the entity interactions into interaction prototypes. OPT facilitates filtering the noisy interactions between irrelevant entities and thus significantly improves generalizability as well as interpretability. Experiments on single-task, multi-task and zero-shot benchmarks demonstrate that the proposed method yields results superior to the state-of-the-art counterparts.
arXiv Detail & Related papers (2022-07-08T13:42:54Z)
Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction [97.40338982628094]
We propose a simple yet effective Unlimited Neighborhood Interaction Network (UNIN) which predicts trajectories of heterogeneous agents in multiply categories. Specifically, the proposed unlimited neighborhood interaction module generates the fused-features of all agents involved in an interaction simultaneously. A hierarchical graph attention module is proposed to obtain category-tocategory interaction and agent-to-agent interaction.
arXiv Detail & Related papers (2021-07-31T13:36:04Z)
Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions. In this paper, we propose to use copula, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems. Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
arXiv Detail & Related papers (2021-07-10T03:49:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.