Pragmatic Heterogeneous Collaborative Perception via Generative Communication Mechanism
- URL: http://arxiv.org/abs/2510.19618v3
- Date: Mon, 03 Nov 2025 02:54:49 GMT
- Title: Pragmatic Heterogeneous Collaborative Perception via Generative Communication Mechanism
- Authors: Junfei Zhou, Penglin Dai, Quanmin Wei, Bingyi Liu, Xiao Wu, Jianping Wang,
- Abstract summary: We present a novel Generative Communication mechanism (GenComm) that facilitates seamless perception across heterogeneous multi-agent systems. Experiments conducted on the OPV2V-H, DAIR-V2X and V2X-Real datasets demonstrate that GenComm outperforms existing state-of-the-art methods.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Multi-agent collaboration enhances the perception capabilities of individual agents through information sharing. However, in real-world applications, differences in sensors and models across heterogeneous agents inevitably lead to domain gaps during collaboration. Existing approaches based on adaptation and reconstruction fail to support pragmatic heterogeneous collaboration due to two key limitations: (1) intrusive retraining of the encoder or core modules disrupts the established semantic consistency among agents; and (2) accommodating new agents incurs high computational costs, limiting scalability. To address these challenges, we present a novel Generative Communication mechanism (GenComm) that facilitates seamless perception across heterogeneous multi-agent systems through feature generation, without altering the original network, and employs lightweight numerical alignment of spatial information to efficiently integrate new agents at minimal cost. Specifically, a tailored Deformable Message Extractor is designed to extract a spatial message for each collaborator, which is then transmitted in place of intermediate features. The Spatial-Aware Feature Generator, utilizing a conditional diffusion model, generates features aligned with the ego agent's semantic space while preserving the spatial information of the collaborators. These generated features are further refined by a Channel Enhancer before fusion. Experiments conducted on the OPV2V-H, DAIR-V2X and V2X-Real datasets demonstrate that GenComm outperforms existing state-of-the-art methods, achieving an 81% reduction in both computational cost and parameter count when incorporating new agents. Our code is available at https://github.com/jeffreychou777/GenComm.
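The pipeline the abstract describes (transmit a compact spatial message instead of intermediate features, generate ego-space features conditioned on it, refine per channel, then fuse) can be sketched at a very high level. Everything below is a hypothetical illustration under stated simplifications, not the authors' implementation: array shapes are toy sizes, the conditional diffusion generator is replaced by a single linear projection step, the Channel Enhancer by a sigmoid gate, and fusion by an element-wise max.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 8, 16, 16  # channels and BEV grid size (toy values)

# Ego agent's intermediate BEV feature, in its own semantic space.
ego_feat = rng.standard_normal((C, H, W))

# The collaborator transmits only a low-dimensional spatial message
# (here, a single-channel occupancy/saliency map) instead of raw features.
spatial_msg = (rng.random((1, H, W)) > 0.7).astype(np.float64)

def generate_feature(msg, proj):
    """Stand-in for the Spatial-Aware Feature Generator: lift the spatial
    message into the ego semantic space. (The paper uses a conditional
    diffusion model; this is a single hypothetical linear step.)"""
    return proj[:, None, None] * msg  # broadcast 1 channel -> C channels

def channel_enhance(feat, gate):
    """Stand-in for the Channel Enhancer: per-channel gating."""
    return feat * gate[:, None, None]

proj = rng.standard_normal(C)                          # hypothetical learned projection
gate = 1.0 / (1.0 + np.exp(-rng.standard_normal(C)))   # hypothetical sigmoid gate

gen_feat = channel_enhance(generate_feature(spatial_msg, proj), gate)

# Fuse in ego space without touching the ego encoder: element-wise max.
fused = np.maximum(ego_feat, gen_feat)
print(fused.shape)  # (8, 16, 16)
```

Note how the ego encoder is never modified: only the generator and gate would be trained per collaborator type, which is consistent with the abstract's claim that new agents can be integrated without intrusive retraining.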
Related papers
- Towards Efficient Agents: A Co-Design of Inference Architecture and System [66.59916327634639]
This paper presents AgentInfer, a unified framework for end-to-end agent acceleration.
We decompose the problem into four synergistic components: AgentCollab, AgentSched, AgentSAM, and AgentCompress.
Experiments on the BrowseComp-zh and DeepDiver benchmarks demonstrate that through the synergistic collaboration of these methods, AgentInfer reduces ineffective token consumption by over 50%.
arXiv Detail & Related papers (2025-12-20T12:06:13Z)
- InterAgent: Physics-based Multi-agent Command Execution via Diffusion on Interaction Graphs [72.5651722107621]
InterAgent is an end-to-end framework for text-driven physics-based multi-agent humanoid control.
We introduce an autoregressive diffusion transformer equipped with multi-stream blocks, which decouples proprioception, exteroception, and action to reduce cross-modal interference.
We also propose a novel interaction graph exteroception representation that explicitly captures fine-grained joint-to-joint spatial dependencies.
arXiv Detail & Related papers (2025-12-08T10:46:01Z) - Heterogeneous Multi-Agent Reinforcement Learning with Attention for Cooperative and Scalable Feature Transformation [21.732611237889326]
Feature transformation enhances downstream task performance by generating informative features through mathematical feature crossing.<n>Recent works employ reinforcement learning to enhance traditional approaches through a more effective trial-and-error way.<n>We propose a novel heterogeneous multi-agent RL framework to enable cooperative and scalable feature transformation.
arXiv Detail & Related papers (2025-11-26T21:45:38Z) - HeatV2X: Scalable Heterogeneous Collaborative Perception via Efficient Alignment and Interaction [7.171380055232685]
Vehicle-to-Everything (V2X) collaborative perception extends sensing beyond single vehicle limits through transmission.<n>Existing frameworks face two key challenges: (1) the participating agents are inherently multi-modal and heterogeneous, and (2) the collaborative framework must be scalable to accommodate new agents.<n>We propose Heterogeneous Adaptation (HeatV2X), a scalable collaborative framework.
arXiv Detail & Related papers (2025-11-13T11:33:22Z) - Federation of Agents: A Semantics-Aware Communication Fabric for Large-Scale Agentic AI [1.8244641115869653]
We present Federation of Agents (FoA), a distributed orchestration framework that transforms multi-agent coordination into dynamic, capability-driven collaboration.<n>FoA introduces Versioned Capability Vectors (VCVs), machine-readable profiles that make agent capabilities searchable through semantic embeddings.<n>We show 13x improvements over single-model baselines, with clustering-enhanced laboration particularly effective for complex reasoning tasks.
arXiv Detail & Related papers (2025-09-24T14:38:06Z) - Agent4FaceForgery: Multi-Agent LLM Framework for Realistic Face Forgery Detection [108.5042835056188]
This work introduces Agent4FaceForgery to address two fundamental problems.<n>How to capture the diverse intents and iterative processes of human forgery creation.<n>How to model the complex, often adversarial, text-image interactions that accompany forgeries in social media.
arXiv Detail & Related papers (2025-09-16T01:05:01Z) - RG-Attn: Radian Glue Attention for Multi-modality Multi-agent Cooperative Perception [14.450341173771486]
Radian Glue Attention (RG-Attn) is a lightweight and generalizable cross-modal fusion module.<n>RG-Attn efficiently aligns features through a radian-based attention constraint.<n>Paint-To-Puzzle (PTP) prioritizes communication efficiency but assumes all agents have a camera.<n>CoS-CoCo offers maximal flexibility, supporting any sensor setup.<n>Pyramid-RG-Attn Fusion (PRGAF) aims for peak detection accuracy with the highest computational overhead.
arXiv Detail & Related papers (2025-01-28T09:08:31Z) - Agent-driven Generative Semantic Communication with Cross-Modality and Prediction [57.335922373309074]
We propose a novel agent-driven generative semantic communication framework based on reinforcement learning.
In this work, we develop an agent-assisted semantic encoder with cross-modality capability, which can track semantic changes and channel conditions to perform adaptive semantic extraction and sampling.
The effectiveness of the designed models has been verified using the UA-DETRAC dataset, demonstrating the performance gains of the overall A-GSC framework.
arXiv Detail & Related papers (2024-04-10T13:24:27Z) - An Extensible Framework for Open Heterogeneous Collaborative Perception [58.70875361688463]
Collaborative perception aims to mitigate the limitations of single-agent perception.
In this paper, we introduce a new open heterogeneous problem: how to accommodate continually emerging new heterogeneous agent types into collaborative perception.
We propose HEterogeneous ALliance (HEAL), a novel collaborative perception framework.
arXiv Detail & Related papers (2024-01-25T05:55:03Z) - Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.