HeatV2X: Scalable Heterogeneous Collaborative Perception via Efficient Alignment and Interaction
- URL: http://arxiv.org/abs/2511.10211v1
- Date: Fri, 14 Nov 2025 01:39:33 GMT
- Title: HeatV2X: Scalable Heterogeneous Collaborative Perception via Efficient Alignment and Interaction
- Authors: Yueran Zhao, Zhang Zhang, Chao Sun, Tianze Wang, Chao Yue, Nuoran Li,
- Abstract summary: Vehicle-to-Everything (V2X) collaborative perception extends sensing beyond single-vehicle limits through transmission.
Existing frameworks face two key challenges: (1) the participating agents are inherently multi-modal and heterogeneous, and (2) the collaborative framework must be scalable to accommodate new agents.
We propose Heterogeneous Adaptation (HeatV2X), a scalable collaborative framework.
- Score: 7.171380055232685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vehicle-to-Everything (V2X) collaborative perception extends sensing beyond single-vehicle limits through transmission. However, as more agents participate, existing frameworks face two key challenges: (1) the participating agents are inherently multi-modal and heterogeneous, and (2) the collaborative framework must be scalable to accommodate new agents. The former requires effective cross-agent feature alignment to mitigate heterogeneity loss, while the latter renders full-parameter training impractical, highlighting the importance of scalable adaptation. To address these issues, we propose Heterogeneous Adaptation (HeatV2X), a scalable collaborative framework. We first train a high-performance agent based on heterogeneous graph attention as the foundation for collaborative learning. Then, we design Local Heterogeneous Fine-Tuning and Global Collaborative Fine-Tuning to achieve effective alignment and interaction among heterogeneous agents. The former efficiently extracts modality-specific differences using Hetero-Aware Adapters, while the latter employs the Multi-Cognitive Adapter to enhance cross-agent collaboration and fully exploit the fusion potential. These designs enable substantial performance improvement of the collaborative framework with minimal training cost. We evaluate our approach on the OPV2V-H and DAIR-V2X datasets. Experimental results demonstrate that our method achieves superior perception performance with significantly reduced training overhead, outperforming existing state-of-the-art approaches. Our implementation will be released soon.
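The abstract describes fine-tuning with small adapter modules (Hetero-Aware Adapters, Multi-Cognitive Adapter) on top of a frozen collaborative backbone. A minimal numeric sketch of the general adapter idea follows; it assumes a standard bottleneck residual adapter as used in parameter-efficient fine-tuning, which may differ from the exact designs in HeatV2X. All names (`BottleneckAdapter`, dimensions) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class BottleneckAdapter:
    """Small trainable module inserted after a frozen backbone layer.

    Maps d -> r -> d (r << d) and adds the result back as a residual,
    so only 2*d*r parameters need training per agent modality.
    """
    def __init__(self, d, r):
        self.down = rng.normal(scale=0.02, size=(d, r))  # trainable down-projection
        self.up = np.zeros((r, d))                       # zero-init: adapter starts as identity

    def __call__(self, h):
        return h + relu(h @ self.down) @ self.up

d, r = 64, 8
adapter = BottleneckAdapter(d, r)
h = rng.normal(size=(5, d))   # stand-in for frozen backbone features of one agent
out = adapter(h)

# With the up-projection zero-initialized, the adapter is a no-op before
# fine-tuning, so the pretrained collaborative behavior is preserved.
assert np.allclose(out, h)
```

The zero-initialized up-projection is a common trick so that adding the adapter does not perturb the frozen model at step zero; training then only updates the small `down`/`up` matrices rather than the full backbone.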
Related papers
- Heterogeneous Agent Collaborative Reinforcement Learning [52.99813668995983]
Heterogeneous Agent Collaborative Reinforcement Learning (HACRL).
Building on this paradigm, we propose HACPO, a collaborative RL algorithm that enables principled rollout sharing to maximize sample utilization and cross-agent knowledge transfer.
Experiments across diverse heterogeneous model combinations and reasoning benchmarks show that HACPO consistently improves all participating agents, outperforming GSPO by an average of 3.3% while using only half the rollout cost.
arXiv Detail & Related papers (2026-03-03T05:09:49Z) - Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration [49.9937230730202]
We propose Search-R2, a novel Actor-Refiner collaboration framework that enhances reasoning through targeted intervention.
Our approach decomposes the generation process into an Actor, which produces initial reasoning trajectories.
We show that Search-R2 consistently outperforms strong RAG and RL-based baselines across model scales.
arXiv Detail & Related papers (2026-02-03T15:32:09Z) - InterAgent: Physics-based Multi-agent Command Execution via Diffusion on Interaction Graphs [72.5651722107621]
InterAgent is an end-to-end framework for text-driven physics-based multi-agent humanoid control.
We introduce an autoregressive diffusion transformer equipped with multi-stream blocks, which decouples proprioception, exteroception, and action to mitigate cross-modal interference.
We also propose a novel interaction graph exteroception representation that explicitly captures fine-grained joint-to-joint spatial dependencies.
arXiv Detail & Related papers (2025-12-08T10:46:01Z) - Pragmatic Heterogeneous Collaborative Perception via Generative Communication Mechanism [14.40993352402385]
We present a novel Generative Communication mechanism (GenComm) that facilitates seamless perception across heterogeneous multi-agent systems.
Experiments conducted on the OPV2V-H, DAIR-V2X and V2X-Real datasets demonstrate that GenComm outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2025-10-22T14:15:20Z) - INSTINCT: Instance-Level Interaction Architecture for Query-Based Collaborative Perception [6.018757656052237]
Collaborative perception systems overcome single-vehicle limitations by integrating multi-agent sensory data, improving accuracy and safety.
Previous work proves that query-based instance-level interaction reduces bandwidth demands and manual priors; however, LiDAR-focused implementations in collaborative perception remain underdeveloped.
We propose INSTINCT, a novel collaborative perception framework featuring three core components: 1) a quality-aware filtering mechanism for high-quality instance feature selection; 2) a dual-branch detection routing scheme to decouple collaboration-irrelevant and collaboration-relevant instances; and 3) a Cross Agent Local Instance Fusion module to aggregate local hybrid instance features.
arXiv Detail & Related papers (2025-09-28T07:16:32Z) - You Share Beliefs, I Adapt: Progressive Heterogeneous Collaborative Perception [1.9142273925815776]
Collaborative perception enables vehicles to overcome individual perception limitations by sharing information.
We introduce Progressive Heterogeneous Collaborative Perception (PHCP), a novel framework that formulates the problem as few-shot unsupervised domain adaptation.
PHCP dynamically aligns features by self-training an adapter during inference, eliminating the need for labeled data and joint training.
arXiv Detail & Related papers (2025-09-11T09:53:20Z) - An Extensible Framework for Open Heterogeneous Collaborative Perception [58.70875361688463]
Collaborative perception aims to mitigate the limitations of single-agent perception.
In this paper, we introduce a new open heterogeneous problem: how to accommodate continually emerging new heterogeneous agent types into collaborative perception.
We propose HEterogeneous ALliance (HEAL), a novel collaborative perception framework.
arXiv Detail & Related papers (2024-01-25T05:55:03Z) - AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors [93.38830440346783]
We propose a multi-agent framework that can collaboratively adjust its composition as a greater-than-the-sum-of-its-parts system.
Our experiments demonstrate that the framework can effectively deploy multi-agent groups that outperform a single agent.
In view of these behaviors, we discuss some possible strategies to leverage positive ones and mitigate negative ones for improving the collaborative potential of multi-agent groups.
arXiv Detail & Related papers (2023-08-21T16:47:11Z) - Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection [9.967263440745432]
Occlusion is a major challenge for LiDAR-based object detection methods.
State-of-the-art V2X methods resolve the performance-bandwidth tradeoff using a mid-collaboration approach.
We devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior methods.
arXiv Detail & Related papers (2023-07-04T03:49:42Z) - A Co-Interactive Transformer for Joint Slot Filling and Intent Detection [61.109486326954205]
Intent detection and slot filling are two main tasks for building a spoken language understanding (SLU) system.
Previous studies either model the two tasks separately or only consider the single information flow from intent to slot.
We propose a Co-Interactive Transformer to consider the cross-impact between the two tasks simultaneously.
arXiv Detail & Related papers (2020-10-08T10:16:52Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
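The cascaded HOI entry above describes a coarse-to-fine pipeline in which a localization step repeatedly refines proposals before a recognition step scores them. A toy numeric sketch of that cascade structure follows; the update rule and scorer are illustrative stand-ins, not the paper's actual instance localization and interaction recognition networks.

```python
import numpy as np

rng = np.random.default_rng(0)

target = rng.normal(size=(8, 4))                         # idealized "true" boxes
proposals = target + rng.normal(scale=2.0, size=(8, 4))  # 8 coarse HOI proposals

def localize(props, step=0.5):
    """Stand-in refinement: move each proposal partway toward the target."""
    return props + step * (target - props)

def recognize(props):
    """Stand-in recognizer: score proposals by closeness to the target."""
    return 1.0 / (1.0 + np.abs(props - target).mean(axis=1))

errors = []
for _ in range(3):                       # three cascade stages, coarse to fine
    proposals = localize(proposals)
    errors.append(np.abs(proposals - target).mean())
scores = recognize(proposals)

# Each stage tightens localization, so the mean error strictly decreases,
# and the recognition stage then operates on progressively better proposals.
assert errors[0] > errors[1] > errors[2]
```

The point of the sketch is the control flow: refinement and recognition alternate per stage, so later stages consume higher-quality proposals, which is the core claim of the cascade design.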
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.