EIMC: Efficient Instance-aware Multi-modal Collaborative Perception
- URL: http://arxiv.org/abs/2603.02532v1
- Date: Tue, 03 Mar 2026 02:44:36 GMT
- Title: EIMC: Efficient Instance-aware Multi-modal Collaborative Perception
- Authors: Kang Yang, Peng Wang, Lantao Li, Tianci Bu, Chen Sun, Deying Li, Yongcai Wang
- Abstract summary: EIMC proposes an early collaborative paradigm for autonomous driving. It injects lightweight collaborative voxels, transmitted by neighbor agents, into the ego's local modality-fusion step. A heatmap-driven consensus protocol identifies exactly where cooperation is needed. A refinement fusion step then collects the top-K most confident instances from each agent and enhances their features.
- Score: 18.679140250964135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal collaborative perception has drawn great attention for enhancing the safety of autonomous driving. However, current multi-modal approaches follow a "local fusion to communication" sequence: they fuse multi-modal data locally and require high bandwidth to transmit each agent's feature data before collaborative fusion. EIMC proposes an early collaborative paradigm. It injects lightweight collaborative voxels, transmitted by neighbor agents, into the ego's local modality-fusion step, yielding compact yet informative 3D collaborative priors that tighten cross-modal alignment. Next, a heatmap-driven consensus protocol identifies exactly where cooperation is needed by computing per-pixel confidence heatmaps. Only the Top-K instance vectors located in these low-confidence, high-discrepancy regions are queried from peers, then fused via cross-attention for completion. Afterwards, a refinement fusion collects the top-K most confident instances from each agent and enhances their features with self-attention. This instance-centric messaging reduces redundancy while guaranteeing that critical occluded objects are recovered. Evaluated on OPV2V and DAIR-V2X, EIMC attains 73.01% AP@0.5 while reducing bandwidth usage by 87.98% compared with the best published multi-modal collaborative detector. Code is publicly released at https://github.com/sidiangongyuan/EIMC.
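As a rough, non-authoritative illustration of the pipeline described in the abstract, the sketch below shows how a per-pixel confidence heatmap could drive Top-K selection of query locations, how instance vectors answered by peers might be fused into the ego's instances with cross-attention, and how the most confident instances could then be refined with self-attention. Module names, tensor shapes, the K values, and the discrepancy score are assumptions made for illustration; they are not taken from the released EIMC code.

```python
# Hypothetical sketch of heatmap-driven Top-K selection, cross-attention completion,
# and self-attention refinement. Shapes, K, and all names are illustrative assumptions.
import torch
import torch.nn as nn


class InstanceCollaboration(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8, top_k: int = 32):
        super().__init__()
        self.top_k = top_k
        # Cross-attention: ego instance queries attend to instance vectors fetched from peers.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Self-attention: jointly refine the most confident instances gathered from all agents.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def select_requests(self, ego_heatmap: torch.Tensor, peer_heatmap: torch.Tensor) -> torch.Tensor:
        """Pick the Top-K pixels where the ego is uncertain but a peer is confident."""
        # ego_heatmap, peer_heatmap: flattened per-pixel confidences in [0, 1], shape (H*W,)
        discrepancy = (1.0 - ego_heatmap) * peer_heatmap   # low ego confidence, high peer confidence
        k = min(self.top_k, discrepancy.numel())
        return torch.topk(discrepancy, k).indices          # pixel indices to query from peers

    def complete(self, ego_inst: torch.Tensor, peer_inst: torch.Tensor) -> torch.Tensor:
        """Fuse instance vectors returned by peers into the ego's instances."""
        # ego_inst: (1, N, dim); peer_inst: (1, K, dim)
        fused, _ = self.cross_attn(ego_inst, peer_inst, peer_inst)
        return ego_inst + fused                            # residual completion

    def refine(self, confident_inst: torch.Tensor) -> torch.Tensor:
        """Refinement fusion over the top-K most confident instances of all agents."""
        # confident_inst: (1, M, dim), concatenated across agents
        refined, _ = self.self_attn(confident_inst, confident_inst, confident_inst)
        return confident_inst + refined


if __name__ == "__main__":
    torch.manual_seed(0)
    model = InstanceCollaboration()
    ego_heat, peer_heat = torch.rand(200 * 200), torch.rand(200 * 200)
    request_idx = model.select_requests(ego_heat, peer_heat)   # where cooperation is needed
    ego_inst = torch.randn(1, 64, 256)                         # ego's local instance vectors
    peer_inst = torch.randn(1, request_idx.numel(), 256)       # vectors answered by a peer
    completed = model.complete(ego_inst, peer_inst)
    refined = model.refine(torch.cat([completed, torch.randn(1, 32, 256)], dim=1))
    print(request_idx.shape, completed.shape, refined.shape)
```

Restricting requests to low-confidence, high-discrepancy pixels keeps each message at the level of a few instance vectors rather than whole feature maps, which is where the reported bandwidth saving comes from.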
Related papers
- InterAgent: Physics-based Multi-agent Command Execution via Diffusion on Interaction Graphs [72.5651722107621]
InterAgent is an end-to-end framework for text-driven, physics-based multi-agent humanoid control. We introduce an autoregressive diffusion transformer equipped with multi-stream blocks, which decouples proprioception, exteroception, and action to mitigate cross-modal interference. We also propose a novel interaction-graph exteroception representation that explicitly captures fine-grained joint-to-joint spatial dependencies.
arXiv Detail & Related papers (2025-12-08T10:46:01Z) - One-Shot Secure Aggregation: A Hybrid Cryptographic Protocol for Private Federated Learning in IoT [0.0]
Hyb-Agg is a lightweight and communication-efficient secure aggregation protocol. It integrates Multi-Key CKKS (MK-CKKS) homomorphic encryption with Elliptic Curve Diffie-Hellman (ECDH)-based additive masking (a sketch of the masking step follows this list). We implement and evaluate Hyb-Agg on both high-performance and resource-constrained devices, including a Raspberry Pi 4.
arXiv Detail & Related papers (2025-11-28T15:01:26Z) - Is Discretization Fusion All You Need for Collaborative Perception? [5.44403620979893]
This paper proposes a novel Anchor-Centric paradigm for Collaborative Object detection (ACCO). It avoids grid precision issues and allows more flexible and efficient anchor-centric communication and fusion. Comprehensive experiments are conducted to evaluate ACCO on the OPV2V and DAIR-V2X datasets.
arXiv Detail & Related papers (2025-03-18T06:25:03Z) - RG-Attn: Radian Glue Attention for Multi-modality Multi-agent Cooperative Perception [14.450341173771486]
Radian Glue Attention (RG-Attn) is a lightweight and generalizable cross-modal fusion module. RG-Attn efficiently aligns features through a radian-based attention constraint. Paint-To-Puzzle (PTP) prioritizes communication efficiency but assumes all agents have a camera. CoS-CoCo offers maximal flexibility, supporting any sensor setup. Pyramid-RG-Attn Fusion (PRGAF) aims for peak detection accuracy with the highest computational overhead.
arXiv Detail & Related papers (2025-01-28T09:08:31Z) - What Makes Good Collaborative Views? Contrastive Mutual Information Maximization for Multi-Agent Perception [52.41695608928129]
Multi-agent perception (MAP) allows autonomous systems to understand complex environments by interpreting data from multiple sources.
This paper investigates intermediate collaboration for MAP, with a specific focus on exploring "good" properties of the collaborative view.
We propose a novel framework named CMiMC for intermediate collaboration.
arXiv Detail & Related papers (2024-03-15T07:18:55Z) - Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection [9.967263440745432]
Occlusion is a major challenge for LiDAR-based object detection methods.
State-of-the-art V2X methods resolve the performance-bandwidth tradeoff using a mid-collaboration approach.
We devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior methods.
arXiv Detail & Related papers (2023-07-04T03:49:42Z) - Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z) - Collaborative Mean Estimation over Intermittently Connected Networks with Peer-To-Peer Privacy [86.61829236732744]
This work considers the problem of Distributed Mean Estimation (DME) over networks with intermittent connectivity.
The goal is to learn a global statistic over the data samples localized across distributed nodes with the help of a central server.
We study the tradeoff between collaborative relaying and privacy leakage due to the additional data sharing among nodes.
arXiv Detail & Related papers (2023-02-28T19:17:03Z) - COVINS: Visual-Inertial SLAM for Centralized Collaboration [11.65456841016608]
Collaborative SLAM enables a group of agents to simultaneously co-localize and jointly map an environment.
This article presents COVINS, a novel collaborative SLAM system that enables multi-agent, scalable SLAM in large environments.
arXiv Detail & Related papers (2021-08-12T13:50:44Z) - F2A2: Flexible Fully-decentralized Approximate Actor-critic for
Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z) - Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion (a sketch of the monotonic mixing idea follows this list).
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
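The QMIX entry above names monotonic value function factorisation; the sketch below shows the usual way such monotonicity is enforced, namely a mixing network whose weights are produced by state-conditioned hypernetworks and passed through an absolute value, so that the joint value is non-decreasing in every agent's utility. Layer sizes and module names are illustrative assumptions rather than the authors' released code.

```python
# Minimal sketch of a monotonic mixing network: hypernetworks map the global state
# to non-negative mixing weights. Sizes and names are illustrative assumptions.
import torch
import torch.nn as nn


class MonotonicMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks: map the global state to the mixing weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        b = agent_qs.size(0)
        # abs() keeps the generated weights non-negative -> Q_tot is monotone in agent_qs.
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)   # (b, 1, embed)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)                    # joint value


if __name__ == "__main__":
    mixer = MonotonicMixer(n_agents=4, state_dim=16)
    q_tot = mixer(torch.randn(8, 4), torch.randn(8, 16))
    print(q_tot.shape)  # torch.Size([8, 1])
```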
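For the Hyb-Agg entry above, the following sketch covers only the ECDH-based additive-masking half of the protocol, under simplifying assumptions: each pair of clients derives a shared secret (X25519 from the cryptography package stands in for the ECDH step), expands it into a pairwise mask over uint32 vectors, and the masks cancel when the server sums the reports. The MK-CKKS layer and dropout handling are omitted, and all names and sizes are invented for illustration.

```python
# Hypothetical sketch of pairwise ECDH-based additive masking (the MK-CKKS part of
# Hyb-Agg is omitted). Names, vector length, and the modulus are illustrative only.
import hashlib
import numpy as np
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.x25519 import (
    X25519PrivateKey,
    X25519PublicKey,
)

DIM = 8  # length of each client's (already quantised) update vector


def expand_mask(shared_secret: bytes, dim: int) -> np.ndarray:
    """Stretch a 32-byte shared secret into a uint32 mask of length dim."""
    raw = hashlib.shake_256(shared_secret).digest(4 * dim)
    return np.frombuffer(raw, dtype=np.uint32)


class Client:
    def __init__(self, cid: int, update: np.ndarray):
        self.cid = cid
        self.update = update.astype(np.uint32)
        self.sk = X25519PrivateKey.generate()
        self.pk = self.sk.public_key().public_bytes(
            serialization.Encoding.Raw, serialization.PublicFormat.Raw
        )

    def masked_report(self, peer_pks: dict) -> np.ndarray:
        """Add the pairwise mask toward higher ids, subtract it toward lower ids."""
        report = self.update.copy()
        for peer_id, peer_pk in peer_pks.items():
            if peer_id == self.cid:
                continue
            secret = self.sk.exchange(X25519PublicKey.from_public_bytes(peer_pk))
            mask = expand_mask(secret, DIM)
            # uint32 arithmetic wraps mod 2**32, so opposite signs cancel exactly.
            report = report + mask if self.cid < peer_id else report - mask
        return report


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clients = [Client(i, rng.integers(0, 100, DIM)) for i in range(3)]
    pks = {c.cid: c.pk for c in clients}
    # Server-side: summing the masked reports in uint32 makes the pairwise masks cancel.
    reports = np.stack([c.masked_report(pks) for c in clients])
    aggregate = reports.sum(axis=0, dtype=np.uint32)
    assert np.array_equal(aggregate, np.stack([c.update for c in clients]).sum(axis=0))
    print("recovered aggregate:", aggregate)
```

Working modulo 2**32 with integer vectors is what makes the cancellation exact; with floating-point masks the recovered sum would only be approximate.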
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.