V2X-DSC: Multi-Agent Collaborative Perception with Distributed Source Coding Guided Communication
- URL: http://arxiv.org/abs/2602.00687v1
- Date: Sat, 31 Jan 2026 12:16:58 GMT
- Title: V2X-DSC: Multi-Agent Collaborative Perception with Distributed Source Coding Guided Communication
- Authors: Yuankun Zeng, Shaohui Li, Zhi Li, Shulan Ruan, Yu Liu, You He
- Abstract summary: Collaborative perception improves 3D understanding by fusing multi-agent observations, yet intermediate-feature sharing faces strict bandwidth constraints. We propose V2X-DSC, a framework with a Conditional Codec (DCC) for bandwidth-constrained fusion. Experiments on DAIR-V2X, OPV2V, and V2X-Real demonstrate state-of-the-art accuracy-bandwidth trade-offs under KB-level communication.
- Score: 25.092575199683747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collaborative perception improves 3D understanding by fusing multi-agent observations, yet intermediate-feature sharing faces strict bandwidth constraints as dense BEV features saturate V2X links. We observe that collaborators view the same physical world, making their features strongly correlated; thus receivers only need innovation beyond their local context. Revisiting this from a distributed source coding perspective, we propose V2X-DSC, a framework with a Conditional Codec (DCC) for bandwidth-constrained fusion. The sender compresses BEV features into compact codes, while the receiver performs conditional reconstruction using its local features as side information, allocating bits to complementary cues rather than redundant content. This conditional structure regularizes learning, encouraging incremental representation and yielding lower-noise features. Experiments on DAIR-V2X, OPV2V, and V2X-Real demonstrate state-of-the-art accuracy-bandwidth trade-offs under KB-level communication, and show that V2X-DSC generalizes as a plug-and-play communication layer across multiple fusion backbones.
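The distributed source coding idea in the abstract, where the sender compresses without access to the receiver's features and the receiver reconstructs using its correlated local features as side information, can be illustrated with a minimal linear sketch. This is not the paper's DCC architecture; the random-projection code, dimensions, and noise level are all illustrative assumptions. It shows only the core property: a compact code combined with side information recovers the sender's feature better than the side information alone.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 8  # feature dim, code dim (8x compression, illustrative)

# Sender: the collaborator's feature, flattened to a vector.
x = rng.standard_normal(d)
# Receiver: its own local feature, strongly correlated with x
# (collaborators observe the same physical world).
y = x + 0.3 * rng.standard_normal(d)

# Sender compresses x to a compact code z = P @ x, without access to y.
P = rng.standard_normal((k, d)) / np.sqrt(d)
z = P @ x

# Receiver: conditional reconstruction. Project the local side
# information y onto the affine set {v : P v = z}, which contains x.
correction = P.T @ np.linalg.solve(P @ P.T, z - P @ y)
x_hat = y + correction

# The code only needs to carry the "innovation" beyond y; the
# orthogonal projection cannot increase the error versus using y alone.
err_side_info = np.linalg.norm(x_hat - x)
err_local_only = np.linalg.norm(y - x)
```

In this toy setting the guarantee is geometric: `x_hat` is the projection of `y` onto an affine subspace containing `x`, so its error is never larger than the local-only error, which is why bits can be spent on complementary rather than redundant content.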
Related papers
- Q-KVComm: Efficient Multi-Agent Communication Via Adaptive KV Cache Compression [0.0]
We introduce Q-KVComm, a new protocol that enables direct transmission of compressed key-value (KV) cache representations between agents. Q-KVComm achieves 5-6x compression ratios while maintaining semantic fidelity, with coherence quality scores above 0.77 across all scenarios. Our work establishes a new paradigm for LLM agent communication, shifting from text-based to representation-based information exchange.
arXiv Detail & Related papers (2025-11-27T10:45:41Z) - INSTINCT: Instance-Level Interaction Architecture for Query-Based Collaborative Perception [6.018757656052237]
Collaborative perception systems overcome single-vehicle limitations by integrating multi-agent sensory data, improving accuracy and safety. Previous work shows that query-based instance-level interaction reduces bandwidth demands and manual priors; however, LiDAR-focused implementations in collaborative perception remain underdeveloped. We propose INSTINCT, a novel collaborative perception framework featuring three core components: 1) a quality-aware filtering mechanism for high-quality instance feature selection; 2) a dual-branch detection routing scheme to decouple collaboration-irrelevant and collaboration-relevant instances; and 3) a Cross Agent Local Instance Fusion module to aggregate local hybrid instance features.
arXiv Detail & Related papers (2025-09-28T07:16:32Z) - Localizing Audio-Visual Deepfakes via Hierarchical Boundary Modeling [50.8215545241128]
We propose a Hierarchical Boundary Modeling Network (HBMNet), which includes three modules: an Audio-Visual Feature encoder, a Coarse Proposal Generator, and a Fine-grained Probabilities Generator. From the modality perspective, we enhance audio-visual encoding and fusion, reinforced by frame-level supervision. Experiments show that encoding and fusion primarily improve precision, while frame-level supervision improves recall.
arXiv Detail & Related papers (2025-08-04T02:41:09Z) - V2X-DGPE: Addressing Domain Gaps and Pose Errors for Robust Collaborative 3D Object Detection [18.694510415777632]
V2X-DGPE is a high-accuracy and robust V2X feature-level collaborative perception framework. The proposed method outperforms existing approaches, achieving state-of-the-art detection performance.
arXiv Detail & Related papers (2025-01-04T19:28:55Z) - CooPre: Cooperative Pretraining for V2X Cooperative Perception [47.00472259100765]
CooPre is a self-supervised learning framework for V2X cooperative perception. We develop a V2X bird's-eye-view (BEV) guided masking strategy which effectively allows the model to pay attention to 3D features across heterogeneous V2X agents. CooPre achieves a 4% mAP improvement on the V2X-Real dataset and surpasses baseline performance using only 50% of the training data.
arXiv Detail & Related papers (2024-08-20T23:39:26Z) - Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints [66.63250537475973]
This paper introduces a diffusion-driven semantic communication framework with advanced VAE-based compression for bandwidth-constrained generative models. Our experimental results demonstrate significant improvements in pixel-level metrics like peak signal-to-noise ratio (PSNR) and semantic metrics like learned perceptual image patch similarity (LPIPS).
arXiv Detail & Related papers (2024-07-26T02:34:25Z) - Communication-Efficient Collaborative Perception via Information Filling with Codebook [48.087934650038044]
Collaborative perception empowers each agent to improve its perceptual ability through the exchange of perceptual messages with other agents.
To address this bottleneck issue, our core idea is to optimize the collaborative messages from two key aspects: representation and selection.
By integrating these two designs, we propose CodeFilling, a novel communication-efficient collaborative perception system.
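CodeFilling's representation side rests on a shared codebook: instead of transmitting dense feature vectors, the sender sends only nearest-codeword indices that the receiver looks up. The sketch below illustrates that codebook-indexing idea alone; the codebook size, dimensions, and random data are hypothetical, and the paper's information-filling message selection is not modeled here.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, K = 100, 16, 32  # num feature vectors, feature dim, codebook size

# A codebook shared in advance by sender and receiver.
codebook = rng.standard_normal((K, d))
features = rng.standard_normal((n, d))

# Sender: transmit only the index of the nearest codeword per vector
# (log2(K) = 5 bits each) instead of d floats.
dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
indices = dists.argmin(axis=1)  # shape (n,): the entire message

# Receiver: reconstruct approximate features by table lookup.
reconstructed = codebook[indices]
```

The bandwidth saving comes from replacing each d-dimensional float vector with a single small integer, at the cost of quantization error that shrinks as the codebook grows.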
arXiv Detail & Related papers (2024-05-08T11:12:37Z) - V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer [58.71845618090022]
We build a holistic attention model, namely V2X-ViT, to fuse information across on-road agents.
V2X-ViT consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention.
To validate our approach, we create a large-scale V2X perception dataset.
arXiv Detail & Related papers (2022-03-20T20:18:25Z) - Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
Full-Duplex Strategy Network (FSNet) is a novel framework for video object segmentation (VOS).
Our FSNet performs cross-modal feature passing (i.e., transmission and receiving) simultaneously before the fusion decoding stage. We show that our FSNet outperforms other state-of-the-art methods on both the VOS and video salient object detection tasks.
arXiv Detail & Related papers (2021-08-06T14:50:50Z)