EffiComm: Bandwidth Efficient Multi Agent Communication
- URL: http://arxiv.org/abs/2507.19354v1
- Date: Fri, 25 Jul 2025 15:03:26 GMT
- Title: EffiComm: Bandwidth Efficient Multi Agent Communication
- Authors: Melih Yazgan, Allen Xavier Arasan, J. Marius Zöllner
- Abstract summary: Collaborative perception allows connected vehicles to exchange sensor information and overcome each vehicle's blind spots. We introduce EffiComm, an end-to-end framework that transmits less than 40% of the data required by prior art while maintaining state-of-the-art 3D object detection accuracy.
- Score: 11.311414617703308
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Collaborative perception allows connected vehicles to exchange sensor information and overcome each vehicle's blind spots. Yet transmitting raw point clouds or full feature maps overwhelms Vehicle-to-Vehicle (V2V) communications, causing latency and scalability problems. We introduce EffiComm, an end-to-end framework that transmits less than 40% of the data required by prior art while maintaining state-of-the-art 3D object detection accuracy. EffiComm operates on Bird's-Eye-View (BEV) feature maps from any modality and applies a two-stage reduction pipeline: (1) Selective Transmission (ST) prunes low-utility regions with a confidence mask; (2) Adaptive Grid Reduction (AGR) uses a Graph Neural Network (GNN) to assign vehicle-specific keep ratios according to role and network load. The remaining features are fused with a soft-gated Mixture-of-Experts (MoE) attention layer, offering greater capacity and specialization for effective feature integration. On the OPV2V benchmark, EffiComm reaches 0.84 mAP@0.7 while sending only an average of approximately 1.5 MB per frame, outperforming previous methods on the accuracy-per-bit curve. These results highlight the value of adaptive, learned communication for scalable Vehicle-to-Everything (V2X) perception.
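To make the two-stage reduction concrete, below is a minimal, self-contained sketch (not the authors' released code) of the Selective Transmission and Adaptive Grid Reduction steps on a single agent's BEV feature map. The tensor shapes, the confidence threshold, and the fixed `keep_ratio` are illustrative assumptions; in EffiComm the keep ratio would come from the GNN that accounts for vehicle role and network load, and the retained features would then be fused with the soft-gated MoE attention layer.

```python
# Minimal sketch of EffiComm-style bandwidth reduction (illustrative only).
# Assumed shapes: BEV features (C, H, W) and a per-cell confidence map (H, W).
import torch


def selective_transmission(bev_feat, confidence, conf_thresh=0.5):
    """Stage 1 (ST): zero out low-utility BEV cells with a confidence mask."""
    mask = (confidence > conf_thresh).unsqueeze(0)   # (1, H, W) boolean mask
    return bev_feat * mask                           # pruned feature map


def adaptive_grid_reduction(bev_feat, confidence, keep_ratio):
    """Stage 2 (AGR): keep only the top-k most confident cells.

    `keep_ratio` is a fixed stand-in here for the vehicle-specific value that
    the paper's GNN predicts from agent role and network load.
    """
    c, h, w = bev_feat.shape
    k = max(1, int(keep_ratio * h * w))
    topk_idx = confidence.flatten().topk(k).indices
    keep = torch.zeros(h * w, dtype=torch.bool)
    keep[topk_idx] = True
    return bev_feat * keep.view(1, h, w)


# Toy usage: one agent's BEV features plus a dummy confidence map.
feat = torch.randn(64, 100, 100)
conf = torch.rand(100, 100)
reduced = adaptive_grid_reduction(selective_transmission(feat, conf), conf, keep_ratio=0.4)
print("cells kept:", (reduced.abs().sum(0) > 0).sum().item())
```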
Related papers
- Optimal Transport Adapter Tuning for Bridging Modality Gaps in Few-Shot Remote Sensing Scene Classification [80.83325513157637]
Few-Shot Remote Sensing Scene Classification (FS-RSSC) presents the challenge of classifying remote sensing images with limited labeled samples. We propose a novel Optimal Transport Adapter Tuning (OTAT) framework aimed at constructing an ideal Platonic representational space.
arXiv Detail & Related papers (2025-03-19T07:04:24Z) - CoCMT: Communication-Efficient Cross-Modal Transformer for Collaborative Perception [14.619784179608361]
Multi-agent collaborative perception enhances each agent's capabilities by sharing sensing information to cooperatively perform robot perception tasks. Existing representative collaborative perception systems transmit intermediate feature maps, which contain significant amounts of non-critical information. We introduce CoCMT, an object-query-based collaboration framework that minimizes communication bandwidth by selectively extracting and transmitting essential features.
arXiv Detail & Related papers (2025-03-13T06:41:25Z) - LCV2I: Communication-Efficient and High-Performance Collaborative Perception Framework with Low-Resolution LiDAR [19.748419057261106]
Vehicle-to-Infrastructure (V2I) collaborative perception leverages data collected by infrastructure sensors to enhance vehicle perceptual capabilities. LiDAR, a sensor commonly used in cooperative perception, is widely deployed on intelligent vehicles and infrastructure. To achieve low-cost V2I, reducing the cost of LiDAR is crucial.
arXiv Detail & Related papers (2025-02-24T10:46:28Z) - CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction [77.8576094863446]
We propose a new deCoupled duAl-interactive lineaR attEntion (CARE) mechanism.
We first propose an asymmetrical feature decoupling strategy that decouples the learning of local inductive bias from long-range dependencies.
By adopting this decoupled learning scheme and fully exploiting complementarity across features, our method achieves both high efficiency and accuracy.
arXiv Detail & Related papers (2024-11-25T07:56:13Z) - LiDAR-based End-to-end Temporal Perception for Vehicle-Infrastructure Cooperation [16.465037559349323]
We introduce LET-VIC, a LiDAR-based End-to-End Tracking framework for Vehicle-Infrastructure Cooperation. We employ Temporal Self-Attention and VIC Cross-Attention modules to integrate temporal and spatial information from both vehicle and infrastructure perspectives. Experiments demonstrate that the integration of multi-view perspectives, temporal sequences, or CEC in end-to-end training significantly improves both detection and tracking performance.
arXiv Detail & Related papers (2024-11-22T13:34:29Z) - Channel-Aware Throughput Maximization for Cooperative Data Fusion in CAV [17.703608985129026]
Connected and autonomous vehicles (CAVs) have garnered significant attention due to their extended perception range and enhanced sensing coverage. To address challenges such as blind spots and obstructions, CAVs employ vehicle-to-vehicle communications to aggregate data from surrounding vehicles. We propose a channel-aware throughput maximization approach to facilitate CAV data fusion, leveraging a self-supervised autoencoder for adaptive data compression.
arXiv Detail & Related papers (2024-10-06T00:43:46Z) - VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness [56.87603097348203]
VeCAF uses labels and natural language annotations to perform parametric data selection for PVM finetuning.
VeCAF incorporates the finetuning objective to select significant data points that effectively guide the PVM towards faster convergence.
On ImageNet, VeCAF uses up to 3.3x fewer training batches to reach the target performance compared to full finetuning.
arXiv Detail & Related papers (2024-01-15T17:28:37Z) - TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition [63.93802691275012]
We propose a lightweight Dual Dynamic Token Mixer (D-Mixer) to simultaneously learn global and local dynamics. We use D-Mixer as the basic building block to design TransXNet, a novel hybrid CNN-Transformer vision backbone network. On ImageNet-1K classification, TransXNet-T surpasses Swin-T by 0.3% in top-1 accuracy while requiring less than half the computational cost.
arXiv Detail & Related papers (2023-10-30T09:35:56Z) - Correlating sparse sensing for large-scale traffic speed estimation: A Laplacian-enhanced low-rank tensor kriging approach [76.45949280328838]
We propose a Laplacian-enhanced low-rank tensor (LETC) framework featuring both low-rankness and multi-temporal correlations for large-scale traffic speed kriging.
We then design an efficient solution algorithm via several effective numeric techniques to scale up the proposed model to network-wide kriging.
arXiv Detail & Related papers (2022-10-21T07:25:57Z) - V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer [58.71845618090022]
We build a holistic attention model, namely V2X-ViT, to fuse information across on-road agents.
V2X-ViT consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention.
To validate our approach, we create a large-scale V2X perception dataset.
arXiv Detail & Related papers (2022-03-20T20:18:25Z) - Keypoints-Based Deep Feature Fusion for Cooperative Vehicle Detection of Autonomous Driving [2.6543018470131283]
We propose an efficient keypoints-based deep feature fusion framework, called FPV-RCNN, for collective perception.
Compared to a bird's-eye view (BEV) keypoints feature fusion, FPV-RCNN improves detection accuracy by about 14%.
Our method also significantly decreases the CPM size to less than 0.3 KB, which is about 50 times smaller than the BEV feature-map sharing used in previous works.
arXiv Detail & Related papers (2021-09-23T19:41:02Z)