CooPre: Cooperative Pretraining for V2X Cooperative Perception
- URL: http://arxiv.org/abs/2408.11241v2
- Date: Tue, 17 Jun 2025 23:39:16 GMT
- Title: CooPre: Cooperative Pretraining for V2X Cooperative Perception
- Authors: Seth Z. Zhao, Hao Xiang, Chenfeng Xu, Xin Xia, Bolei Zhou, Jiaqi Ma
- Abstract summary: CooPre is a self-supervised learning framework for V2X cooperative perception. We develop a V2X bird's-eye-view (BEV) guided masking strategy that effectively allows the model to attend to 3D features across heterogeneous V2X agents. CooPre achieves a 4% mAP improvement on the V2X-Real dataset and surpasses baseline performance using only 50% of the training data.
- Score: 47.00472259100765
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing Vehicle-to-Everything (V2X) cooperative perception methods rely on accurate multi-agent 3D annotations. Nevertheless, it is time-consuming and expensive to collect and annotate real-world data, especially for V2X systems. In this paper, we present a self-supervised learning framework for V2X cooperative perception, which utilizes the vast amount of unlabeled 3D V2X data to enhance perception performance. Specifically, multi-agent sensing information is aggregated to form a holistic view, and a novel proxy task is formulated to reconstruct the LiDAR point clouds across multiple connected agents in order to better reason about multi-agent spatial correlations. In addition, we develop a V2X bird's-eye-view (BEV) guided masking strategy that effectively allows the model to attend to 3D features across heterogeneous V2X agents (i.e., vehicles and infrastructure) in the BEV space. Importantly, this masking strategy pretrains the 3D encoder with a multi-agent LiDAR point cloud reconstruction objective and is compatible with mainstream cooperative perception backbones. Our approach, validated through extensive experiments on representative datasets (i.e., V2X-Real, V2V4Real, and OPV2V) and multiple state-of-the-art cooperative perception methods (i.e., AttFuse, F-Cooper, and V2X-ViT), leads to a performance boost across all V2X settings. Notably, CooPre achieves a 4% mAP improvement on the V2X-Real dataset and surpasses baseline performance using only 50% of the training data, highlighting its data efficiency. Additionally, we demonstrate the framework's strong cross-domain transferability and robustness under challenging scenarios. The code will be made publicly available at https://github.com/ucla-mobility/CooPre.
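To make the masking idea above concrete, below is a minimal, self-contained sketch of what a BEV-guided masking step over merged multi-agent LiDAR might look like: agent point clouds are transformed into the ego frame, occupied BEV cells are identified, and a fraction of them is masked so the hidden points can serve as a reconstruction target. This is an illustration under stated assumptions, not CooPre's implementation; the grid resolution, point-cloud range, mask ratio, and function names (`transform_to_ego`, `bev_guided_mask`) are all hypothetical.

```python
# Illustrative sketch of BEV-guided masking for multi-agent LiDAR pretraining.
# NOT the CooPre codebase: grid size, range, mask ratio, and names are assumptions.
import numpy as np

def transform_to_ego(points, T):
    """Apply a 4x4 agent-to-ego transform to an (N, 3) point cloud."""
    homo = np.concatenate([points, np.ones((points.shape[0], 1))], axis=1)
    return (homo @ T.T)[:, :3]

def bev_guided_mask(agent_points, agent_poses, grid=(0.4, 0.4),
                    pc_range=(-50.0, -50.0, 50.0, 50.0), mask_ratio=0.7, seed=0):
    """Aggregate agent clouds in the ego frame and mask occupied BEV cells.

    Returns the visible points (encoder input) and the masked points
    (reconstruction target for the self-supervised objective)."""
    rng = np.random.default_rng(seed)
    merged = np.concatenate(
        [transform_to_ego(p, T) for p, T in zip(agent_points, agent_poses)], axis=0)

    # Keep points inside the BEV range and discretize x/y into cell indices.
    x_min, y_min, x_max, y_max = pc_range
    keep = ((merged[:, 0] >= x_min) & (merged[:, 0] < x_max) &
            (merged[:, 1] >= y_min) & (merged[:, 1] < y_max))
    merged = merged[keep]
    ix = ((merged[:, 0] - x_min) / grid[0]).astype(np.int64)
    iy = ((merged[:, 1] - y_min) / grid[1]).astype(np.int64)
    cell_id = ix * int((y_max - y_min) / grid[1]) + iy

    # Mask a fraction of the *occupied* cells, so the masking budget follows the
    # merged multi-agent occupancy rather than a single agent's partial view.
    occupied = np.unique(cell_id)
    masked_cells = rng.choice(occupied, size=int(mask_ratio * occupied.size),
                              replace=False)
    is_masked = np.isin(cell_id, masked_cells)
    return merged[~is_masked], merged[is_masked]

# Toy usage: two agents with random clouds and identity agent-to-ego poses.
pts = [np.random.uniform(-40, 40, (2048, 3)) for _ in range(2)]
poses = [np.eye(4), np.eye(4)]
visible, target = bev_guided_mask(pts, poses)
print(visible.shape, target.shape)
```

In an actual pretraining pipeline, `visible` would feed the 3D encoder and `target` would supervise a point-cloud reconstruction loss (e.g., Chamfer distance) over the masked cells.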
Related papers
- One RL to See Them All: Visual Triple Unified Reinforcement Learning [92.90120580989839]
We propose V-Triune, a Visual Triple Unified Reinforcement Learning system that enables visual reasoning and perception tasks within a single training pipeline.
V-Triune comprises three complementary components, including a Sample-Level Datashelf (to unify diverse task inputs) and a Verifier-Level Reward (to deliver custom rewards via specialized verifiers).
We introduce a novel Dynamic IoU reward, which provides adaptive, progressive, and definite feedback for the perception tasks handled by V-Triune.
arXiv Detail & Related papers (2025-05-23T17:41:14Z) - V2X-DG: Domain Generalization for Vehicle-to-Everything Cooperative Perception [34.97091536254836]
This paper is the first work to study the Domain Generalization problem of LiDAR-based V2X cooperative perception.
Our research seeks to sustain high performance not only within the source domain but also across other unseen domains.
arXiv Detail & Related papers (2025-03-19T17:17:44Z) - SparseAlign: A Fully Sparse Framework for Cooperative Object Detection [38.96043178218958]
We design a fully sparse framework, SparseAlign, with three key features: an enhanced sparse 3D backbone, a query-based temporal context learning module, and a robust detection head specially tailored for sparse features.
Our framework, despite its sparsity, outperforms the state of the art with less communication bandwidth requirements.
arXiv Detail & Related papers (2025-03-17T09:38:53Z) - LiOn-XA: Unsupervised Domain Adaptation via LiDAR-Only Cross-Modal Adversarial Training [61.26381389532653]
LiOn-XA is an unsupervised domain adaptation (UDA) approach that combines LiDAR-Only Cross-Modal (X) learning with Adversarial training for 3D LiDAR point cloud semantic segmentation.
Our experiments on 3 real-to-real adaptation scenarios demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-21T09:50:17Z) - Multi-V2X: A Large Scale Multi-modal Multi-penetration-rate Dataset for Cooperative Perception [3.10770247120758]
We introduce Multi-V2X, a large-scale, multi-modal, multi-penetration-rate dataset for V2X perception.
In total, our Multi-V2X dataset comprises 549k RGB frames, 146k LiDAR frames, and 4,219k annotated 3D bounding boxes.
The highest possible CAV penetration rate reaches 86.21%, with up to 31 agents in communication range.
arXiv Detail & Related papers (2024-09-08T05:22:00Z) - UVCPNet: A UAV-Vehicle Collaborative Perception Network for 3D Object Detection [11.60579201022641]
We propose a framework specifically designed for aerial-ground collaboration.
We develop a virtual dataset named V2U-COO for our research.
Second, we design a Cross-Domain Cross-Adaptation (CDCA) module to align the target information.
Third, we introduce a Collaborative Depth Optimization (CDO) module to obtain more precise depth estimation results.
arXiv Detail & Related papers (2024-06-07T05:25:45Z) - End-to-End Autonomous Driving through V2X Cooperation [23.44597411612664]
We introduce UniV2X, a pioneering cooperative autonomous driving framework.
UniV2X seamlessly integrates all key driving modules across diverse views into a unified network.
arXiv Detail & Related papers (2024-03-31T15:22:11Z) - DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection [78.09431523221458]
DI-V2X aims to learn Domain-Invariant representations through a new distillation framework.
DI-V2X comprises three essential components: a domain-mixing instance augmentation (DMA) module, a progressive domain-invariant distillation (PDD) module, and a domain-adaptive fusion (DAF) module.
arXiv Detail & Related papers (2023-12-25T14:40:46Z) - Learning Cooperative Trajectory Representations for Motion Forecasting [4.380073528690906]
We propose a forecasting-oriented representation paradigm to utilize motion and interaction features from cooperative information.
We present V2X-Graph, a representative framework to achieve interpretable and end-to-end trajectory feature fusion for cooperative motion forecasting.
To further evaluate on vehicle-to-everything (V2X) scenario, we construct the first real-world V2X motion forecasting dataset V2X-Traj.
arXiv Detail & Related papers (2023-11-01T08:53:05Z) - FSD V2: Improving Fully Sparse 3D Object Detection with Virtual Voxels [57.05834683261658]
We present FSDv2, an evolution that aims to simplify the previous FSDv1 while eliminating the inductive bias introduced by its handcrafted instance-level representation.
We develop a suite of components to complement the virtual voxel concept, including a virtual voxel encoder, a virtual voxel mixer, and a virtual voxel assignment strategy.
arXiv Detail & Related papers (2023-08-07T17:59:48Z) - V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception [49.7212681947463]
Vehicle-to-Vehicle (V2V) cooperative perception system has great potential to revolutionize the autonomous driving industry.
We present V2V4Real, the first large-scale real-world multi-modal dataset for V2V perception.
Our dataset covers a driving area of 410 km, comprising 20K LiDAR frames, 40K RGB frames, 240K annotated 3D bounding boxes for 5 classes, and HDMaps.
arXiv Detail & Related papers (2023-03-14T02:49:20Z) - CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers [36.838065731893735]
CoBEVT is the first generic multi-agent perception framework that can cooperatively generate BEV map predictions.
CoBEVT achieves state-of-the-art performance for cooperative BEV semantic segmentation.
arXiv Detail & Related papers (2022-07-05T17:59:28Z) - V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer [58.71845618090022]
We build a holistic attention model, namely V2X-ViT, to fuse information across on-road agents.
V2X-ViT consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention.
To validate our approach, we create a large-scale V2X perception dataset.
arXiv Detail & Related papers (2022-03-20T20:18:25Z)