CooPre: Cooperative Pretraining for V2X Cooperative Perception
- URL: http://arxiv.org/abs/2408.11241v2
- Date: Tue, 17 Jun 2025 23:39:16 GMT
- Title: CooPre: Cooperative Pretraining for V2X Cooperative Perception
- Authors: Seth Z. Zhao, Hao Xiang, Chenfeng Xu, Xin Xia, Bolei Zhou, Jiaqi Ma
- Abstract summary: CooPre is a self-supervised learning framework for V2X cooperative perception. We develop a V2X bird's-eye-view (BEV) guided masking strategy that effectively allows the model to attend to 3D features across heterogeneous V2X agents. CooPre achieves a 4% mAP improvement on the V2X-Real dataset and surpasses baseline performance using only 50% of the training data.
- Score: 47.00472259100765
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing Vehicle-to-Everything (V2X) cooperative perception methods rely on accurate multi-agent 3D annotations. Nevertheless, it is time-consuming and expensive to collect and annotate real-world data, especially for V2X systems. In this paper, we present a self-supervised learning framework for V2X cooperative perception, which utilizes the vast amount of unlabeled 3D V2X data to enhance perception performance. Specifically, multi-agent sensing information is aggregated to form a holistic view, and a novel proxy task is formulated to reconstruct the LiDAR point clouds across multiple connected agents in order to better reason about multi-agent spatial correlations. In addition, we develop a V2X bird's-eye-view (BEV) guided masking strategy that effectively allows the model to attend to 3D features across heterogeneous V2X agents (i.e., vehicles and infrastructure) in the BEV space. Importantly, this masking strategy pretrains the 3D encoder with a multi-agent LiDAR point cloud reconstruction objective and is compatible with mainstream cooperative perception backbones. Our approach, validated through extensive experiments on representative datasets (i.e., V2X-Real, V2V4Real, and OPV2V) and multiple state-of-the-art cooperative perception methods (i.e., AttFuse, F-Cooper, and V2X-ViT), leads to a performance boost across all V2X settings. Notably, CooPre achieves a 4% mAP improvement on the V2X-Real dataset and surpasses baseline performance using only 50% of the training data, highlighting its data efficiency. Additionally, we demonstrate the framework's strong cross-domain transferability and robustness under challenging scenarios. The code will be made publicly available at https://github.com/ucla-mobility/CooPre.
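To make the masking idea above concrete, below is a minimal, self-contained sketch of what a BEV-guided masking step over merged multi-agent LiDAR might look like: agent point clouds are transformed into the ego frame, occupied BEV cells are identified, and a fraction of them is masked so the hidden points can serve as a reconstruction target. This is an illustration under stated assumptions, not CooPre's implementation; the grid resolution, point-cloud range, mask ratio, and function names (`transform_to_ego`, `bev_guided_mask`) are all hypothetical.

```python
# Illustrative sketch of BEV-guided masking for multi-agent LiDAR pretraining.
# NOT the CooPre codebase: grid size, range, mask ratio, and names are assumptions.
import numpy as np

def transform_to_ego(points, T):
    """Apply a 4x4 agent-to-ego transform to an (N, 3) point cloud."""
    homo = np.concatenate([points, np.ones((points.shape[0], 1))], axis=1)
    return (homo @ T.T)[:, :3]

def bev_guided_mask(agent_points, agent_poses, grid=(0.4, 0.4),
                    pc_range=(-50.0, -50.0, 50.0, 50.0), mask_ratio=0.7, seed=0):
    """Aggregate agent clouds in the ego frame and mask occupied BEV cells.

    Returns the visible points (encoder input) and the masked points
    (reconstruction target for the self-supervised objective)."""
    rng = np.random.default_rng(seed)
    merged = np.concatenate(
        [transform_to_ego(p, T) for p, T in zip(agent_points, agent_poses)], axis=0)

    # Keep points inside the BEV range and discretize x/y into cell indices.
    x_min, y_min, x_max, y_max = pc_range
    keep = ((merged[:, 0] >= x_min) & (merged[:, 0] < x_max) &
            (merged[:, 1] >= y_min) & (merged[:, 1] < y_max))
    merged = merged[keep]
    ix = ((merged[:, 0] - x_min) / grid[0]).astype(np.int64)
    iy = ((merged[:, 1] - y_min) / grid[1]).astype(np.int64)
    cell_id = ix * int((y_max - y_min) / grid[1]) + iy

    # Mask a fraction of the *occupied* cells, so the masking budget follows the
    # merged multi-agent occupancy rather than a single agent's partial view.
    occupied = np.unique(cell_id)
    masked_cells = rng.choice(occupied, size=int(mask_ratio * occupied.size),
                              replace=False)
    is_masked = np.isin(cell_id, masked_cells)
    return merged[~is_masked], merged[is_masked]

# Toy usage: two agents with random clouds and identity agent-to-ego poses.
pts = [np.random.uniform(-40, 40, (2048, 3)) for _ in range(2)]
poses = [np.eye(4), np.eye(4)]
visible, target = bev_guided_mask(pts, poses)
print(visible.shape, target.shape)
```

In an actual pretraining pipeline, `visible` would feed the 3D encoder and `target` would supervise a point-cloud reconstruction loss (e.g., Chamfer distance) over the masked cells.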
Related papers
- One RL to See Them All: Visual Triple Unified Reinforcement Learning [92.90120580989839]
We propose V-Triune, a Visual Triple Unified Reinforcement Learning system that enables visual reasoning and perception tasks within a single training pipeline.
V-Triune comprises three complementary components, including a Sample-Level Datashelf (to unify diverse task inputs) and a Verifier-Level Reward (to deliver custom rewards via specialized verifiers).
We introduce a novel Dynamic IoU reward, which provides adaptive, progressive, and definite feedback for the perception tasks handled by V-Triune.
arXiv Detail & Related papers (2025-05-23T17:41:14Z) - V2X-DG: Domain Generalization for Vehicle-to-Everything Cooperative Perception [34.97091536254836]
This paper is the first work to study the Domain Generalization problem of LiDAR-based V2X cooperative perception.
Our research seeks to sustain high performance not only within the source domain but also across other unseen domains.
arXiv Detail & Related papers (2025-03-19T17:17:44Z) - SparseAlign: A Fully Sparse Framework for Cooperative Object Detection [38.96043178218958]
We design a fully sparse framework, SparseAlign, with three key features: an enhanced sparse 3D backbone, a query-based temporal context learning module, and a robust detection head specially tailored for sparse features.
Our framework, despite its sparsity, outperforms the state of the art with less communication bandwidth requirements.
arXiv Detail & Related papers (2025-03-17T09:38:53Z) - LiOn-XA: Unsupervised Domain Adaptation via LiDAR-Only Cross-Modal Adversarial Training [61.26381389532653]
LiOn-XA is an unsupervised domain adaptation (UDA) approach that combines LiDAR-Only Cross-Modal (X) learning with Adversarial training for 3D LiDAR point cloud semantic segmentation.
Our experiments on 3 real-to-real adaptation scenarios demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-21T09:50:17Z) - Multi-V2X: A Large Scale Multi-modal Multi-penetration-rate Dataset for Cooperative Perception [3.10770247120758]
We introduce Multi-V2X, a large-scale, multi-modal, multi-penetration-rate dataset for V2X perception.
In total, our Multi-V2X dataset comprises 549k RGB frames, 146k LiDAR frames, and 4,219k annotated 3D bounding boxes.
The highest possible CAV penetration rate reaches 86.21%, with up to 31 agents in communication range.
arXiv Detail & Related papers (2024-09-08T05:22:00Z) - UVCPNet: A UAV-Vehicle Collaborative Perception Network for 3D Object Detection [11.60579201022641]
We propose a framework specifically designed for aerial-ground collaboration.
We develop a virtual dataset named V2U-COO for our research.
Second, we design a Cross-Domain Cross-Adaptation (CDCA) module to align the target information.
Third, we introduce a Collaborative Depth Optimization (CDO) module to obtain more precise depth estimation results.
arXiv Detail & Related papers (2024-06-07T05:25:45Z) - End-to-End Autonomous Driving through V2X Cooperation [23.44597411612664]
We introduce UniV2X, a pioneering cooperative autonomous driving framework.
UniV2X seamlessly integrates all key driving modules across diverse views into a unified network.
arXiv Detail & Related papers (2024-03-31T15:22:11Z) - DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection [78.09431523221458]
DI-V2X aims to learn Domain-Invariant representations through a new distillation framework.
DI-V2X comprises three essential components: a domain-mixing instance augmentation (DMA) module, a progressive domain-invariant distillation (PDD) module, and a domain-adaptive fusion (DAF) module.
arXiv Detail & Related papers (2023-12-25T14:40:46Z) - Learning Cooperative Trajectory Representations for Motion Forecasting [4.380073528690906]
We propose a forecasting-oriented representation paradigm to utilize motion and interaction features from cooperative information.
We present V2X-Graph, a representative framework to achieve interpretable and end-to-end trajectory feature fusion for cooperative motion forecasting.
To further evaluate on vehicle-to-everything (V2X) scenario, we construct the first real-world V2X motion forecasting dataset V2X-Traj.
arXiv Detail & Related papers (2023-11-01T08:53:05Z) - FSD V2: Improving Fully Sparse 3D Object Detection with Virtual Voxels [57.05834683261658]
We present FSDv2, an evolution that aims to simplify the previous FSDv1 while eliminating the inductive bias introduced by its handcrafted instance-level representation.
We develop a suite of components to complement the virtual voxel concept, including a virtual voxel encoder, a virtual voxel mixer, and a virtual voxel assignment strategy.
arXiv Detail & Related papers (2023-08-07T17:59:48Z) - V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception [49.7212681947463]
Vehicle-to-Vehicle (V2V) cooperative perception system has great potential to revolutionize the autonomous driving industry.
We present V2V4Real, the first large-scale real-world multi-modal dataset for V2V perception.
Our dataset covers a driving area of 410 km, comprising 20K LiDAR frames, 40K RGB frames, 240K annotated 3D bounding boxes for 5 classes, and HDMaps.
arXiv Detail & Related papers (2023-03-14T02:49:20Z) - CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers [36.838065731893735]
CoBEVT is the first generic multi-agent perception framework that can cooperatively generate BEV map predictions.
CoBEVT achieves state-of-the-art performance for cooperative BEV semantic segmentation.
arXiv Detail & Related papers (2022-07-05T17:59:28Z) - V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer [58.71845618090022]
We build a holistic attention model, namely V2X-ViT, to fuse information across on-road agents.
V2X-ViT consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention.
To validate our approach, we create a large-scale V2X perception dataset.
arXiv Detail & Related papers (2022-03-20T20:18:25Z)