CooPre: Cooperative Pretraining for V2X Cooperative Perception
- URL: http://arxiv.org/abs/2408.11241v1
- Date: Tue, 20 Aug 2024 23:39:26 GMT
- Title: CooPre: Cooperative Pretraining for V2X Cooperative Perception
- Authors: Seth Z. Zhao, Hao Xiang, Chenfeng Xu, Xin Xia, Bolei Zhou, Jiaqi Ma,
- Abstract summary: We present a self-supervised learning method for V2X cooperative perception.
We utilize the vast amount of unlabeled 3D V2X data to enhance the perception performance.
- Score: 47.00472259100765
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing Vehicle-to-Everything (V2X) cooperative perception methods rely on accurate multi-agent 3D annotations. Nevertheless, it is time-consuming and expensive to collect and annotate real-world data, especially for V2X systems. In this paper, we present a self-supervised learning method for V2X cooperative perception, which utilizes the vast amount of unlabeled 3D V2X data to enhance the perception performance. Beyond simply extending the previous pre-training methods for point-cloud representation learning, we introduce a novel self-supervised Cooperative Pretraining framework (termed as CooPre) customized for a collaborative scenario. We point out that cooperative point-cloud sensing compensates for information loss among agents. This motivates us to design a novel proxy task for the 3D encoder to reconstruct LiDAR point clouds across different agents. Besides, we develop a V2X bird-eye-view (BEV) guided masking strategy which effectively allows the model to pay attention to 3D features across heterogeneous V2X agents (i.e., vehicles and infrastructure) in the BEV space. Noticeably, such a masking strategy effectively pretrains the 3D encoder and is compatible with mainstream cooperative perception backbones. Our approach, validated through extensive experiments on representative datasets (i.e., V2X-Real, V2V4Real, and OPV2V), leads to a performance boost across all V2X settings. Additionally, we demonstrate the framework's improvements in cross-domain transferability, data efficiency, and robustness under challenging scenarios. The code will be made publicly available.
Related papers
- LiOn-XA: Unsupervised Domain Adaptation via LiDAR-Only Cross-Modal Adversarial Training [61.26381389532653]
LiOn-XA is an unsupervised domain adaptation (UDA) approach that combines LiDAR-Only Cross-Modal (X) learning with Adversarial training for 3D LiDAR point cloud semantic segmentation.
Our experiments on 3 real-to-real adaptation scenarios demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-21T09:50:17Z) - UVCPNet: A UAV-Vehicle Collaborative Perception Network for 3D Object Detection [11.60579201022641]
We propose a framework specifically designed for aerial-ground collaboration.
We develop a virtual dataset named V2U-COO for our research.
Second, we design a Cross-Domain Cross-Adaptation (CDCA) module to align the target information.
Third, we introduce a Collaborative Depth Optimization (CDO) module to obtain more precise depth estimation results.
arXiv Detail & Related papers (2024-06-07T05:25:45Z) - End-to-End Autonomous Driving through V2X Cooperation [23.44597411612664]
We introduce UniV2X, a pioneering cooperative autonomous driving framework.
UniV2X seamlessly integrates all key driving modules across diverse views into a unified network.
arXiv Detail & Related papers (2024-03-31T15:22:11Z) - DI-V2X: Learning Domain-Invariant Representation for
Vehicle-Infrastructure Collaborative 3D Object Detection [78.09431523221458]
DI-V2X aims to learn Domain-Invariant representations through a new distillation framework.
DI-V2X comprises three essential components: a domain-mixing instance augmentation (DMA) module, a progressive domain-invariant distillation (PDD) module, and a domain-adaptive fusion (DAF) module.
arXiv Detail & Related papers (2023-12-25T14:40:46Z) - Learning Cooperative Trajectory Representations for Motion Forecasting [4.380073528690906]
We propose a forecasting-oriented representation paradigm to utilize motion and interaction features from cooperative information.
We present V2X-Graph, a representative framework to achieve interpretable and end-to-end trajectory feature fusion for cooperative motion forecasting.
To further evaluate on vehicle-to-everything (V2X) scenario, we construct the first real-world V2X motion forecasting dataset V2X-Traj.
arXiv Detail & Related papers (2023-11-01T08:53:05Z) - FSD V2: Improving Fully Sparse 3D Object Detection with Virtual Voxels [57.05834683261658]
We present FSDv2, an evolution that aims to simplify the previous FSDv1 while eliminating the inductive bias introduced by its handcrafted instance-level representation.
We develop a suite of components to complement the virtual voxel concept, including a virtual voxel encoder, a virtual voxel mixer, and a virtual voxel assignment strategy.
arXiv Detail & Related papers (2023-08-07T17:59:48Z) - V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle
Cooperative Perception [49.7212681947463]
Vehicle-to-Vehicle (V2V) cooperative perception system has great potential to revolutionize the autonomous driving industry.
We present V2V4Real, the first large-scale real-world multi-modal dataset for V2V perception.
Our dataset covers a driving area of 410 km, comprising 20K LiDAR frames, 40K RGB frames, 240K annotated 3D bounding boxes for 5 classes, and HDMaps.
arXiv Detail & Related papers (2023-03-14T02:49:20Z) - CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse
Transformers [36.838065731893735]
CoBEVT is the first generic multi-agent perception framework that can cooperatively generate BEV map predictions.
CoBEVT achieves state-of-the-art performance for cooperative BEV semantic segmentation.
arXiv Detail & Related papers (2022-07-05T17:59:28Z) - V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision
Transformer [58.71845618090022]
We build a holistic attention model, namely V2X-ViT, to fuse information across on-road agents.
V2X-ViT consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention.
To validate our approach, we create a large-scale V2X perception dataset.
arXiv Detail & Related papers (2022-03-20T20:18:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.