SiCP: Simultaneous Individual and Cooperative Perception for 3D Object Detection in Connected and Automated Vehicles
- URL: http://arxiv.org/abs/2312.04822v2
- Date: Tue, 27 Aug 2024 03:33:51 GMT
- Title: SiCP: Simultaneous Individual and Cooperative Perception for 3D Object Detection in Connected and Automated Vehicles
- Authors: Deyuan Qu, Qi Chen, Tianyu Bai, Hongsheng Lu, Heng Fan, Hao Zhang, Song Fu, Qing Yang
- Abstract summary: Cooperative perception for connected and automated vehicles is traditionally achieved through the fusion of feature maps from two or more vehicles.
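However, when no feature maps are shared by other vehicles, cooperative perception models can perform significantly worse at 3D object detection than standalone models.
This drawback impedes the adoption of cooperative perception, as vehicle resources are often insufficient to run two perception models concurrently.
We present Simultaneous Individual and Cooperative Perception (SiCP), a generic framework that supports a wide range of state-of-the-art standalone perception backbones.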
- Score: 18.23919432049492
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cooperative perception for connected and automated vehicles is traditionally achieved through the fusion of feature maps from two or more vehicles. However, the absence of feature maps shared from other vehicles can lead to a significant decline in 3D object detection performance for cooperative perception models compared to standalone 3D detection models. This drawback impedes the adoption of cooperative perception as vehicle resources are often insufficient to concurrently employ two perception models. To tackle this issue, we present Simultaneous Individual and Cooperative Perception (SiCP), a generic framework that supports a wide range of the state-of-the-art standalone perception backbones and enhances them with a novel Dual-Perception Network (DP-Net) designed to facilitate both individual and cooperative perception. In addition to its lightweight nature with only 0.13M parameters, DP-Net is robust and retains crucial gradient information during feature map fusion. As demonstrated in a comprehensive evaluation on the V2V4Real and OPV2V datasets, thanks to DP-Net, SiCP surpasses state-of-the-art cooperative perception solutions while preserving the performance of standalone perception solutions.
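The abstract's core idea, one model that serves both individual and cooperative perception, can be illustrated with a toy sketch. This is not DP-Net: the function name `fuse_feature_maps`, the nested-list feature maps, and the element-wise maximum used for fusion are all assumptions chosen for illustration, and the shared map is assumed to be already aligned to the ego vehicle's frame. The point is only that a single fusion step can pass the ego features through unchanged when nothing is received, so one downstream detector handles both cases.

```python
from typing import List, Optional

# Toy 2D bird's-eye-view grid; real feature maps are multi-channel tensors.
FeatureMap = List[List[float]]


def fuse_feature_maps(ego: FeatureMap, shared: Optional[FeatureMap] = None) -> FeatureMap:
    """Return one feature map for a single downstream detection head.

    Hypothetical stand-in for a dual-perception fusion step, NOT DP-Net:
    - no shared map -> copy of the ego map (individual perception)
    - shared map    -> element-wise maximum of both maps (cooperative perception)
    """
    if shared is None:
        # Nothing received from other vehicles: fall back to the ego features.
        return [row[:] for row in ego]
    if len(shared) != len(ego) or any(len(s) != len(e) for s, e in zip(shared, ego)):
        raise ValueError("ego and shared feature maps must have the same shape")
    # Assumed fusion rule: keep the stronger activation in each cell.
    return [[max(e, s) for e, s in zip(ego_row, shared_row)]
            for ego_row, shared_row in zip(ego, shared)]


ego_map = [[0.1, 0.9], [0.0, 0.2]]
shared_map = [[0.7, 0.3], [0.0, 0.8]]

individual = fuse_feature_maps(ego_map)               # no neighbor data available
cooperative = fuse_feature_maps(ego_map, shared_map)  # neighbor data received
```

In this sketch the individual path returns the ego features untouched, while the cooperative path can only add evidence in cells where the other vehicle saw something stronger.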
Related papers
- CoMamba: Real-time Cooperative Perception Unlocked with State Space Models [39.87600356189242]
CoMamba is a novel cooperative 3D detection framework designed to leverage state-space models for real-time onboard vehicle perception.
CoMamba achieves superior performance compared to existing methods while maintaining real-time processing capabilities.
arXiv Detail & Related papers (2024-09-16T20:02:19Z)
- V2X-AHD: Vehicle-to-Everything Cooperation Perception via Asymmetric Heterogenous Distillation Network [13.248981195106069]
We propose a multi-view vehicle-road cooperation perception system, vehicle-to-everything cooperative perception (V2X-AHD).
The V2X-AHD can effectively improve the accuracy of 3D object detection and reduce the number of network parameters, according to this study.
arXiv Detail & Related papers (2023-10-10T13:12:03Z)
- CoBEVFusion: Cooperative Perception with LiDAR-Camera Bird's-Eye View Fusion [0.0]
Recent approaches in cooperative perception only share single sensor information such as cameras or LiDAR.
We present a framework, called CoBEVFusion, that fuses LiDAR and camera data to create a Bird's-Eye View (BEV) representation.
Our framework was evaluated on the cooperative perception dataset OPV2V for two perception tasks: BEV semantic segmentation and 3D object detection.
arXiv Detail & Related papers (2023-10-09T17:52:26Z)
- UniM$^2$AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving [47.590099762244535]
Masked Autoencoders (MAE) play a pivotal role in learning potent representations, delivering outstanding results across various 3D perception tasks.
This research delves into multi-modal Masked Autoencoders tailored for a unified representation space in autonomous driving.
To intricately marry the semantics inherent in images with the geometric intricacies of LiDAR point clouds, we propose UniM$^2$AE.
arXiv Detail & Related papers (2023-08-21T02:13:40Z)
- UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation [113.35352122662752]
We present an efficient multi-modal backbone for outdoor 3D perception named UniTR.
UniTR processes a variety of modalities with unified modeling and shared parameters.
UniTR is also a fundamentally task-agnostic backbone that naturally supports different 3D perception tasks.
arXiv Detail & Related papers (2023-08-15T12:13:44Z)
- DOLPHINS: Dataset for Collaborative Perception enabled Harmonious and Interconnected Self-driving [19.66714697653504]
The Vehicle-to-Everything (V2X) network has enabled collaborative perception in autonomous driving.
The lack of datasets has severely blocked the development of collaborative perception algorithms.
We release DOLPHINS: dataset for cOllaborative Perception enabled Harmonious and INterconnected Self-driving.
arXiv Detail & Related papers (2022-07-15T17:07:07Z)
- Collaborative 3D Object Detection for Automatic Vehicle Systems via Learnable Communications [8.633120731620307]
We propose a novel collaborative 3D object detection framework that consists of three components.
Experiment results and bandwidth usage analysis demonstrate that our approach can save communication and computation costs.
arXiv Detail & Related papers (2022-05-24T07:17:32Z)
- V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer [58.71845618090022]
We build a holistic attention model, namely V2X-ViT, to fuse information across on-road agents.
V2X-ViT consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention.
To validate our approach, we create a large-scale V2X perception dataset.
arXiv Detail & Related papers (2022-03-20T20:18:25Z)
- Similarity-Aware Fusion Network for 3D Semantic Segmentation [87.51314162700315]
We propose a similarity-aware fusion network (SAFNet) to adaptively fuse 2D images and 3D point clouds for 3D semantic segmentation.
We employ a late fusion strategy where we first learn the geometric and contextual similarities between the input and back-projected (from 2D pixels) point clouds.
We show that SAFNet significantly outperforms existing state-of-the-art fusion-based approaches across various levels of data integrity.
arXiv Detail & Related papers (2021-07-04T09:28:18Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.