CoBEVFusion: Cooperative Perception with LiDAR-Camera Bird's-Eye View Fusion
- URL: http://arxiv.org/abs/2310.06008v1
- Date: Mon, 9 Oct 2023 17:52:26 GMT
- Title: CoBEVFusion: Cooperative Perception with LiDAR-Camera Bird's-Eye View Fusion
- Authors: Donghao Qiao and Farhana Zulkernine
- Abstract summary: Recent approaches in cooperative perception share only single-sensor information, such as camera or LiDAR data.
We present a framework, called CoBEVFusion, that fuses LiDAR and camera data to create a Bird's-Eye View (BEV) representation.
Our framework was evaluated on the cooperative perception dataset OPV2V for two perception tasks: BEV semantic segmentation and 3D object detection.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous Vehicles (AVs) use multiple sensors to gather information about
their surroundings. By sharing sensor data between Connected Autonomous
Vehicles (CAVs), the safety and reliability of these vehicles can be improved
through a concept known as cooperative perception. However, recent approaches
in cooperative perception share only single-sensor information, such as camera
or LiDAR data. In this research, we explore the fusion of multiple sensor data
sources and present a framework, called CoBEVFusion, that fuses LiDAR and
camera data to create a Bird's-Eye View (BEV) representation. The CAVs process
the multi-modal data locally and utilize a Dual Window-based Cross-Attention
(DWCA) module to fuse the LiDAR and camera features into a unified BEV
representation. The fused BEV feature maps are shared among the CAVs, and a 3D
Convolutional Neural Network is applied to aggregate the features from the
CAVs. Our CoBEVFusion framework was evaluated on the cooperative perception
dataset OPV2V for two perception tasks: BEV semantic segmentation and 3D object
detection. The results show that our DWCA LiDAR-camera fusion model outperforms
perception models with single-modal data and state-of-the-art BEV fusion
models. Our overall cooperative perception architecture, CoBEVFusion, also
achieves comparable performance with other cooperative perception models.
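To make the pipeline described in the abstract more concrete, the sketch below shows one way the two stages could look: LiDAR BEV features querying camera BEV features with window-based cross-attention, followed by a 3D convolution that aggregates the fused BEV maps shared by several CAVs. This is a minimal illustrative sketch, not the authors' implementation: the module names, tensor shapes, the single window size (the paper's DWCA uses two window configurations), and the max-pooling over the CAV axis are all assumptions.

```python
# Minimal sketch with assumed shapes and module names, not the official CoBEVFusion code.
import torch
import torch.nn as nn

class WindowCrossAttentionFusion(nn.Module):
    """Fuse LiDAR and camera BEV features with cross-attention inside local windows.
    The paper's DWCA uses two window configurations; this sketch keeps one for brevity."""
    def __init__(self, dim=64, window=8, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lidar_bev, cam_bev):            # both: (B, C, H, W)
        B, C, H, W = lidar_bev.shape
        w = self.window
        def to_windows(x):
            # Partition a BEV map into non-overlapping w x w windows of tokens.
            x = x.reshape(B, C, H // w, w, W // w, w)
            return x.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)
        q, kv = to_windows(lidar_bev), to_windows(cam_bev)
        # LiDAR queries attend to camera keys/values within each window.
        fused, _ = self.attn(q, kv, kv)
        fused = self.norm(fused + q)
        # Restore the (B, C, H, W) BEV layout.
        fused = fused.reshape(B, H // w, W // w, w, w, C)
        return fused.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)

class CAVAggregator(nn.Module):
    """Aggregate the fused BEV maps shared by several CAVs with a 3D convolution."""
    def __init__(self, dim=64):
        super().__init__()
        self.conv3d = nn.Conv3d(dim, dim, kernel_size=3, padding=1)

    def forward(self, bev_stack):                      # (B, C, N_cavs, H, W)
        return self.conv3d(bev_stack).max(dim=2).values  # collapse the CAV axis

# Toy usage with assumed sizes: ego vehicle plus one neighbouring CAV.
fusion, agg = WindowCrossAttentionFusion(), CAVAggregator()
lidar, camera = torch.randn(1, 64, 64, 64), torch.randn(1, 64, 64, 64)
ego_bev = fusion(lidar, camera)                        # fused ego-vehicle BEV feature
shared = torch.stack([ego_bev, torch.randn_like(ego_bev)], dim=2)
aggregated = agg(shared)                               # (1, 64, 64, 64)
```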
Related papers
- SiCP: Simultaneous Individual and Cooperative Perception for 3D Object Detection in Connected and Automated Vehicles [18.23919432049492]
Cooperative perception for connected and automated vehicles is traditionally achieved through the fusion of feature maps from two or more vehicles.
This reliance on shared feature maps impedes the adoption of cooperative perception, as vehicle resources are often insufficient to concurrently employ two perception models.
We present Simultaneous Individual and Cooperative Perception (SiCP), a generic framework that supports a wide range of the state-of-the-art standalone perception backbones.
arXiv Detail & Related papers (2023-12-08T04:12:26Z)
- Towards Viewpoint Robustness in Bird's Eye View Segmentation [85.99907496019972]
We study how AV perception models are affected by changes in camera viewpoint.
Small changes to pitch, yaw, depth, or height of the camera at inference time lead to large drops in performance.
We introduce a technique for novel view synthesis and use it to transform collected data to the viewpoint of target rigs.
arXiv Detail & Related papers (2023-09-11T02:10:07Z)
- HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative Perception with Vision Transformer [4.957079586254435]
HM-ViT is the first unified multi-agent hetero-modal cooperative perception framework.
It can collaboratively predict 3D objects for highly dynamic vehicle-to-vehicle (V2V) collaborations with varying numbers and types of agents.
arXiv Detail & Related papers (2023-04-20T20:09:59Z)
- V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception [49.7212681947463]
Vehicle-to-Vehicle (V2V) cooperative perception systems have great potential to revolutionize the autonomous driving industry.
We present V2V4Real, the first large-scale real-world multi-modal dataset for V2V perception.
Our dataset covers a driving area of 410 km, comprising 20K LiDAR frames, 40K RGB frames, 240K annotated 3D bounding boxes for 5 classes, and HDMaps.
arXiv Detail & Related papers (2023-03-14T02:49:20Z)
- Adaptive Feature Fusion for Cooperative Perception using LiDAR Point Clouds [0.0]
Cooperative perception allows a Connected Autonomous Vehicle (CAV) to interact with other CAVs in the vicinity.
It can compensate for the limitations of conventional vehicular perception, such as blind spots, low resolution, and weather effects.
We evaluate the performance of cooperative perception for both vehicle and pedestrian detection using the CODD dataset.
arXiv Detail & Related papers (2022-07-30T01:53:05Z)
- CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers [36.838065731893735]
CoBEVT is the first generic multi-agent perception framework that can cooperatively generate BEV map predictions.
CoBEVT achieves state-of-the-art performance for cooperative BEV semantic segmentation.
arXiv Detail & Related papers (2022-07-05T17:59:28Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images [96.66271207089096]
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes.
We show that a range-view (RV) based 3D detector with standard 2D convolutions alone can achieve comparable performance to state-of-the-art BEV-based detectors.
arXiv Detail & Related papers (2022-05-27T05:42:16Z)
- V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer [58.71845618090022]
We build a holistic attention model, namely V2X-ViT, to fuse information across on-road agents.
V2X-ViT consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention (see the sketch after this list).
To validate our approach, we create a large-scale V2X perception dataset.
arXiv Detail & Related papers (2022-03-20T20:18:25Z)
- Keypoints-Based Deep Feature Fusion for Cooperative Vehicle Detection of Autonomous Driving [2.6543018470131283]
We propose an efficient keypoints-based deep feature fusion framework, called FPV-RCNN, for collective perception.
Compared to bird's-eye view (BEV) keypoints feature fusion, FPV-RCNN improves detection accuracy by about 14%.
Our method also significantly decreases the CPM size to less than 0.3 KB, about 50 times smaller than the BEV feature maps shared in previous works.
arXiv Detail & Related papers (2021-09-23T19:41:02Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
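Referring back to the V2X-ViT entry above (alternating heterogeneous multi-agent self-attention and multi-scale window self-attention), the sketch below illustrates that alternating pattern by interleaving attention over the agent axis with attention over the spatial BEV axis. The class name, dimensions, and the use of plain spatial self-attention in place of multi-scale windowed attention are assumptions for brevity, not the paper's implementation.

```python
# Rough sketch of an encoder that alternates two attention types; names and shapes are assumed.
import torch
import torch.nn as nn

class AlternatingV2XEncoder(nn.Module):
    """Alternates attention over the agent axis and attention over the spatial axis."""
    def __init__(self, dim=64, heads=4, depth=2):
        super().__init__()
        self.agent_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(depth)])
        self.spatial_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(depth)])

    def forward(self, x):                      # x: (B, N_agents, H, W, C)
        B, N, H, W, C = x.shape
        for a_attn, s_attn in zip(self.agent_attn, self.spatial_attn):
            # Attention across agents at each BEV location ("multi-agent self-attention").
            t = x.permute(0, 2, 3, 1, 4).reshape(B * H * W, N, C)
            t, _ = a_attn(t, t, t)
            x = x + t.reshape(B, H, W, N, C).permute(0, 3, 1, 2, 4)
            # Attention across BEV locations for each agent (stand-in for windowed attention).
            t = x.reshape(B * N, H * W, C)
            t, _ = s_attn(t, t, t)
            x = x + t.reshape(B, N, H, W, C)
        return x

# Toy usage: 2 agents, a 16x16 BEV grid, 64 channels.
enc = AlternatingV2XEncoder()
feats = torch.randn(1, 2, 16, 16, 64)
out = enc(feats)                               # (1, 2, 16, 16, 64)
```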
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.