VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and
Stereo Data Fusion
- URL: http://arxiv.org/abs/2111.14382v2
- Date: Wed, 1 Dec 2021 14:24:15 GMT
- Title: VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and
Stereo Data Fusion
- Authors: Hanqi Zhu, Jiajun Deng, Yu Zhang, Jianmin Ji, Qiuyu Mao, Houqiang Li,
Yanyong Zhang
- Abstract summary: VPFNet is a new architecture that cleverly aligns and aggregates the point cloud and image data at the 'virtual' points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
- Score: 62.24001258298076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has been well recognized that fusing the complementary information from
depth-aware LiDAR point clouds and semantic-rich stereo images would benefit 3D
object detection. Nevertheless, it is not trivial to explore the inherently
unnatural interaction between sparse 3D points and dense 2D pixels. To ease
this difficulty, the recent proposals generally project the 3D points onto the
2D image plane to sample the image data and then aggregate the data at the
points. However, this approach often suffers from the mismatch between the
resolution of point clouds and RGB images, leading to sub-optimal performance.
Specifically, taking the sparse points as the multi-modal data aggregation
locations causes severe information loss for high-resolution images, which in
turn undermines the effectiveness of multi-sensor fusion. In this paper, we
present VPFNet -- a new architecture that cleverly aligns and aggregates the
point cloud and image data at the 'virtual' points. Particularly, with their
density lying between that of the 3D points and 2D pixels, the virtual points
can nicely bridge the resolution gap between the two sensors, and thus preserve
more information for processing. Moreover, we also investigate data
augmentation techniques that can be applied to both point clouds and RGB
images, as data augmentation has made a non-negligible contribution to 3D
object detectors to date. We have conducted extensive experiments on the
KITTI dataset and have observed good performance compared to
state-of-the-art methods. Remarkably, our VPFNet achieves 83.21% moderate
3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since
May 21st, 2021.
The network design also takes computation efficiency into consideration -- we
can achieve 15 FPS on a single NVIDIA RTX 2080Ti GPU. The code will be
made available for reproduction and further investigation.
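
A note on the mechanism: the resolution mismatch and the virtual-point remedy
described above are concrete enough to sketch. The NumPy snippet below
contrasts sampling image features only at projected LiDAR points with
aggregating at denser 'virtual' points. It is a minimal illustration under a
simplified pinhole-calibration assumption; every name in it
(project_to_image, make_virtual_points, P, ...) is hypothetical and not taken
from the VPFNet implementation.

# Minimal sketch, not the VPFNet code: contrast point-anchored image
# sampling with denser virtual-point aggregation (3x4 pinhole matrix P).
import numpy as np

def project_to_image(points_xyz, P):
    # Project N x 3 camera-frame points to pixel coordinates.
    homo = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])  # N x 4
    uvw = homo @ P.T                                               # N x 3
    return uvw[:, :2] / uvw[:, 2:3]                                # perspective divide

def sample_image_at(image, uv):
    # Conventional fusion: gather RGB only where a point projects. With
    # ~100k points against ~1M pixels, most pixels are never touched --
    # the resolution mismatch the abstract describes.
    u = np.clip(uv[:, 0].astype(int), 0, image.shape[1] - 1)
    v = np.clip(uv[:, 1].astype(int), 0, image.shape[0] - 1)
    return image[v, u]

def make_virtual_points(points_xyz, k=4, radius=0.2, seed=0):
    # Jitter each real point into k extra aggregation sites so the sampling
    # density sits between the LiDAR points and the pixel grid. This is an
    # assumed stand-in for the paper's actual virtual-point generation.
    rng = np.random.default_rng(seed)
    offsets = rng.uniform(-radius, radius, size=(len(points_xyz), k, 3))
    return (points_xyz[:, None, :] + offsets).reshape(-1, 3)

# Toy usage: with k=4, five times as many image locations are aggregated.
P = np.array([[700.0, 0.0, 640.0, 0.0],
              [0.0, 700.0, 360.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
points = np.random.default_rng(1).uniform([-5.0, -2.0, 4.0],
                                          [5.0, 2.0, 40.0], (1000, 3))
image = np.zeros((720, 1280, 3), dtype=np.uint8)
dense = np.vstack([points, make_virtual_points(points)])
features = sample_image_at(image, project_to_image(dense, P))

The density knob k trades image information preserved against compute, which
is the balance the abstract points to.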
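
The abstract also credits data augmentation applied to both modalities. Below
is a hedged sketch of one such synchronized transform, a horizontal flip; the
axis convention and the helper name are assumptions rather than details from
the paper.

import numpy as np

def joint_flip(points_xyz, image):
    # Mirror the scene for both sensors at once: negate the lateral point
    # coordinate and flip the image left-right. A full pipeline would also
    # mirror the camera calibration so projections stay consistent (omitted).
    pts = points_xyz.copy()
    pts[:, 0] = -pts[:, 0]       # assume x is the lateral (left-right) axis
    img = image[:, ::-1].copy()  # horizontal flip
    return pts, img

Any transform used this way has to be applied to both modalities together;
otherwise the point-to-pixel correspondence that the fusion relies on breaks.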
Related papers
- Multi-Sem Fusion: Multimodal Semantic Fusion for 3D Object Detection [11.575945934519442]
LiDAR and camera fusion techniques are promising for achieving 3D object detection in autonomous driving.
Most multi-modal 3D object detection frameworks integrate semantic knowledge from 2D images into 3D LiDAR point clouds.
We propose a general multi-modal fusion framework, Multi-Sem Fusion (MSF), which fuses the semantic information from the scene-parsing results of both 2D images and 3D point clouds.
arXiv Detail & Related papers (2022-12-10T10:54:41Z)
- Bridged Transformer for Vision and Point Cloud 3D Object Detection [92.86856146086316]
Bridged Transformer (BrT) is an end-to-end architecture for 3D object detection.
BrT learns to identify 3D and 2D object bounding boxes from both points and image patches.
We experimentally show that BrT surpasses state-of-the-art methods on the SUN RGB-D and ScanNetV2 datasets.
arXiv Detail & Related papers (2022-10-04T05:44:22Z)
- Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data [80.14669385741202]
We propose a self-supervised pre-training method for 3D perception models tailored to autonomous driving data.
We leverage the availability of synchronized and calibrated image and Lidar sensors in autonomous driving setups.
Our method does not require any point cloud or image annotations.
arXiv Detail & Related papers (2022-03-30T12:40:30Z)
- RoIFusion: 3D Object Detection from LiDAR and Vision [7.878027048763662]
We propose a novel fusion algorithm that projects a set of 3D Regions of Interest (RoIs) from the point clouds to the corresponding 2D RoIs in the images (a minimal sketch of this projection appears after this list).
Our approach achieves state-of-the-art performance on the challenging KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2020-09-09T20:23:27Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
- ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes [93.82668222075128]
We propose a 3D detection architecture called ImVoteNet for RGB-D scenes.
ImVoteNet is based on fusing 2D votes in images and 3D votes in point clouds.
We validate our model on the challenging SUN RGB-D dataset.
arXiv Detail & Related papers (2020-01-29T05:09:28Z)
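
For the RoIFusion entry above, the projection step it names (taking a 3D RoI
to a 2D RoI) can be sketched generically; this is a standard pinhole
construction under the same assumptions as the earlier snippets, not the
authors' code.

import numpy as np

def roi3d_to_roi2d(corners_xyz, P):
    # Project the 8 corners of a 3D box (8 x 3, camera frame) through a
    # 3 x 4 calibration matrix and take the axis-aligned 2D bounding box.
    homo = np.hstack([corners_xyz, np.ones((8, 1))])
    uvw = homo @ P.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    (u0, v0), (u1, v1) = uv.min(axis=0), uv.max(axis=0)
    return u0, v0, u1, v1  # left, top, right, bottom in pixels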