Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion
- URL: http://arxiv.org/abs/2203.09780v1
- Date: Fri, 18 Mar 2022 07:56:35 GMT
- Title: Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion
- Authors: Xiaopei Wu, Liang Peng, Honghui Yang, Liang Xie, Chenxi Huang, Chengqi
Deng, Haifeng Liu, Deng Cai
- Abstract summary: Current LiDAR-only 3D detection methods inevitably suffer from the sparsity of point clouds.
We present a novel multi-modal framework SFD (Sparse Fuse Dense), which utilizes pseudo point clouds generated from depth completion.
Our method ranks first on the KITTI car 3D object detection leaderboard, demonstrating the effectiveness of SFD.
- Score: 31.52721107477401
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current LiDAR-only 3D detection methods inevitably suffer from the
sparsity of point clouds. Many multi-modal methods have been proposed to
alleviate this issue, but the differing representations of images and point
clouds make them difficult to fuse, resulting in suboptimal performance. In
this paper, we present a novel multi-modal framework, SFD (Sparse Fuse Dense),
which utilizes pseudo point clouds generated from depth completion to tackle
the issues mentioned above. Unlike prior works, we propose a new RoI fusion
strategy, 3D-GAF (3D Grid-wise Attentive Fusion), to make fuller use of the
information in the two types of point clouds. Specifically, 3D-GAF fuses 3D RoI
features from the pair of point clouds in a grid-wise attentive way, which is
finer-grained and more precise. In addition, we propose SynAugment
(Synchronized Augmentation) to enable our multi-modal framework to utilize all
data augmentation approaches tailored to LiDAR-only methods. Lastly, we
customize an effective and efficient feature extractor, CPConv (Color Point
Convolution), for pseudo point clouds. It explores 2D image features and 3D
geometric features of pseudo point clouds simultaneously. Our method ranks
first on the KITTI car 3D object detection leaderboard, demonstrating the
effectiveness of SFD. Code will be made publicly available. (Illustrative
sketches of 3D-GAF, SynAugment, and CPConv follow below.)
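The abstract does not spell out 3D-GAF's internals; the following is a minimal PyTorch sketch of one plausible grid-wise attentive fusion, assuming both branches already produce per-RoI voxel-grid features of shape (N_roi, C, G, G, G). The gating network and the convex-combination form are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GridwiseAttentiveFusion(nn.Module):
    """Fuse per-RoI grid features from raw and pseudo point clouds.

    Each of the G*G*G grid cells inside an RoI gets its own attention
    weight, so the fusion is finer-grained than a single per-RoI gate.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Predict one weight per grid cell from the concatenated features.
        self.gate = nn.Sequential(
            nn.Conv3d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_raw: torch.Tensor, feat_pseudo: torch.Tensor) -> torch.Tensor:
        # feat_raw, feat_pseudo: (N_roi, C, G, G, G) RoI grid features.
        w = self.gate(torch.cat([feat_raw, feat_pseudo], dim=1))  # (N_roi, 1, G, G, G)
        # Each grid cell decides how much to trust each modality.
        return w * feat_raw + (1.0 - w) * feat_pseudo

# Example: fuse 128 RoIs with 64-channel 6x6x6 grids from each branch.
fusion = GridwiseAttentiveFusion(channels=64)
fused = fusion(torch.randn(128, 64, 6, 6, 6), torch.randn(128, 64, 6, 6, 6))
```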
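SynAugment's stated goal is to let the multi-modal detector reuse augmentations tailored to LiDAR-only methods; a natural reading is that every geometric transform is applied identically to both point clouds and the ground-truth boxes. A minimal sketch of that idea, assuming the common KITTI-style box layout (x, y, z, dx, dy, dz, heading); not the authors' implementation:

```python
import numpy as np

def synchronized_global_rotation(raw_points: np.ndarray,
                                 pseudo_points: np.ndarray,
                                 gt_boxes: np.ndarray,
                                 angle: float):
    """Rotate raw points, pseudo points, and boxes by the same z-axis angle.

    raw_points:    (N, 3+) raw LiDAR points (xyz plus extra channels).
    pseudo_points: (M, 3+) pseudo points from depth completion (xyz, RGB, ...).
    gt_boxes:      (B, 7)  boxes as (x, y, z, dx, dy, dz, heading).
    """
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    # The key point: both point clouds receive the identical transform, so any
    # LiDAR-only augmentation stays geometrically consistent across modalities.
    raw_points[:, :3] = raw_points[:, :3] @ rot.T
    pseudo_points[:, :3] = pseudo_points[:, :3] @ rot.T
    gt_boxes[:, :3] = gt_boxes[:, :3] @ rot.T
    gt_boxes[:, 6] += angle
    return raw_points, pseudo_points, gt_boxes
```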
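CPConv is described only as jointly exploring 2D image features and 3D geometric features of pseudo points. Since depth completion yields one pseudo point per pixel, one cheap way to realize this is to gather each point's 3x3 image-plane neighborhood and featurize the relative 3D offsets alongside the appearance features. The sketch below illustrates that general idea; the neighborhood size, MLP, and max-pooling are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ColorPointConvSketch(nn.Module):
    """Mix 2D appearance features with 3D relative geometry for pixel-aligned
    pseudo points by aggregating each point's 3x3 image-plane neighborhood."""

    def __init__(self, feat_dim: int, out_dim: int):
        super().__init__()
        # Per-neighbor MLP over [neighbor feature, 3D offset to center point].
        self.mlp = nn.Sequential(nn.Linear(feat_dim + 3, out_dim), nn.ReLU(inplace=True))

    def forward(self, xyz: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        # xyz:  (H, W, 3)  3D coordinates of pseudo points, one per pixel.
        # feat: (H, W, F)  per-point appearance features (e.g., RGB embeddings).
        H, W, _ = xyz.shape

        def neighbors(t: torch.Tensor) -> torch.Tensor:
            # (H, W, C) -> (H, W, 9, C): the 3x3 pixel neighborhood of each point.
            c = t.shape[-1]
            u = F.unfold(t.permute(2, 0, 1).unsqueeze(0), kernel_size=3, padding=1)
            return u.view(c, 9, H, W).permute(2, 3, 1, 0)

        nb_xyz, nb_feat = neighbors(xyz), neighbors(feat)
        offsets = nb_xyz - xyz.unsqueeze(2)               # relative 3D geometry
        fused = self.mlp(torch.cat([nb_feat, offsets], dim=-1))
        return fused.max(dim=2).values                    # (H, W, out_dim)

# Example: a 96x320 pseudo-point map with 32-dim appearance features.
cpconv = ColorPointConvSketch(feat_dim=32, out_dim=64)
out = cpconv(torch.randn(96, 320, 3), torch.randn(96, 320, 32))  # -> (96, 320, 64)
```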
Related papers
- Point Cloud Self-supervised Learning via 3D to Multi-view Masked Autoencoder [21.73287941143304]
Multi-Modality Masked AutoEncoders (MAE) methods leverage both 2D images and 3D point clouds for pre-training.
We introduce a novel approach employing a 3D to multi-view masked autoencoder to fully harness the multi-modal attributes of 3D point clouds.
Our method outperforms state-of-the-art counterparts by a large margin in a variety of downstream tasks.
arXiv Detail & Related papers (2023-11-17T22:10:03Z)
- TriVol: Point Cloud Rendering via Triple Volumes [57.305748806545026]
We present a dense while lightweight 3D representation, named TriVol, that can be combined with NeRF to render photo-realistic images from point clouds.
Our framework has excellent generalization ability to render a category of scenes/objects without fine-tuning.
arXiv Detail & Related papers (2023-03-29T06:34:12Z)
- Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection [16.198358858773258]
Multi-modal 3D object detection has been an active research topic in autonomous driving.
It is non-trivial to explore the cross-modal feature fusion between sparse 3D points and dense 2D pixels.
Recent approaches either fuse the image features with the point cloud features that are projected onto the 2D image plane or combine the sparse point cloud with dense image pixels.
arXiv Detail & Related papers (2022-10-18T06:15:56Z)
- Bridged Transformer for Vision and Point Cloud 3D Object Detection [92.86856146086316]
Bridged Transformer (BrT) is an end-to-end architecture for 3D object detection.
BrT learns to identify 3D and 2D object bounding boxes from both points and image patches.
We experimentally show that BrT surpasses state-of-the-art methods on SUN RGB-D and ScanNetV2 datasets.
arXiv Detail & Related papers (2022-10-04T05:44:22Z)
- FFPA-Net: Efficient Feature Fusion with Projection Awareness for 3D Object Detection [19.419030878019974]
Unstructured 3D point clouds are filled into the 2D plane, and 3D point cloud features are extracted faster using projection-aware convolution layers.
Correspondences between the different sensor signals are established in advance during data preprocessing.
Two new plug-and-play fusion modules, LiCamFuse and BiLiCamFuse, are proposed.
arXiv Detail & Related papers (2022-09-15T16:13:19Z)
- From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection [101.20784125067559]
We propose a new architecture, namely Hallucinated Hollow-3D R-CNN, to address the problem of 3D object detection.
In our approach, we first extract multi-view features by sequentially projecting the point clouds into the perspective view and the bird's-eye view.
The 3D objects are detected via a box refinement module with a novel Hierarchical Voxel RoI Pooling operation.
arXiv Detail & Related papers (2021-07-30T02:00:06Z)
- ParaNet: Deep Regular Representation for 3D Point Clouds [62.81379889095186]
ParaNet is a novel end-to-end deep learning framework for representing 3D point clouds.
It converts an irregular 3D point cloud into a regular 2D color image, named a point geometry image (PGI).
In contrast to conventional regular representation modalities based on multi-view projection and voxelization, the proposed representation is differentiable and reversible.
arXiv Detail & Related papers (2020-12-05T13:19:55Z) - RoIFusion: 3D Object Detection from LiDAR and Vision [7.878027048763662]
We propose a novel fusion algorithm that projects a set of 3D Regions of Interest (RoIs) from the point clouds onto the 2D RoIs of the corresponding images.
Our approach achieves state-of-the-art performance on the challenging KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2020-09-09T20:23:27Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
- ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes [93.82668222075128]
We propose a 3D detection architecture called ImVoteNet for RGB-D scenes.
ImVoteNet is based on fusing 2D votes in images and 3D votes in point clouds.
We validate our model on the challenging SUN RGB-D dataset.
arXiv Detail & Related papers (2020-01-29T05:09:28Z)