ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes
- URL: http://arxiv.org/abs/2001.10692v1
- Date: Wed, 29 Jan 2020 05:09:28 GMT
- Title: ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes
- Authors: Charles R. Qi, Xinlei Chen, Or Litany, Leonidas J. Guibas
- Abstract summary: We propose a 3D detection architecture called ImVoteNet for RGB-D scenes.
ImVoteNet is based on fusing 2D votes in images and 3D votes in point clouds.
We validate our model on the challenging SUN RGB-D dataset.
- Score: 93.82668222075128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection has seen quick progress thanks to advances in deep
learning on point clouds. A few recent works have even shown state-of-the-art
performance with just point cloud input (e.g. VoteNet). However, point cloud
data have inherent limitations: they are sparse, lack color information, and
often suffer from sensor noise. Images, on the other hand, have high resolution
and rich texture. Thus they can complement the 3D geometry provided by point
clouds. Yet how to effectively use image information to assist point cloud
based detection is still an open question. In this work, we build on top of
VoteNet and propose a 3D detection architecture called ImVoteNet specialized
for RGB-D scenes. ImVoteNet is based on fusing 2D votes in images and 3D votes
in point clouds. Compared to prior work on multi-modal detection, we explicitly
extract both geometric and semantic features from the 2D images. We leverage
camera parameters to lift these features to 3D. To improve the synergy of 2D-3D
feature fusion, we also propose a multi-tower training scheme. We validate our
model on the challenging SUN RGB-D dataset, advancing state-of-the-art results
by 5.7 mAP. We also provide rich ablation studies to analyze the contribution
of each design choice.
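The abstract notes that camera parameters are used to lift 2D image features into 3D. As a rough, generic illustration of that idea (a standard pinhole-camera back-projection, not the paper's exact formulation; the intrinsic matrix `K` and depth value here are assumed for the example), lifting a 2D image point to 3D camera coordinates looks like:

```python
import numpy as np

def lift_2d_vote_to_3d(pixel, depth, K):
    """Back-project a 2D image point to 3D camera coordinates.

    pixel: (u, v) pixel coordinates
    depth: depth z at that pixel, in meters (e.g. from the RGB-D sensor)
    K:     3x3 pinhole camera intrinsic matrix
    """
    u, v = pixel
    fx, fy = K[0, 0], K[1, 1]  # focal lengths
    cx, cy = K[0, 2], K[1, 2]  # principal point
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Example with assumed intrinsics: a pixel at the principal point
# lands on the optical axis at the given depth.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
p3d = lift_2d_vote_to_3d((320.0, 240.0), 2.0, K)
print(p3d)  # [0. 0. 2.]
```

With depth available per pixel, features attached to 2D points (such as 2D votes) can be carried along this back-projection into the 3D point cloud frame.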
Related papers
- TriVol: Point Cloud Rendering via Triple Volumes [57.305748806545026]
We present a dense yet lightweight 3D representation, named TriVol, that can be combined with NeRF to render photo-realistic images from point clouds.
Our framework has excellent generalization ability to render a category of scenes/objects without fine-tuning.
arXiv Detail & Related papers (2023-03-29T06:34:12Z)
- Leveraging Single-View Images for Unsupervised 3D Point Cloud Completion [53.93172686610741]
Cross-PCC is an unsupervised point cloud completion method without requiring any 3D complete point clouds.
To take advantage of the complementary information from 2D images, we use a single-view RGB image to extract 2D features.
Our method even achieves comparable performance to some supervised methods.
arXiv Detail & Related papers (2022-12-01T15:11:21Z)
- Sparse2Dense: Learning to Densify 3D Features for 3D Object Detection [85.08249413137558]
LiDAR-produced point clouds are the major source for most state-of-the-art 3D object detectors.
Small, distant, and incomplete objects with sparse or few points are often hard to detect.
We present Sparse2Dense, a new framework to efficiently boost 3D detection performance by learning to densify point clouds in latent space.
arXiv Detail & Related papers (2022-11-23T16:01:06Z)
- Bridged Transformer for Vision and Point Cloud 3D Object Detection [92.86856146086316]
Bridged Transformer (BrT) is an end-to-end architecture for 3D object detection.
BrT learns to identify 3D and 2D object bounding boxes from both points and image patches.
We experimentally show that BrT surpasses state-of-the-art methods on SUN RGB-D and ScanNetV2 datasets.
arXiv Detail & Related papers (2022-10-04T05:44:22Z)
- VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion [62.24001258298076]
VPFNet is a new architecture that cleverly aligns and aggregates the point cloud and image data at the 'virtual' points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
arXiv Detail & Related papers (2021-11-29T08:51:20Z)
- An Overview Of 3D Object Detection [21.159668390764832]
We propose a framework that uses both RGB and point cloud data to perform multiclass object recognition.
We use the recently released nuScenes dataset---a large-scale dataset containing many data formats---to train and evaluate our proposed architecture.
arXiv Detail & Related papers (2020-10-29T14:04:50Z)
- 3D Object Detection Method Based on YOLO and K-Means for Image and Point Clouds [1.9458156037869139]
Lidar-based 3D object detection and classification tasks are essential for autonomous driving.
This paper proposes a 3D object detection method based on point cloud and image.
arXiv Detail & Related papers (2020-04-21T04:32:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.