Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection
- URL: http://arxiv.org/abs/2012.15712v2
- Date: Fri, 5 Feb 2021 16:25:48 GMT
- Title: Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection
- Authors: Jiajun Deng, Shaoshuai Shi, Peiwei Li, Wengang Zhou, Yanyong Zhang,
Houqiang Li
- Abstract summary: We devise a simple but effective voxel-based framework, named Voxel R-CNN.
By taking full advantage of voxel features in a two-stage approach, our method achieves detection accuracy comparable to that of state-of-the-art point-based models.
Our results show that Voxel R-CNN delivers higher detection accuracy while maintaining a real-time frame processing rate, i.e., a speed of 25 FPS on an NVIDIA RTX 2080 Ti GPU.
- Score: 99.16162624992424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances on 3D object detection heavily rely on how the 3D data are
represented, \emph{i.e.}, voxel-based or point-based representation. Many
existing high performance 3D detectors are point-based because this structure
can better retain precise point positions. Nevertheless, point-level features
lead to high computation overheads due to unordered storage. In contrast, the
voxel-based structure is better suited for feature extraction but often yields
lower accuracy because the input data are divided into grids. In this paper, we
take a slightly different viewpoint -- we find that precise positioning of raw
points is not essential for high performance 3D object detection and that the
coarse voxel granularity can also offer sufficient detection accuracy. Bearing
this view in mind, we devise a simple but effective voxel-based framework,
named Voxel R-CNN. By taking full advantage of voxel features in a two-stage
approach, our method achieves detection accuracy comparable to that of
state-of-the-art point-based models, but at a fraction of the computation cost.
Voxel R-CNN consists of a 3D backbone network, a 2D bird's-eye-view (BEV)
Region Proposal Network, and a detection head. A voxel RoI pooling operation is
devised to extract RoI features directly from voxel features for further
refinement. Extensive
experiments are conducted on the widely used KITTI Dataset and the more recent
Waymo Open Dataset. Our results show that compared to existing voxel-based
methods, Voxel R-CNN delivers a higher detection accuracy while maintaining a
real-time frame processing rate, \emph{i.e.}, at a speed of 25 FPS on an NVIDIA
RTX 2080 Ti GPU. The code is available at
\url{https://github.com/djiajunustc/Voxel-R-CNN}.
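The central device, voxel RoI pooling, can be understood with a short sketch: grid points are sampled inside each 3D proposal, and each grid point aggregates the features of neighboring non-empty voxels, so the second stage never goes back to raw points. Below is a minimal, illustrative PyTorch version of that idea; the function name voxel_roi_pool is hypothetical, boxes are simplified to axis-aligned (the paper handles oriented boxes), and the real implementation uses accelerated sparse neighbor queries rather than a dense distance matrix.

```python
import torch

def voxel_roi_pool(voxel_centers, voxel_feats, boxes, grid_size=6, radius=1.0):
    """Illustrative voxel RoI pooling (hypothetical helper, not the paper's code).

    voxel_centers: (N, 3) xyz centers of non-empty voxels
    voxel_feats:   (N, C) features of those voxels
    boxes:         (B, 6) axis-aligned proposals as (cx, cy, cz, dx, dy, dz)
    returns:       (B, grid_size**3, C) pooled RoI features
    """
    B, C = boxes.shape[0], voxel_feats.shape[1]
    # Regular grid of sample points in the unit cube, scaled per box.
    steps = (torch.arange(grid_size, dtype=torch.float32) + 0.5) / grid_size - 0.5
    gx, gy, gz = torch.meshgrid(steps, steps, steps, indexing="ij")
    unit_grid = torch.stack([gx, gy, gz], dim=-1).reshape(-1, 3)   # (G^3, 3)

    pooled = torch.zeros(B, unit_grid.shape[0], C)
    for b in range(B):
        center, dims = boxes[b, :3], boxes[b, 3:6]
        grid_pts = center + unit_grid * dims                        # (G^3, 3)
        # Group voxels within `radius` of each grid point, then max-pool.
        dist = torch.cdist(grid_pts, voxel_centers)                 # (G^3, N)
        mask = dist < radius
        for g in range(grid_pts.shape[0]):
            if mask[g].any():
                pooled[b, g] = voxel_feats[mask[g]].max(dim=0).values
    return pooled

# Toy usage: 500 random voxels with 16-dim features, two proposals.
centers = torch.rand(500, 3) * 10.0
feats = torch.randn(500, 16)
boxes = torch.tensor([[5.0, 5.0, 5.0, 4.0, 2.0, 1.5],
                      [2.0, 8.0, 3.0, 3.0, 3.0, 2.0]])
print(voxel_roi_pool(centers, feats, boxes).shape)  # torch.Size([2, 216, 16])
```

The point of the sketch is the data flow: proposals read directly from the voxel volume rather than revisiting raw points, which is what lets the two-stage refinement stay fast.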
Related papers
- VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking [78.25819070166351]
We propose VoxelNeXt for fully sparse 3D object detection.
Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies.
Our strong sparse convolutional network VoxelNeXt detects and tracks 3D objects through voxel features entirely.
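As a rough illustration of what predicting directly from sparse voxel features can mean, the sketch below attaches per-voxel classification and regression heads to the N non-empty voxels and keeps only the confident ones. SparseVoxelHead and its layers are hypothetical stand-ins, not VoxelNeXt's actual modules (which use sparse convolutions); the sketch only shows that no dense feature map, anchors, or center proxies are needed.

```python
import torch
import torch.nn as nn

class SparseVoxelHead(nn.Module):
    """Hypothetical fully sparse prediction head: per-voxel scores and boxes."""
    def __init__(self, feat_dim=64, num_classes=3, box_dim=7):
        super().__init__()
        self.cls = nn.Linear(feat_dim, num_classes)  # per-voxel class scores
        self.reg = nn.Linear(feat_dim, box_dim)      # per-voxel box parameters

    def forward(self, voxel_feats, score_thresh=0.3):
        # voxel_feats: (N, feat_dim) features of the N non-empty voxels only
        scores = self.cls(voxel_feats).sigmoid()     # (N, num_classes)
        boxes = self.reg(voxel_feats)                # (N, box_dim)
        keep = scores.max(dim=1).values > score_thresh
        return scores[keep], boxes[keep]

head = SparseVoxelHead()
scores, boxes = head(torch.randn(1000, 64))
print(scores.shape, boxes.shape)  # only confident voxels survive
```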
arXiv Detail & Related papers (2023-03-20T17:40:44Z)
- Graph R-CNN: Towards Accurate 3D Object Detection with Semantic-Decorated Local Graph [26.226885108862735]
Two-stage detectors have gained much popularity in 3D object detection.
Most two-stage 3D detectors utilize grid points, voxel grids, or sampled keypoints for RoI feature extraction in the second stage; such schemes, however, are inefficient at handling sparse, unevenly distributed outdoor points.
This paper addresses that problem in three aspects.
arXiv Detail & Related papers (2022-08-07T02:56:56Z)
- From Voxel to Point: IoU-guided 3D Object Detection for Point Cloud with Voxel-to-Point Decoder [79.39041453836793]
We present an Intersection-over-Union (IoU) guided two-stage 3D object detector with a voxel-to-point decoder.
We propose a residual voxel-to-point decoder to extract point features in addition to the map-view features from the voxel-based Region Proposal Network (RPN).
We propose a simple and efficient method to align the estimated IoUs to the refined proposal boxes as a more relevant localization confidence.
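The last point, using estimated IoU as localization confidence, amounts to ranking refined boxes by a predicted IoU rather than by the classification score alone. A minimal sketch with illustrative names only, not this paper's API:

```python
import torch

def rank_by_predicted_iou(boxes, cls_scores, iou_preds):
    """boxes: (N, 7), cls_scores: (N,), iou_preds: (N,) in [0, 1].

    Uses the IoU estimate as the localization confidence for sorting before
    NMS; a common variant blends the two scores geometrically instead.
    """
    order = torch.argsort(iou_preds, descending=True)
    return boxes[order], cls_scores[order], iou_preds[order]

boxes = torch.randn(5, 7)
cls_scores = torch.rand(5)
iou_preds = torch.rand(5)
sorted_boxes, _, conf = rank_by_predicted_iou(boxes, cls_scores, iou_preds)
print(conf)  # descending predicted IoUs drive the NMS order
```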
arXiv Detail & Related papers (2021-08-08T14:30:13Z)
- PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection [100.60209139039472]
We propose Point-Voxel Region-based Convolutional Neural Networks (PV-RCNNs) for accurate 3D detection from point clouds.
Our proposed PV-RCNNs significantly outperform previous state-of-the-art 3D detection methods on both the Waymo Open Dataset and the highly competitive KITTI benchmark.
arXiv Detail & Related papers (2021-01-31T14:51:49Z)
- InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling [65.47126868838836]
We propose a novel 3D object detection framework with dynamic information modeling.
Coarse predictions are generated in the first stage via a voxel-based region proposal network.
Experiments are conducted on the large-scale nuScenes 3D detection benchmark.
arXiv Detail & Related papers (2020-07-16T18:27:08Z)
- SVGA-Net: Sparse Voxel-Graph Attention Network for 3D Object Detection from Point Clouds [8.906003527848636]
We propose the Sparse Voxel-Graph Attention Network (SVGA-Net) to achieve competitive 3D detection from raw LiDAR data.
SVGA-Net constructs a local complete graph within each divided 3D spherical voxel and a global KNN graph across all voxels.
Experiments on the KITTI detection benchmark demonstrate the effectiveness of extending the graph representation to 3D object detection.
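The global KNN graph mentioned above can be sketched directly: connect every voxel to its k nearest voxel centers, which yields the edge set that attention weights can then be defined over. A minimal illustration, not SVGA-Net's actual implementation:

```python
import torch

def global_knn_graph(voxel_centers, k=8):
    """voxel_centers: (N, 3) -> (N, k) indices of each voxel's k nearest voxels."""
    dist = torch.cdist(voxel_centers, voxel_centers)   # (N, N) pairwise distances
    dist.fill_diagonal_(float("inf"))                  # exclude self-loops
    return dist.topk(k, largest=False).indices         # (N, k) neighbor ids

centers = torch.rand(200, 3) * 40.0
neighbors = global_knn_graph(centers, k=8)
print(neighbors.shape)  # torch.Size([200, 8])
```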
arXiv Detail & Related papers (2020-06-07T05:01:06Z)
- PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection [76.30585706811993]
We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN).
Our proposed method deeply integrates both 3D voxel Convolutional Neural Network (CNN) and PointNet-based set abstraction.
It takes advantage of the efficient learning and high-quality proposals of the 3D voxel CNN and the flexible receptive fields of PointNet-based networks.
arXiv Detail & Related papers (2019-12-31T06:34:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.