VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
- URL: http://arxiv.org/abs/2303.11301v1
- Date: Mon, 20 Mar 2023 17:40:44 GMT
- Title: VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
- Authors: Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia
- Abstract summary: We propose VoxelNext for fully sparse 3D object detection.
Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies.
Our strong sparse convolutional network VoxelNeXt detects and tracks 3D objects through voxel features entirely.
- Score: 78.25819070166351
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detectors usually rely on hand-crafted proxies, e.g., anchors or
centers, and translate well-studied 2D frameworks to 3D. Thus, sparse voxel
features need to be densified and processed by dense prediction heads, which
inevitably costs extra computation. In this paper, we instead propose VoxelNext
for fully sparse 3D object detection. Our core insight is to predict objects
directly based on sparse voxel features, without relying on hand-crafted
proxies. Our strong sparse convolutional network VoxelNeXt detects and tracks
3D objects through voxel features entirely. It is an elegant and efficient
framework, with no need for sparse-to-dense conversion or NMS post-processing.
Our method achieves a better speed-accuracy trade-off than other mainframe
detectors on the nuScenes dataset. For the first time, we show that a fully
sparse voxel-based representation works decently for LIDAR 3D object detection
and tracking. Extensive experiments on nuScenes, Waymo, and Argoverse2
benchmarks validate the effectiveness of our approach. Without bells and
whistles, our model outperforms all existing LIDAR methods on the nuScenes
tracking test benchmark.
Related papers
- VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking [3.517993407670811]
Current LiDAR point cloud-based 3D single object tracking (SOT) methods typically rely on point-based representation network.
We introduce a novel tracking framework, termed VoxelTrack.
By voxelizing inherently disordered point clouds into 3D voxels, VoxelTrack effectively models precise and robust 3D spatial information.
arXiv Detail & Related papers (2024-08-05T06:38:43Z) - Rethinking Voxelization and Classification for 3D Object Detection [68.8204255655161]
The main challenge in 3D object detection from LiDAR point clouds is achieving real-time performance without affecting the reliability of the network.
We present a solution to improve network inference speed and precision at the same time by implementing a fast dynamic voxelizer.
In addition, we propose a lightweight detection sub-head model for classifying predicted objects and filter out false detected objects.
arXiv Detail & Related papers (2023-01-10T16:22:04Z) - CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds [55.44204039410225]
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D.
Our proposed method first generates some high-quality 3D proposals by leveraging the class-aware local group strategy on the object surface voxels.
To recover the features of missed voxels due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module.
arXiv Detail & Related papers (2022-10-09T13:38:48Z) - A Lightweight and Detector-free 3D Single Object Tracker on Point Clouds [50.54083964183614]
It is non-trivial to perform accurate target-specific detection since the point cloud of objects in raw LiDAR scans is usually sparse and incomplete.
We propose DMT, a Detector-free Motion prediction based 3D Tracking network that totally removes the usage of complicated 3D detectors.
arXiv Detail & Related papers (2022-03-08T17:49:07Z) - FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection [3.330229314824913]
We present FCAF3D - a first-in-class fully convolutional anchor-free indoor 3D object detection method.
It is a simple yet effective method that uses a voxel representation of a point cloud and processes voxels with sparse convolutions.
It can handle large-scale scenes with minimal runtime through a single fully convolutional feed-forward pass.
arXiv Detail & Related papers (2021-12-01T07:28:52Z) - Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection [99.16162624992424]
We devise a simple but effective voxel-based framework, named Voxel R-CNN.
By taking full advantage of voxel features in a two stage approach, our method achieves comparable detection accuracy with state-of-the-art point-based models.
Our results show that Voxel R-CNN delivers a higher detection accuracy while maintaining a realtime frame processing rate, emphi.e, at a speed of 25 FPS on an NVIDIA 2080 Ti GPU.
arXiv Detail & Related papers (2020-12-31T17:02:46Z) - Generative Sparse Detection Networks for 3D Single-shot Object Detection [43.91336826079574]
3D object detection has been widely studied due to its potential applicability to many promising areas such as robotics and augmented reality.
Yet, the sparse nature of the 3D data poses unique challenges to this task.
We propose Generative Sparse Detection Network (GSDN), a fully-convolutional single-shot sparse detection network.
arXiv Detail & Related papers (2020-06-22T15:54:24Z) - SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data.
Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method by using only 50% labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.