Fully Sparse 3D Occupancy Prediction
- URL: http://arxiv.org/abs/2312.17118v5
- Date: Fri, 19 Jul 2024 03:01:09 GMT
- Title: Fully Sparse 3D Occupancy Prediction
- Authors: Haisong Liu, Yang Chen, Haiguang Wang, Zetong Yang, Tianyu Li, Jia Zeng, Li Chen, Hongyang Li, Limin Wang,
- Abstract summary: Occupancy prediction plays a pivotal role in autonomous driving.
Previous methods typically construct dense 3D volumes, neglecting the inherent sparsity of the scene and suffering from high computational costs.
We introduce a novel fully sparse occupancy network, termed SparseOcc.
SparseOcc initially reconstructs a sparse 3D representation from camera-only inputs and subsequently predicts semantic/instance occupancy from the 3D sparse representation by sparse queries.
- Score: 37.265473869812816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Occupancy prediction plays a pivotal role in autonomous driving. Previous methods typically construct dense 3D volumes, neglecting the inherent sparsity of the scene and suffering from high computational costs. To bridge the gap, we introduce a novel fully sparse occupancy network, termed SparseOcc. SparseOcc initially reconstructs a sparse 3D representation from camera-only inputs and subsequently predicts semantic/instance occupancy from the 3D sparse representation by sparse queries. A mask-guided sparse sampling is designed to enable sparse queries to interact with 2D features in a fully sparse manner, thereby circumventing costly dense features or global attention. Additionally, we design a thoughtful ray-based evaluation metric, namely RayIoU, to solve the inconsistency penalty along the depth axis raised in traditional voxel-level mIoU criteria. SparseOcc demonstrates its effectiveness by achieving a RayIoU of 34.0, while maintaining a real-time inference speed of 17.3 FPS, with 7 history frames inputs. By incorporating more preceding frames to 15, SparseOcc continuously improves its performance to 35.1 RayIoU without bells and whistles.
Related papers
- SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction [15.331332063879342]
We propose SparseOcc, an efficient occupancy network inspired by sparse point cloud processing.
SparseOcc achieves a remarkable 74.9% reduction on FLOPs over the dense baseline.
It also improves accuracy, from 12.8% to 14.1% mIOU, which in part can be attributed to the sparse representation's ability to avoid hallucinations on empty voxels.
arXiv Detail & Related papers (2024-04-15T06:45:06Z) - OccupancyDETR: Using DETR for Mixed Dense-sparse 3D Occupancy Prediction [10.87136340580404]
Visual-based 3D semantic occupancy perception is a key technology for robotics, including autonomous vehicles.
We propose a novel 3D semantic occupancy perception method, OccupancyDETR, which utilizes a DETR-like object detection, a mixed dense-sparse 3D occupancy decoder.
Our approach strikes a balance between efficiency and accuracy, achieving faster inference times, lower resource consumption, and improved performance for small object detection.
arXiv Detail & Related papers (2023-09-15T16:06:23Z) - Ada3D : Exploiting the Spatial Redundancy with Adaptive Inference for
Efficient 3D Object Detection [19.321076175294902]
Voxel-based methods have achieved state-of-the-art performance for 3D object detection in autonomous driving.
Their significant computational and memory costs pose a challenge for their application to resource-constrained vehicles.
We propose an adaptive inference framework called Ada3D, which focuses on exploiting the input-level spatial redundancy.
arXiv Detail & Related papers (2023-07-17T02:58:51Z) - Fully Sparse Fusion for 3D Object Detection [69.32694845027927]
Currently prevalent multimodal 3D detection methods are built upon LiDAR-based detectors that usually use dense Bird's-Eye-View feature maps.
Fully sparse architecture is gaining attention as they are highly efficient in long-range perception.
In this paper, we study how to effectively leverage image modality in the emerging fully sparse architecture.
arXiv Detail & Related papers (2023-04-24T17:57:43Z) - VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking [78.25819070166351]
We propose VoxelNext for fully sparse 3D object detection.
Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies.
Our strong sparse convolutional network VoxelNeXt detects and tracks 3D objects through voxel features entirely.
arXiv Detail & Related papers (2023-03-20T17:40:44Z) - OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic
Occupancy Perception [73.05425657479704]
We propose OpenOccupancy, which is the first surrounding semantic occupancy perception benchmark.
We extend the large-scale nuScenes dataset with dense semantic occupancy annotations.
Considering the complexity of surrounding occupancy perception, we propose the Cascade Occupancy Network (CONet) to refine the coarse prediction.
arXiv Detail & Related papers (2023-03-07T15:43:39Z) - OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object
Detection [51.153003057515754]
OPA-3D is a single-stage, end-to-end, Occlusion-Aware Pixel-Wise Aggregation network.
It jointly estimates dense scene depth with depth-bounding box residuals and object bounding boxes.
It outperforms state-of-the-art methods on the main Car category.
arXiv Detail & Related papers (2022-11-02T14:19:13Z) - Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion
Forecasting with a Single Convolutional Net [93.51773847125014]
We propose a novel deep neural network that is able to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D sensor.
Our approach performs 3D convolutions across space and time over a bird's eye view representation of the 3D world.
arXiv Detail & Related papers (2020-12-22T22:43:35Z) - Generative Sparse Detection Networks for 3D Single-shot Object Detection [43.91336826079574]
3D object detection has been widely studied due to its potential applicability to many promising areas such as robotics and augmented reality.
Yet, the sparse nature of the 3D data poses unique challenges to this task.
We propose Generative Sparse Detection Network (GSDN), a fully-convolutional single-shot sparse detection network.
arXiv Detail & Related papers (2020-06-22T15:54:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.