Regulating Intermediate 3D Features for Vision-Centric Autonomous
Driving
- URL: http://arxiv.org/abs/2312.11837v1
- Date: Tue, 19 Dec 2023 04:09:05 GMT
- Title: Regulating Intermediate 3D Features for Vision-Centric Autonomous
Driving
- Authors: Junkai Xu, Liang Peng, Haoran Cheng, Linxuan Xia, Qi Zhou, Dan Deng,
Wei Qian, Wenxiao Wang, Deng Cai
- Abstract summary: We propose to regulate intermediate dense 3D features with the help of volume rendering.
Experimental results on the Occ3D and nuScenes datasets demonstrate that Vampire facilitates fine-grained and appropriate extraction of dense 3D features.
- Score: 26.03800936700545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-camera perception tasks have gained significant attention in the field
of autonomous driving. However, existing frameworks based on Lift-Splat-Shoot
(LSS) in the multi-camera setting cannot produce suitable dense 3D features due
to the nature of their 2D-to-3D projection and the uncontrollable densification process. To resolve
this problem, we propose to regulate intermediate dense 3D features with the
help of volume rendering. Specifically, we employ volume rendering to process
the dense 3D features to obtain corresponding 2D features (e.g., depth maps,
semantic maps), which are supervised by the associated labels during training. This
regulates the generation of dense 3D features at the feature level,
providing appropriately dense, unified features for multiple perception tasks.
Our approach is therefore termed Vampire, which stands for "Volume rendering As
Multi-camera Perception Intermediate feature REgulator". Experimental results
on the Occ3D and nuScenes datasets demonstrate that Vampire facilitates
fine-grained and appropriate extraction of dense 3D features, and is
competitive with existing SOTA methods across diverse downstream perception
tasks such as 3D occupancy prediction, LiDAR segmentation, and 3D object
detection, while using moderate GPU resources. A video demonstration is
provided in the supplementary materials, and code is available at
github.com/cskkxjk/Vampire.
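The mechanism described in the abstract is compact enough to sketch: a dense 3D feature volume is rendered along camera rays into 2D depth and semantic maps, which ordinary 2D labels can then supervise. The function below is a minimal, illustrative sketch assuming the 3D features have already been mapped to a per-voxel density and per-voxel semantic logits; all names and shapes are our own assumptions, not the authors' implementation (see the repository above for that).

```python
# Minimal sketch of volume rendering a dense 3D feature volume into 2D
# depth and semantic maps. Shapes/names are illustrative assumptions.
import torch
import torch.nn.functional as F

def render_depth_and_semantics(density, semantics, ray_pts, t_vals):
    """density:   (1, 1, D, H, W) non-negative per-voxel density
    semantics: (1, C, D, H, W) per-voxel semantic logits
    ray_pts:   (N_rays, N_samples, 3) sample points, normalized to [-1, 1]
               in grid_sample's (x, y, z) convention
    t_vals:    (N_rays, N_samples) distance of each sample along its ray
    """
    grid = ray_pts.view(1, 1, *ray_pts.shape)                 # (1, 1, R, S, 3)
    sigma = F.grid_sample(density, grid, align_corners=True)  # (1, 1, 1, R, S)
    sem = F.grid_sample(semantics, grid, align_corners=True)  # (1, C, 1, R, S)
    sigma = sigma.view(*ray_pts.shape[:2])                    # (R, S)
    sem = sem.view(semantics.shape[1], *ray_pts.shape[:2]).permute(1, 2, 0)

    # Standard alpha compositing along each ray.
    delta = torch.diff(t_vals, dim=-1, append=t_vals[:, -1:] + 1e10)
    alpha = 1.0 - torch.exp(-sigma * delta)
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = alpha * trans                                   # (R, S)

    depth = (weights * t_vals).sum(-1)                        # (R,) expected depth
    sem_map = (weights.unsqueeze(-1) * sem).sum(-2)           # (R, C) rendered logits
    return depth, sem_map
```

Per the abstract, the rendered depth map would be supervised with depth labels and the semantic map with 2D semantic labels, so the 2D losses regulate the intermediate 3D features through the rendering weights.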
Related papers
- Volumetric Environment Representation for Vision-Language Navigation [66.04379819772764]
Vision-language navigation (VLN) requires an agent to navigate through a 3D environment based on visual observations and natural language instructions.
We introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells.
VER predicts 3D occupancy, 3D room layout, and 3D bounding boxes jointly.
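The voxelization step VER builds on can be sketched in a few lines; the grid extents, resolution, and average-pooling choice below are illustrative assumptions, not the paper's configuration.

```python
# Sketch of voxelizing 3D points into structured cells, in the spirit of
# VER. Grid extents and resolution are made-up illustrative values.
import torch

def voxelize(points, feats, pc_range=(-50, -50, -5, 50, 50, 3), voxel=(200, 200, 16)):
    """points: (N, 3) xyz; feats: (N, C). Returns a (C, Z, Y, X) feature
    volume obtained by average-pooling point features into their cells."""
    lo, hi = torch.tensor(pc_range[:3]), torch.tensor(pc_range[3:])
    X, Y, Z = voxel
    idx = ((points - lo) / (hi - lo) * torch.tensor([X, Y, Z])).long()
    keep = ((idx >= 0) & (idx < torch.tensor([X, Y, Z]))).all(dim=1)
    idx, feats = idx[keep], feats[keep]
    flat = idx[:, 2] * (Y * X) + idx[:, 1] * X + idx[:, 0]    # z-major flat index
    C = feats.shape[1]
    vol = torch.zeros(Z * Y * X, C).index_add_(0, flat, feats)
    cnt = torch.zeros(Z * Y * X).index_add_(0, flat, torch.ones(len(flat)))
    vol = vol / cnt.clamp(min=1).unsqueeze(1)                 # mean per occupied cell
    return vol.view(Z, Y, X, C).permute(3, 0, 1, 2)           # (C, Z, Y, X)
```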
arXiv Detail & Related papers (2024-03-21T06:14:46Z)
- SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection [19.75965521357068]
We propose a novel approach called SOGDet (Semantic-Occupancy Guided Multi-view 3D Object Detection) to improve the accuracy of 3D object detection.
Our results show that SOGDet consistently enhances the performance of three baseline methods in terms of nuScenes Detection Score (NDS) and mean Average Precision (mAP).
This indicates that combining 3D object detection with 3D semantic occupancy yields a more comprehensive perception of the 3D environment, thereby helping to build more robust autonomous driving systems.
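One way to read "occupancy guided" is two heads on a shared BEV feature, with the occupancy prediction fused back into the detection branch; the module below is an illustrative assumption about that coupling, not SOGDet's actual architecture.

```python
# Sketch of occupancy-guided detection: a shared BEV feature feeds both
# a semantic-occupancy head and a detection head, and the occupancy
# logits are fused back in to guide detection. Names/shapes are
# illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn

class OccGuidedDetHead(nn.Module):
    def __init__(self, c_bev=256, n_occ_cls=17, n_det_out=10):
        super().__init__()
        self.occ_head = nn.Conv2d(c_bev, n_occ_cls, 1)          # per-cell occupancy logits
        self.fuse = nn.Conv2d(c_bev + n_occ_cls, c_bev, 3, padding=1)
        self.det_head = nn.Conv2d(c_bev, n_det_out, 1)          # e.g. center/size/heading

    def forward(self, bev):                                     # bev: (B, C, H, W)
        occ = self.occ_head(bev)
        det = self.det_head(torch.relu(self.fuse(torch.cat([bev, occ], dim=1))))
        return occ, det                                         # supervised jointly
```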
arXiv Detail & Related papers (2023-08-26T07:38:21Z)
- SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose SurroundOcc, a method that predicts 3D occupancy from multi-camera images.
arXiv Detail & Related papers (2023-03-16T17:59:08Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- SRCN3D: Sparse R-CNN 3D for Compact Convolutional Multi-View 3D Object Detection and Tracking [12.285423418301683]
This paper proposes Sparse R-CNN 3D (SRCN3D), a novel two-stage fully-sparse detector that incorporates sparse queries, sparse attention with box-wise sampling, and sparse prediction.
Experiments on nuScenes dataset demonstrate that SRCN3D achieves competitive performance in both 3D object detection and multi-object tracking tasks.
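A minimal sketch of the "sparse queries with box-wise feature sampling" idea follows; the BEV query parameterization, sampling resolution, and update rule are illustrative assumptions, not SRCN3D's actual design.

```python
# Sketch of sparse box queries: each learnable query carries a box
# proposal, and features are sampled only inside that box rather than
# densely. Shapes/names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseBoxQueries(nn.Module):
    def __init__(self, n_query=100, c=256):
        super().__init__()
        self.query_feat = nn.Parameter(torch.randn(n_query, c))
        self.query_box = nn.Parameter(torch.rand(n_query, 4))  # (cx, cy, w, h) in [0, 1] BEV coords
        self.update = nn.Linear(2 * c, c)

    def forward(self, bev):                                    # bev: (1, C, H, W)
        n, g = self.query_feat.shape[0], 3                     # g x g sample points per box
        cx, cy, w, h = self.query_box.unbind(-1)
        lin = torch.linspace(-0.5, 0.5, g)
        gx = (cx.view(n, 1, 1) + lin.view(1, 1, g) * w.view(n, 1, 1)).expand(n, g, g)
        gy = (cy.view(n, 1, 1) + lin.view(1, g, 1) * h.view(n, 1, 1)).expand(n, g, g)
        grid = torch.stack([gx, gy], dim=-1) * 2 - 1           # grid_sample's [-1, 1] range
        sampled = F.grid_sample(bev.expand(n, -1, -1, -1), grid, align_corners=True)
        pooled = sampled.flatten(2).mean(-1)                   # (n, C) box-wise pooled feature
        return self.update(torch.cat([self.query_feat, pooled], dim=-1))
```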
arXiv Detail & Related papers (2022-06-29T07:58:39Z)
- Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data [80.14669385741202]
We propose a self-supervised pre-training method for 3D perception models tailored to autonomous driving data.
We leverage the availability of synchronized and calibrated image and Lidar sensors in autonomous driving setups.
Our method does not require any point cloud or image annotations.
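The pairing that makes this possible, projecting each Lidar point through the known calibration to fetch the image feature at its pixel, can be sketched as follows; the plain pinhole model and cosine alignment loss are simplified assumptions, and the paper's actual loss may differ.

```python
# Sketch of annotation-free image-to-Lidar distillation: calibrated
# sensors pair each visible 3D point with a pixel, so 3D point features
# (student) can be pulled toward 2D image features (teacher).
# Projection model and loss are simplified assumptions.
import torch
import torch.nn.functional as F

def pair_and_distill(pts_cam, pt_feats, img_feats, K):
    """pts_cam:  (N, 3) Lidar points already in the camera frame
    pt_feats:  (N, C) features from the 3D network (student)
    img_feats: (1, C, H, W) features from a frozen 2D network (teacher)
    K:         (3, 3) camera intrinsics
    """
    H, W = img_feats.shape[-2:]
    uvw = pts_cam @ K.T                               # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)     # pixel coordinates
    vis = ((uvw[:, 2] > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < W)
           & (uv[:, 1] >= 0) & (uv[:, 1] < H))        # points visible in the image
    grid = uv[vis] / torch.tensor([W - 1, H - 1]) * 2 - 1
    tgt = F.grid_sample(img_feats, grid.view(1, 1, -1, 2), align_corners=True)
    tgt = tgt.view(img_feats.shape[1], -1).T          # (M, C) teacher features
    # Pull each 3D point feature toward its paired pixel feature.
    return 1 - F.cosine_similarity(pt_feats[vis], tgt, dim=-1).mean()
```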
arXiv Detail & Related papers (2022-03-30T12:40:30Z)
- 3D Crowd Counting via Geometric Attention-guided Multi-View Fusion [50.520192402702015]
We propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps.
Compared to 2D fusion, the 3D fusion extracts more information of the people along the z-dimension (height), which helps to address the scale variations across multiple views.
The 3D density maps still preserve the 2D density maps property that the sum is the count, while also providing 3D information about the crowd density.
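The "sum is the count" property is easy to illustrate; the toy values below are made up for demonstration.

```python
# Tiny illustration of the density-map property quoted above: the sum
# of a (2D or 3D) density map equals the crowd count.
import torch

def count_from_density(density):
    """Works for a (H, W) 2D map or a (D, H, W) 3D map alike."""
    return density.sum()

# Three people, each represented by a unit-mass blob spread over cells.
d2 = torch.zeros(4, 4)
d2[0, 0] = 1.0                                     # person 1: one cell
d2[1, 1] = d2[1, 2] = 0.5                          # person 2: two cells
d2[3, 0] = d2[3, 1] = d2[3, 2] = d2[3, 3] = 0.25   # person 3: four cells
assert torch.isclose(count_from_density(d2), torch.tensor(3.0))
```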
arXiv Detail & Related papers (2020-03-18T11:35:11Z)
- DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
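The geometric lifting such a stereo pipeline needs can be sketched as a classic plane-sweep cost volume built from left/right image features; the concatenation scheme and shapes below are illustrative assumptions rather than DSGN's exact construction.

```python
# Sketch of lifting stereo features into a 3D volume: at each candidate
# disparity d, the right feature map is shifted and paired with the
# left one, forming a (2C, D, H, W) cost volume for 3D convolutions.
# Shapes are illustrative assumptions.
import torch

def plane_sweep_volume(feat_l, feat_r, max_disp=48):
    """feat_l, feat_r: (C, H, W) left/right image features."""
    C, H, W = feat_l.shape
    cost = feat_l.new_zeros(2 * C, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            cost[:C, d] = feat_l
            cost[C:, d] = feat_r
        else:
            cost[:C, d, :, d:] = feat_l[:, :, d:]
            cost[C:, d, :, d:] = feat_r[:, :, :-d]   # shift right view by d
    return cost                                      # fed to 3D convs downstream
```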
arXiv Detail & Related papers (2020-01-10T11:44:37Z)