UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric
and Semantic Rendering
- URL: http://arxiv.org/abs/2306.09117v1
- Date: Thu, 15 Jun 2023 13:23:57 GMT
- Title: UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric
and Semantic Rendering
- Authors: Mingjie Pan, Li Liu, Jiaming Liu, Peixiang Huang, Longlong Wang,
Shanghang Zhang, Shaoqing Xu, Zhiyi Lai, Kuiyuan Yang
- Abstract summary: We present our solution, named UniOCC, for the Vision-Centric 3D occupancy prediction track.
Our solution achieves 51.27% mIoU on the official leaderboard with a single model, placing 3rd in this challenge.
- Score: 27.712689811093362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this technical report, we present our solution, named UniOCC, for the
Vision-Centric 3D occupancy prediction track in the nuScenes Open Dataset
Challenge at CVPR 2023. Existing methods for occupancy prediction primarily
focus on optimizing projected features on 3D volume space using 3D occupancy
labels. However, the generation process of these labels is complex and
expensive (relying on 3D semantic annotations), and limited by voxel
resolution, they cannot provide fine-grained spatial semantics. To address this
limitation, we propose a novel Unifying Occupancy (UniOcc) prediction method,
explicitly imposing spatial geometry constraints and complementing fine-grained
semantic supervision through volume ray rendering. Our method significantly
enhances model performance and demonstrates promising potential in reducing
human annotation costs. Given the laborious nature of annotating 3D occupancy,
we further introduce a Depth-aware Teacher Student (DTS) framework to enhance
prediction accuracy using unlabeled data. Our solution achieves 51.27\% mIoU on
the official leaderboard with a single model, placing 3rd in this challenge.
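As a rough, generic sketch of the volume-ray-rendering idea described in the abstract (a NeRF-style alpha-compositing rule with assumed names; not the authors' implementation), the following accumulates per-sample occupancy probabilities along a single camera ray into a rendered depth, which could then be supervised with 2D depth labels:

```python
import numpy as np

def render_depth_along_ray(occ_probs: np.ndarray, sample_depths: np.ndarray) -> float:
    """Alpha-composite occupancy probabilities sampled along one ray.

    occ_probs: per-sample occupancy (opacity alpha_i) in [0, 1], ordered near-to-far.
    sample_depths: depth of each sample along the ray.
    Returns the expected (rendered) depth, as in NeRF-style volume rendering.
    """
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): probability the ray
    # reaches sample i without being blocked by an earlier sample.
    transmittance = np.cumprod(np.concatenate(([1.0], 1.0 - occ_probs[:-1])))
    weights = transmittance * occ_probs            # w_i = T_i * alpha_i
    return float(np.sum(weights * sample_depths))  # expected depth E[d]

# Toy example: a near-opaque surface at the third sample dominates the estimate.
probs = np.array([0.0, 0.1, 0.9, 0.9])
depths = np.array([1.0, 2.0, 3.0, 4.0])
d = render_depth_along_ray(probs, depths)
```

Rendering semantics works the same way: the weights `w_i` multiply per-sample class logits instead of depths, which is how 2D semantic labels can supervise a 3D occupancy volume without voxel-level annotations.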
Related papers
- Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision.
However, densely labeling 3D point clouds for fully-supervised training remains too labor-intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
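One ingredient common to self-training and teacher-student schemes like the DTS framework above is an exponential-moving-average (EMA) teacher that generates pseudo-labels for the unlabeled set. The sketch below shows a generic EMA update (the momentum value and parameter layout are illustrative assumptions, not taken from either paper):

```python
import numpy as np

def ema_update(teacher_params, student_params, momentum=0.999):
    """Mean-teacher-style update: teacher <- m * teacher + (1 - m) * student.

    The teacher changes slowly, so its pseudo-labels on unlabeled data are
    more stable targets for the student than the student's own predictions.
    """
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]

# Toy example with a single scalar "parameter".
teacher = [np.array(0.0)]
student = [np.array(1.0)]
teacher = ema_update(teacher, student, momentum=0.9)
```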
arXiv Detail & Related papers (2024-09-12T14:54:31Z)
- Improving 3D Occupancy Prediction through Class-balancing Loss and Multi-scale Representation
3D environment recognition is essential for autonomous driving systems.
Bird's-Eye-View (BEV)-based perception has achieved state-of-the-art (SOTA) performance for this task.
We introduce a novel UNet-like Multi-scale Occupancy Head module to relieve this issue.
arXiv Detail & Related papers (2024-05-25T07:13:13Z)
- Co-Occ: Coupling Explicit Feature Fusion with Volume Rendering Regularization for Multi-Modal 3D Semantic Occupancy Prediction
This letter presents a novel multi-modal (i.e., LiDAR-camera) 3D semantic occupancy prediction framework, dubbed Co-Occ.
Volume rendering in the feature space can proficiently bridge the gap between 3D LiDAR sweeps and 2D images.
arXiv Detail & Related papers (2024-04-06T09:01:19Z)
- OccFlowNet: Towards Self-supervised Occupancy Estimation via Differentiable Rendering and Occupancy Flow [0.6577148087211809]
We present a novel approach to occupancy estimation inspired by neural radiance fields (NeRF), using only 2D labels.
We employ differentiable volumetric rendering to predict depth and semantic maps and train a 3D network based on 2D supervision only.
arXiv Detail & Related papers (2024-02-20T08:04:12Z)
- FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models [62.663113296987085]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data.
We introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC).
Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
arXiv Detail & Related papers (2023-12-28T14:52:07Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a rendering-assisted distillation paradigm for 3D occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- Learning Occupancy for Monocular 3D Object Detection [25.56336546513198]
We propose OccupancyM3D, a method of learning occupancy for monocular 3D detection.
It directly learns occupancy in frustum and 3D space, leading to more discriminative and informative 3D features and representations.
Experiments on KITTI and open datasets demonstrate that the proposed method achieves a new state of the art and surpasses other methods by a significant margin.
arXiv Detail & Related papers (2023-05-25T04:03:46Z)
- On Triangulation as a Form of Self-Supervision for 3D Human Pose Estimation [57.766049538913926]
Supervised approaches to 3D pose estimation from single images are remarkably effective when labeled data is abundant.
Much of the recent attention has shifted towards semi- and/or weakly-supervised learning.
We propose to impose multi-view geometric constraints by means of a differentiable triangulation and to use it as a form of self-supervision during training when no labels are available.
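The triangulation constraint can be illustrated with a standard Direct Linear Transform (DLT) least-squares solve (a generic sketch under an assumed two-camera setup, not the paper's exact formulation): given camera projection matrices and matched 2D detections, the 3D point that best reprojects into all views is the null-space solution of a stacked linear system.

```python
import numpy as np

def triangulate_dlt(proj_mats, points_2d):
    """Direct Linear Transform triangulation.

    proj_mats: list of 3x4 camera projection matrices.
    points_2d: list of (x, y) pixel coordinates, one per view.
    Returns the 3D point minimizing the algebraic reprojection error.
    """
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous point X:
        # x * (P[2] @ X) = P[0] @ X  and  y * (P[2] @ X) = P[1] @ X
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    h = P @ np.append(X, 1.0)
    return h[:2] / h[2]

# Assumed toy setup: two identity-intrinsic cameras, the second shifted on x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
point = np.array([0.5, 0.2, 2.0])
rec = triangulate_dlt([P1, P2], [project(P1, point), project(P2, point)])
```

Because the SVD is differentiable, the reprojection of the triangulated point back into each view can serve as a training signal without any 3D labels.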
arXiv Detail & Related papers (2022-03-29T19:11:54Z)
- 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We propose to devise a new geometry-based strategy to embed depth information with low-resolution voxel representation.
Our proposed geometric embedding works better than the depth feature learning used in conventional SSC frameworks.
arXiv Detail & Related papers (2020-03-31T09:33:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.