RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision
- URL: http://arxiv.org/abs/2309.09502v2
- Date: Mon, 4 Mar 2024 15:16:27 GMT
- Title: RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision
- Authors: Mingjie Pan, Jiaming Liu, Renrui Zhang, Peixiang Huang, Xiaoqi Li, Bing Wang, Hongwei Xie, Li Liu, Shanghang Zhang
- Abstract summary: We present RenderOcc, a novel paradigm for training 3D occupancy models using only 2D labels.
Specifically, we extract a NeRF-style 3D volume representation from multi-view images.
We employ volume rendering techniques to establish 2D renderings, thus enabling direct 3D supervision from 2D semantics and depth labels.
- Score: 36.15913507034939
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D occupancy prediction, which quantizes 3D scenes into grid cells with semantic labels, holds significant promise for robot perception and autonomous driving. Recent works mainly utilize complete occupancy labels in
3D voxel space for supervision. However, the expensive annotation process and
sometimes ambiguous labels have severely constrained the usability and
scalability of 3D occupancy models. To address this, we present RenderOcc, a
novel paradigm for training 3D occupancy models using only 2D labels.
Specifically, we extract a NeRF-style 3D volume representation from multi-view
images, and employ volume rendering techniques to establish 2D renderings, thus
enabling direct 3D supervision from 2D semantics and depth labels.
Additionally, we introduce an Auxiliary Ray method to tackle the issue of
sparse viewpoints in autonomous driving scenarios, which leverages sequential
frames to construct comprehensive 2D renderings for each object. To the best of our
knowledge, RenderOcc is the first attempt to train multi-view 3D occupancy
models using only 2D labels, reducing the dependence on costly 3D occupancy
annotations. Extensive experiments demonstrate that RenderOcc achieves
comparable performance to models fully supervised with 3D labels, underscoring
the significance of this approach in real-world applications.
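To make the 2D-supervision idea concrete, here is a minimal PyTorch sketch (not the authors' released code) of how a NeRF-style voxel representation can be rendered into per-pixel depth and semantics and trained against 2D labels only: densities and semantic logits are trilinearly sampled along each camera ray and alpha-composited with standard volume-rendering weights. All function names, tensor shapes, and the uniform ray-sampling scheme are illustrative assumptions.

```python
# Minimal sketch of 2D rendering supervision for a voxel occupancy model
# (illustrative only): densities and semantic logits from a predicted voxel
# grid are alpha-composited along rays into depth and semantics, which are
# trained against 2D labels.
import torch
import torch.nn.functional as F


def render_rays(sigma_grid, sem_grid, origins, dirs, scene_min, scene_max,
                n_samples=64, near=0.5, far=40.0):
    """Alpha-composite per-ray semantics and expected depth.

    sigma_grid: (1, 1, D, H, W) non-negative voxel densities
    sem_grid:   (1, C, D, H, W) per-voxel semantic logits
    origins, dirs: (N, 3) ray origins / unit directions in metric ego coords
    scene_min, scene_max: (3,) metric bounds of the voxel grid
    """
    t = torch.linspace(near, far, n_samples, device=origins.device)      # (S,)
    pts = origins[:, None, :] + dirs[:, None, :] * t[None, :, None]      # (N, S, 3)

    # Normalize metric points into grid_sample's [-1, 1] range; we assume the
    # last grid dimension is ordered (x, y, z), matching (W, H, D) of the grids.
    pts_n = 2.0 * (pts - scene_min) / (scene_max - scene_min) - 1.0
    grid = pts_n.view(1, -1, 1, 1, 3)

    sigma = F.grid_sample(sigma_grid, grid, align_corners=True).view(-1, n_samples)  # (N, S)
    sem = F.grid_sample(sem_grid, grid, align_corners=True)
    sem = sem.view(sem_grid.shape[1], -1, n_samples).permute(1, 2, 0)                # (N, S, C)

    # Standard volume-rendering weights: alpha_i = 1 - exp(-sigma_i * delta).
    delta = (far - near) / (n_samples - 1)
    alpha = 1.0 - torch.exp(-F.relu(sigma) * delta)                                  # (N, S)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-7], dim=1), dim=1
    )[:, :-1]
    weights = alpha * trans                                                           # (N, S)

    depth = (weights * t[None, :]).sum(dim=1)        # (N,) expected ray depth
    sem_2d = (weights[..., None] * sem).sum(dim=1)   # (N, C) rendered semantic logits
    return depth, sem_2d


def rendering_loss(depth_pred, sem_pred, depth_gt, sem_cls_gt):
    """2D-only supervision: depth regression plus per-ray semantic cross-entropy."""
    return F.smooth_l1_loss(depth_pred, depth_gt) + F.cross_entropy(sem_pred, sem_cls_gt)
```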
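Auxiliary rays from neighbouring frames would feed the same renderer once their origins and directions are expressed in the current frame's coordinates. The hypothetical sketch below shows one way to construct such rays from an adjacent frame's camera intrinsics and the relative ego pose; the pose-matrix names are assumptions, not the paper's API.

```python
# Hypothetical auxiliary-ray construction: pixels from an adjacent frame are
# back-projected with that camera's intrinsics, then the rays are mapped into
# the current frame's ego coordinates via the relative ego pose.
import torch


def pixel_rays(K, cam_to_ref, uv):
    """Back-project pixels uv (N, 2) into rays in a reference frame.

    K: (3, 3) camera intrinsics; cam_to_ref: (4, 4) camera-to-reference pose.
    """
    ones = torch.ones(uv.shape[0], 1, dtype=uv.dtype)
    cam_dirs = (torch.inverse(K) @ torch.cat([uv, ones], dim=1).T).T   # (N, 3), camera frame
    R, trans = cam_to_ref[:3, :3], cam_to_ref[:3, 3]
    dirs = cam_dirs @ R.T
    dirs = dirs / dirs.norm(dim=1, keepdim=True)   # unit directions in reference frame
    origins = trans.expand_as(dirs)                # camera center repeated per ray
    return origins, dirs


def auxiliary_rays(K, cam_to_ego_adj, ego_adj_to_ego_cur, uv):
    """Rays of an adjacent frame, re-expressed in the current ego frame."""
    cam_to_ego_cur = ego_adj_to_ego_cur @ cam_to_ego_adj   # compose 4x4 poses
    return pixel_rays(K, cam_to_ego_cur, uv)
```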
Related papers
- Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D [95.14469865815768]
2D vision models can be used for semantic segmentation, style transfer or scene editing, enabled by large-scale 2D image datasets.
However, extending a single 2D vision operator like scene editing to 3D typically requires a highly creative method specialized to that task.
In this paper, we propose Lift3D, which learns to predict unseen views in feature spaces generated by a few visual models.
We even outperform state-of-the-art methods specialized for the task in question.
arXiv Detail & Related papers (2024-03-27T18:13:16Z)
- OccFlowNet: Towards Self-supervised Occupancy Estimation via Differentiable Rendering and Occupancy Flow [0.6577148087211809]
We present a novel approach to occupancy estimation inspired by neural radiance field (NeRF) using only 2D labels.
We employ differentiable volumetric rendering to predict depth and semantic maps and train a 3D network based on 2D supervision only.
arXiv Detail & Related papers (2024-02-20T08:04:12Z)
- Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
Specifically, we first design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, the output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations.
Third, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z)
- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z)
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation.
For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- Uni3D: Exploring Unified 3D Representation at Scale [66.26710717073372]
We present Uni3D, a 3D foundation model to explore the unified 3D representation at scale.
Uni3D uses a 2D ViT, pretrained end-to-end, to align 3D point cloud features with image-text aligned features.
We show that the strong Uni3D representation also enables applications such as 3D painting and retrieval in the wild.
arXiv Detail & Related papers (2023-10-10T16:49:21Z)
- SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose SurroundOcc, a method that predicts 3D occupancy from multi-camera images.
arXiv Detail & Related papers (2023-03-16T17:59:08Z)
- Panoptic NeRF: 3D-to-2D Label Transfer for Panoptic Urban Scene Segmentation [48.677336052620895]
We present a novel 3D-to-2D label transfer method, Panoptic NeRF, which aims to obtain per-pixel 2D semantic and instance labels.
By inferring in 3D space and rendering to 2D labels, our 2D semantic and instance labels are multi-view consistent by design.
arXiv Detail & Related papers (2022-03-29T04:16:40Z)