Panoptic NeRF: 3D-to-2D Label Transfer for Panoptic Urban Scene
Segmentation
- URL: http://arxiv.org/abs/2203.15224v1
- Date: Tue, 29 Mar 2022 04:16:40 GMT
- Title: Panoptic NeRF: 3D-to-2D Label Transfer for Panoptic Urban Scene
Segmentation
- Authors: Xiao Fu, Shangzhan Zhang, Tianrun Chen, Yichong Lu, Lanyun Zhu,
Xiaowei Zhou, Andreas Geiger, Yiyi Liao
- Abstract summary: We present a novel 3D-to-2D label transfer method, Panoptic NeRF, which aims to obtain per-pixel 2D semantic and instance labels.
By inferring in 3D space and rendering to 2D labels, our 2D semantic and instance labels are multi-view consistent by design.
- Score: 48.677336052620895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale training data with high-quality annotations is critical for
training semantic and instance segmentation models. Unfortunately, pixel-wise
annotation is labor-intensive and costly, raising the demand for more efficient
labeling strategies. In this work, we present a novel 3D-to-2D label transfer
method, Panoptic NeRF, which aims to obtain per-pixel 2D semantic and
instance labels from easily acquired coarse 3D bounding primitives. Our method
utilizes NeRF as a differentiable tool to unify coarse 3D annotations and 2D
semantic cues transferred from existing datasets. We demonstrate that this
combination allows for improved geometry guided by semantic information,
enabling rendering of accurate semantic maps across multiple views.
Furthermore, this fusion process resolves label ambiguity of the coarse 3D
annotations and filters noise in the 2D predictions. By inferring in 3D space
and rendering to 2D labels, our 2D semantic and instance labels are multi-view
consistent by design. Experimental results show that Panoptic NeRF outperforms
existing semantic and instance label transfer methods in terms of accuracy and
multi-view consistency on challenging urban scenes of the KITTI-360 dataset.
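To make the 3D-to-2D transfer above concrete, here is a minimal NumPy sketch of rendering per-pixel semantic labels by volume-rendering per-point class logits along camera rays. The field functions `density_fn` and `semantic_logits_fn`, the uniform depth sampling, and all shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def render_semantic_map(rays_o, rays_d, density_fn, semantic_logits_fn,
                        near=0.5, far=80.0, n_samples=64, n_classes=19):
    """Volume-render per-point class logits into per-pixel semantic labels.

    rays_o, rays_d: (R, 3) ray origins and directions.
    density_fn(x):          (N, 3) -> (N,)   volume density at 3D points.
    semantic_logits_fn(x):  (N, 3) -> (N, C) class logits at 3D points.
    Both functions stand in for a trained NeRF-style field (assumed here).
    """
    # Sample depths along each ray (uniform sampling for simplicity).
    t = np.linspace(near, far, n_samples)                               # (S,)
    pts = rays_o[:, None, :] + rays_d[:, None, :] * t[None, :, None]    # (R, S, 3)
    flat = pts.reshape(-1, 3)

    sigma = density_fn(flat).reshape(-1, n_samples)                     # (R, S)
    logits = semantic_logits_fn(flat).reshape(-1, n_samples, n_classes) # (R, S, C)

    # Standard volume-rendering weights (alpha compositing along the ray).
    delta = np.concatenate([np.diff(t), np.array([1e10])])              # (S,)
    alpha = 1.0 - np.exp(-sigma * delta[None, :])                       # (R, S)
    trans = np.cumprod(1.0 - alpha + 1e-10, axis=1)
    trans = np.concatenate([np.ones_like(trans[:, :1]), trans[:, :-1]], axis=1)
    weights = alpha * trans                                             # (R, S)

    # Accumulate class scores along each ray, then take the per-pixel argmax.
    sem_scores = (weights[:, :, None] * logits).sum(axis=1)             # (R, C)
    return sem_scores.argmax(axis=1)                                    # (R,)
```

In a full system the density and semantic fields would be optimized jointly against the coarse 3D annotations and noisy 2D predictions; the sketch only covers the rendering step that produces multi-view consistent 2D labels.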
Related papers
- Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, an output-level constraint enforces the overlap between 2D box estimations and projected 3D boxes (a projection/IoU sketch is given after this list).
Third, a training-level constraint produces accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z) - DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z) - PanopticNeRF-360: Panoramic 3D-to-2D Label Transfer in Urban Scenes [56.297018535422524]
Training perception systems for self-driving cars requires substantial annotations.
Existing datasets provide rich annotations for pre-recorded sequences, but they fall short in labeling rarely encountered viewpoints.
We present PanopticNeRF-360, a novel approach that combines coarse 3D annotations with noisy 2D semantic cues to generate consistent panoptic labels.
arXiv Detail & Related papers (2023-09-19T17:54:22Z) - RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering
Supervision [36.15913507034939]
We present RenderOcc, a novel paradigm for training 3D occupancy models using only 2D labels.
Specifically, we extract a NeRF-style 3D volume representation from multi-view images.
We employ volume rendering techniques to establish 2D renderings, thus enabling direct 3D supervision from 2D semantics and depth labels.
arXiv Detail & Related papers (2023-09-18T06:08:15Z) - SSR-2D: Semantic 3D Scene Reconstruction from 2D Images [54.46126685716471]
In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations.
The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images.
Our method achieves state-of-the-art semantic scene completion performance on two large-scale benchmark datasets, MatterPort3D and ScanNet.
arXiv Detail & Related papers (2023-02-07T17:47:52Z) - Image Understands Point Cloud: Weakly Supervised 3D Semantic
Segmentation via Association Learning [59.64695628433855]
We propose a novel cross-modality weakly supervised method for 3D segmentation, incorporating complementary information from unlabeled images.
Specifically, we design a dual-branch network equipped with an active labeling strategy to make the most of a tiny fraction of labels.
Our method even outperforms the state-of-the-art fully supervised competitors with less than 1% actively selected annotations.
arXiv Detail & Related papers (2022-09-16T07:59:04Z) - Learning 3D Semantic Segmentation with only 2D Image Supervision [18.785840615548473]
We train a 3D model from pseudo-labels derived from 2D semantic image segmentations using multiview fusion (a minimal fusion sketch is given after this list).
The proposed network architecture, 2D3DNet, achieves significantly better performance than baselines during experiments on a new urban dataset with lidar and images captured in 20 cities across 5 continents.
arXiv Detail & Related papers (2021-10-21T17:56:28Z) - 3D Guided Weakly Supervised Semantic Segmentation [27.269847900950943]
We propose a weakly supervised 2D semantic segmentation model by incorporating sparse bounding box labels with available 3D information.
We manually label a subset of the 2D-3D Semantics (2D-3D-S) dataset with bounding boxes, and introduce our 2D-3D inference module to generate accurate pixel-wise segment proposal masks.
arXiv Detail & Related papers (2020-12-01T03:34:15Z)
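A minimal sketch of the output-level constraint referenced in the weakly supervised detection entry above: project the corners of a 3D box into the image, take its axis-aligned 2D extent, and measure IoU against a 2D box estimate. The function name, argument layout, and camera conventions are assumptions for illustration, not that paper's exact formulation.

```python
import numpy as np

def box3d_to_2d_iou(corners3d, box2d, K, T_wc):
    """Project a 3D box into an image and compute IoU with a 2D box estimate.

    corners3d: (8, 3) box corners in world coordinates.
    box2d:     (4,) 2D detection as (x1, y1, x2, y2) pixels.
    K:         (3, 3) camera intrinsics; T_wc: (4, 4) world-to-camera transform.
    All names and conventions here are illustrative assumptions.
    """
    homo = np.concatenate([corners3d, np.ones((8, 1))], axis=1)        # (8, 4)
    cam = (T_wc @ homo.T).T[:, :3]                                      # camera frame
    uvz = (K @ cam.T).T
    uv = uvz[:, :2] / np.clip(uvz[:, 2:3], 1e-6, None)                  # pixel coords

    # Axis-aligned bounding box of the projected corners.
    proj = np.array([uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max()])

    # Standard 2D IoU between the projected box and the 2D estimate.
    x1, y1 = max(proj[0], box2d[0]), max(proj[1], box2d[1])
    x2, y2 = min(proj[2], box2d[2]), min(proj[3], box2d[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_p = (proj[2] - proj[0]) * (proj[3] - proj[1])
    area_b = (box2d[2] - box2d[0]) * (box2d[3] - box2d[1])
    return inter / max(area_p + area_b - inter, 1e-6)
```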
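The multiview fusion referenced in the 2D3DNet entry above can be sketched as projection plus per-point majority voting over 2D segmentations. The camera dictionary layout, the `ignore_label` convention, and the voting rule are illustrative assumptions rather than that paper's actual fusion procedure.

```python
import numpy as np

def fuse_multiview_labels(points, cams, seg_maps, n_classes=19, ignore_label=255):
    """Assign each 3D point a pseudo-label by majority vote over 2D segmentations.

    points:   (N, 3) 3D points (e.g. lidar) in world coordinates.
    cams:     list of dicts with 'K' (3x3 intrinsics), 'T' (4x4 world-to-camera),
              and 'hw' (image height, width) -- an assumed, illustrative layout.
    seg_maps: list of (H, W) integer arrays of per-pixel 2D semantic predictions.
    """
    votes = np.zeros((len(points), n_classes), dtype=np.int32)

    for cam, seg in zip(cams, seg_maps):
        # Project all points into this view.
        homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)  # (N, 4)
        pts_cam = (cam["T"] @ homo.T).T[:, :3]
        in_front = pts_cam[:, 2] > 0.1
        uvz = (cam["K"] @ pts_cam.T).T
        uv = uvz[:, :2] / np.clip(uvz[:, 2:3], 1e-6, None)
        u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)

        # Keep points that land inside the image with a valid 2D label.
        h, w = cam["hw"]
        valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        labels = seg[v[valid], u[valid]]
        keep = labels != ignore_label
        idx = np.nonzero(valid)[0][keep]
        votes[idx, labels[keep]] += 1

    # Points never observed with a valid label keep the ignore label.
    return np.where(votes.sum(axis=1) > 0, votes.argmax(axis=1), ignore_label)
```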
This list is automatically generated from the titles and abstracts of the papers in this site.