Panoptic Lifting for 3D Scene Understanding with Neural Fields
- URL: http://arxiv.org/abs/2212.09802v1
- Date: Mon, 19 Dec 2022 19:15:36 GMT
- Title: Panoptic Lifting for 3D Scene Understanding with Neural Fields
- Authors: Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Bulò, Norman Müller,
Matthias Nießner, Angela Dai, Peter Kontschieder
- Abstract summary: We propose a novel approach for learning panoptic 3D representations from images of in-the-wild scenes.
Our method requires only machine-generated 2D panoptic segmentation masks inferred from a pre-trained network.
Experimental results validate our approach on the challenging Hypersim, Replica, and ScanNet datasets.
- Score: 32.59498558663363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose Panoptic Lifting, a novel approach for learning panoptic 3D
volumetric representations from images of in-the-wild scenes. Once trained, our
model can render color images together with 3D-consistent panoptic segmentation
from novel viewpoints.
Unlike existing approaches which use 3D input directly or indirectly, our
method requires only machine-generated 2D panoptic segmentation masks inferred
from a pre-trained network. Our core contribution is a panoptic lifting scheme
based on a neural field representation that generates a unified and multi-view
consistent, 3D panoptic representation of the scene. To account for
inconsistencies of 2D instance identifiers across views, we solve a linear
assignment with a cost based on the model's current predictions and the
machine-generated segmentation masks, thus enabling us to lift 2D instances to
3D in a consistent way. We further propose and ablate contributions that make
our method more robust to noisy, machine-generated labels, including test-time
augmentations for confidence estimates, segment consistency loss, bounded
segmentation fields, and gradient stopping.
Experimental results validate our approach on the challenging Hypersim,
Replica, and ScanNet datasets, improving by 8.4, 13.8, and 10.6% in scene-level
PQ over the state of the art, respectively.
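To make the instance-lifting step concrete, the sketch below shows how a per-view linear assignment between the model's surrogate 3D instance IDs and the machine-generated 2D instance masks could be set up. It is a minimal illustration rather than the paper's implementation: the function name `match_instances` is hypothetical, and the cost used here (negative mean rendered probability of each surrogate ID inside each 2D mask) is an assumed stand-in for the cost the paper derives from its current predictions and the machine-generated masks.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_instances(rendered_probs, machine_masks):
    """Match machine-generated 2D instance masks to surrogate 3D instance IDs
    for one view via linear assignment (illustrative sketch).

    rendered_probs: (H, W, K) array of the model's current per-pixel
        probabilities over K surrogate instance IDs, rendered into this view.
    machine_masks:  (M, H, W) boolean array of M instance masks produced by
        the pre-trained 2D panoptic segmentation network.

    Returns a dict mapping 2D mask index -> surrogate 3D instance ID.
    """
    M = machine_masks.shape[0]
    K = rendered_probs.shape[-1]
    cost = np.zeros((M, K))
    for m in range(M):
        pixels = machine_masks[m]  # boolean (H, W)
        if pixels.any():
            # High probability mass of surrogate k inside mask m => low cost.
            # This cost is an assumption for illustration only.
            cost[m] = -rendered_probs[pixels].mean(axis=0)
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return {int(m): int(k) for m, k in zip(rows, cols)}
```

The returned mapping relabels the 2D masks with view-consistent surrogate IDs, so the same physical object can supervise the same instance channel of the field across views.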
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- Efficient 3D Instance Mapping and Localization with Neural Fields [39.73128916618561]
We tackle the problem of learning an implicit scene representation for 3D instance segmentation from a sequence of posed RGB images.
We introduce 3DIML, a novel framework that efficiently learns a neural label field which can render 3D instance segmentation masks from novel viewpoints.
arXiv Detail & Related papers (2024-03-28T19:25:25Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that combines all the results via voting (a minimal voting sketch appears after this list).
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
- Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars [36.4402388864691]
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery.
Recent efforts incorporate 3D Morphable Face Model (3DMM) to describe deformation in generative radiance fields either explicitly or implicitly.
We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images.
arXiv Detail & Related papers (2022-11-21T06:40:46Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Neural Volumetric Object Selection [126.04480613166194]
We introduce an approach for selecting objects in neural volumetric 3D representations, such as multi-plane images (MPI) and neural radiance fields (NeRF).
Our approach takes a set of foreground and background 2D user scribbles in one view and automatically estimates a 3D segmentation of the desired object, which can be rendered into novel views.
arXiv Detail & Related papers (2022-05-30T08:55:20Z)
- Weakly Supervised Volumetric Image Segmentation with Deformed Templates [80.04326168716493]
We propose an approach that is truly weakly supervised in the sense that we only need to provide a sparse set of 3D points on the surface of target objects.
We show that it outperforms a more traditional approach to weak supervision in 3D at a reduced supervision cost.
arXiv Detail & Related papers (2021-06-07T22:09:34Z)
- Semantic Implicit Neural Scene Representations With Semi-Supervised Training [47.61092265963234]
We show that implicit neural scene representations can be leveraged to perform per-point semantic segmentation.
Our method is simple, general, and only requires a few tens of labeled 2D segmentation masks.
We explore two novel applications for this semantically aware implicit neural scene representation.
arXiv Detail & Related papers (2020-03-28T00:43:17Z)
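For the voting-based label fusion mentioned in the "Leveraging Large-Scale Pretrained Vision Foundation Models" entry above, here is a minimal majority-voting sketch. It assumes per-point class predictions from several 2D models have already been projected onto the 3D points; the function name `fuse_labels_by_voting` and the ignore-label convention are illustrative assumptions, not that paper's actual interface.

```python
import numpy as np

def fuse_labels_by_voting(per_model_labels, ignore_label=-1):
    """Fuse per-point semantic labels from several 2D models into pseudo
    labels via majority voting (hedged sketch; the cited paper's exact
    fusion rule may differ).

    per_model_labels: (num_models, num_points) integer array of class IDs,
        with `ignore_label` where a model gave no prediction for a point.

    Returns (num_points,) fused labels; points with no votes stay ignored.
    """
    num_models, num_points = per_model_labels.shape
    fused = np.full(num_points, ignore_label, dtype=np.int64)
    for p in range(num_points):
        votes = per_model_labels[:, p]
        votes = votes[votes != ignore_label]  # drop missing predictions
        if votes.size:
            vals, counts = np.unique(votes, return_counts=True)
            fused[p] = vals[np.argmax(counts)]  # most frequent class wins
    return fused
```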