Decomposing 3D Scenes into Objects via Unsupervised Volume Segmentation
- URL: http://arxiv.org/abs/2104.01148v1
- Date: Fri, 2 Apr 2021 16:59:29 GMT
- Title: Decomposing 3D Scenes into Objects via Unsupervised Volume Segmentation
- Authors: Karl Stelzner, Kristian Kersting, Adam R. Kosiorek
- Abstract summary: We present ObSuRF, a method which turns a single image of a scene into a 3D model represented as a set of Neural Radiance Fields (NeRFs), each corresponding to a different object.
We make learning more computationally efficient by deriving a novel loss, which allows training NeRFs on RGB-D inputs without explicit ray marching.
- Score: 26.868351498722884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present ObSuRF, a method which turns a single image of a scene into a 3D
model represented as a set of Neural Radiance Fields (NeRFs), with each NeRF
corresponding to a different object. A single forward pass of an encoder
network outputs a set of latent vectors describing the objects in the scene.
These vectors are used independently to condition a NeRF decoder, defining the
geometry and appearance of each object. We make learning more computationally
efficient by deriving a novel loss, which allows training NeRFs on RGB-D inputs
without explicit ray marching. After confirming that the model performs on par
with or better than the state of the art on three 2D image segmentation
benchmarks, we apply it to two multi-object 3D datasets: a multiview version of CLEVR, and a
novel dataset in which scenes are populated by ShapeNet models. We find that
after training ObSuRF on RGB-D views of training scenes, it is capable not
only of recovering the 3D geometry of a scene depicted in a single input image,
but also of segmenting it into objects, despite receiving no supervision in that
regard.
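The abstract outlines two concrete ideas: a single-pass encoder that emits one latent vector per object to condition a shared NeRF decoder, and an RGB-D loss that sidesteps ray marching. The sketch below illustrates both under stated assumptions; the layer sizes, the additive-density composition rule, and the three loss terms (`color_term`, `hit_term`, `empty_term`) are hypothetical stand-ins, not the paper's exact architecture or derivation.

```python
# Minimal PyTorch sketch of an ObSuRF-style pipeline. All names, shapes,
# and loss terms are illustrative assumptions, not the authors' exact method.
import torch
import torch.nn as nn


class SetEncoder(nn.Module):
    """One forward pass: image -> K per-object latent vectors."""

    def __init__(self, num_slots: int = 4, latent_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_slots = nn.Linear(128, num_slots * latent_dim)
        self.num_slots, self.latent_dim = num_slots, latent_dim

    def forward(self, image: torch.Tensor) -> torch.Tensor:  # (B, 3, H, W)
        h = self.backbone(image)                              # (B, 128)
        return self.to_slots(h).view(-1, self.num_slots, self.latent_dim)


class ConditionalNeRF(nn.Module):
    """NeRF decoder shared across objects, conditioned on one latent each."""

    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 4),  # RGB (3 channels) + density (1 channel)
        )

    def forward(self, points: torch.Tensor, z: torch.Tensor):
        # points: (N, 3) query locations; z: (latent_dim,) one object's code
        out = self.mlp(torch.cat([points, z.expand(points.shape[0], -1)], -1))
        return torch.sigmoid(out[:, :3]), torch.relu(out[:, 3:])  # rgb, sigma


def compose_scene(points, slots, decoder):
    """Densities add across objects; colors are density-weighted (one common
    way to compose per-object radiance fields into a single scene)."""
    rgbs, sigmas = zip(*(decoder(points, z) for z in slots))
    rgbs, sigmas = torch.stack(rgbs), torch.stack(sigmas)  # (K, N, 3), (K, N, 1)
    total = sigmas.sum(0)                                  # (N, 1)
    color = (sigmas * rgbs).sum(0) / (total + 1e-8)        # (N, 3)
    return color, total


def rgbd_loss(field, origins, dirs, depths, colors):
    """Depth-guided supervision without marching along each ray: the measured
    depth pins down the surface point, so we can ask for the observed color
    and nonzero density there, and near-zero density at a random point in the
    free space before it. The paper derives principled terms; these are
    stand-ins conveying the idea only. field: callable points -> (rgb, sigma).
    """
    surface = origins + depths * dirs                      # (N, 3) hit points
    rgb_s, sigma_s = field(surface)
    t = torch.rand_like(depths) * depths                   # sample before surface
    _, sigma_f = field(origins + t * dirs)
    color_term = ((rgb_s - colors) ** 2).mean()            # match observed RGB
    hit_term = -torch.log(1 - torch.exp(-sigma_s) + 1e-8).mean()  # mass at hit
    empty_term = sigma_f.mean()                            # keep free space empty
    return color_term + hit_term + empty_term
```

With these pieces, training would reduce to sampling rays from posed RGB-D views and minimizing `rgbd_loss` on the composed field, e.g. `field = lambda p: compose_scene(p, slots[0], decoder)`; no points along the ray interior need to be integrated, which is the source of the efficiency gain the abstract claims.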
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields [57.617972778377215]
We show how to generate effective 3D representations from posed RGB images.
We pretrain this representation at scale on our proposed curated posed-RGB data, totaling over 1.8 million images.
Our novel self-supervised pretraining for NeRFs, NeRF-MAE, scales remarkably well and improves performance on various challenging 3D tasks.
arXiv Detail & Related papers (2024-04-01T17:59:55Z)
- Instance Neural Radiance Field [62.152611795824185]
This paper presents one of the first learning-based NeRF 3D instance segmentation pipelines, dubbed Instance Neural Radiance Field.
We adopt a 3D proposal-based mask prediction network on the sampled volumetric features from NeRF.
Our method is also one of the first to achieve such results in pure inference.
arXiv Detail & Related papers (2023-04-10T05:49:24Z)
- SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields [26.296017756560467]
In 3D, solutions must be consistent across multiple views and geometrically valid.
We propose a novel 3D inpainting method that addresses these challenges.
We first demonstrate the superiority of our approach on multiview segmentation, comparing it to NeRF-based methods and 2D segmentation approaches.
arXiv Detail & Related papers (2022-11-22T13:14:50Z)
- ONeRF: Unsupervised 3D Object Segmentation from Multiple Views [59.445957699136564]
ONeRF is a method that automatically segments and reconstructs object instances in 3D from multi-view RGB images without any additional manual annotations.
The segmented 3D objects are represented using separate Neural Radiance Fields (NeRFs) which allow for various 3D scene editing and novel view rendering.
arXiv Detail & Related papers (2022-11-22T06:19:37Z)
- One-Shot Neural Fields for 3D Object Understanding [112.32255680399399]
We present a unified and compact scene representation for robotics.
Each object in the scene is depicted by a latent code capturing geometry and appearance.
This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction, and stable grasp prediction.
arXiv Detail & Related papers (2022-10-21T17:33:14Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)