ROOTS: Object-Centric Representation and Rendering of 3D Scenes
- URL: http://arxiv.org/abs/2006.06130v3
- Date: Thu, 1 Jul 2021 21:24:43 GMT
- Title: ROOTS: Object-Centric Representation and Rendering of 3D Scenes
- Authors: Chang Chen, Fei Deng, Sungjin Ahn
- Abstract summary: A crucial ability of human intelligence is to build up models of individual 3D objects from partial scene observations.
Recent works achieve object-centric generation but without the ability to infer the representation.
We propose a probabilistic generative model for learning to build modular and compositional 3D object models.
- Score: 28.24758046060324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A crucial ability of human intelligence is to build up models of individual 3D objects from partial scene observations. Recent works achieve object-centric generation but without the ability to infer the representation, or achieve 3D scene representation learning but without object-centric compositionality. Therefore, learning to represent and render 3D scenes with object-centric compositionality remains elusive. In this paper, we propose a probabilistic generative model for learning to build modular and compositional 3D object models from partial observations of a multi-object scene. The proposed model can (i) infer the 3D object representations by learning to search and group object areas and also (ii) render from an arbitrary viewpoint not only individual objects but also the full scene by compositing the objects. The entire learning process is unsupervised and end-to-end. In experiments, in addition to generation quality, we also demonstrate that the learned representation permits object-wise manipulation and novel scene generation, and generalizes to various settings. Results can be found on our project website: https://sites.google.com/view/roots3d
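To make the compositional-rendering idea concrete, here is a minimal sketch of rendering a full scene by compositing independently rendered objects. This is not the authors' implementation: the Gaussian-blob renderer, the toy yaw-only camera, and every function name below are illustrative stand-ins for the learned object renderer in ROOTS.

```python
# A minimal sketch (not the ROOTS code) of object-centric compositional
# rendering: each object is rendered on its own from a query viewpoint,
# then the per-object renders are composited into the full scene image.
import numpy as np

def render_object(center, color, viewpoint, size=64, radius=6.0):
    """Render one object as a soft 2D Gaussian blob after a toy
    'projection' of its 3D center (stand-in for a learned renderer)."""
    yaw = viewpoint  # toy camera: rotate the scene about the z-axis
    rot = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                    [np.sin(yaw),  np.cos(yaw), 0.0],
                    [0.0,          0.0,         1.0]])
    cx, cy, _ = rot @ center
    ys, xs = np.mgrid[0:size, 0:size]
    d2 = (xs - (size / 2 + cx)) ** 2 + (ys - (size / 2 + cy)) ** 2
    alpha = np.exp(-d2 / (2 * radius ** 2))       # (H, W) opacity mask
    rgb = alpha[..., None] * np.asarray(color)    # premultiplied (H, W, 3)
    return rgb, alpha

def composite_scene(objects, viewpoint, size=64):
    """Back-to-front 'over' compositing of independently rendered objects,
    mirroring the scene-as-composition-of-objects assumption."""
    image = np.zeros((size, size, 3))
    for center, color in objects:
        rgb, alpha = render_object(center, color, viewpoint, size)
        image = rgb + (1.0 - alpha[..., None]) * image
    return image

# Two objects, rendered either individually or composited into the scene.
scene = [(np.array([-10.0, 5.0, 0.0]), (1.0, 0.2, 0.2)),
         (np.array([12.0, -4.0, 0.0]), (0.2, 0.4, 1.0))]
full_view = composite_scene(scene, viewpoint=np.pi / 6)
one_object = composite_scene(scene[:1], viewpoint=np.pi / 6)
```

In ROOTS itself, both the per-object renderer and the grouping of object areas are learned end-to-end from partial observations; only the structural assumption that a scene image is a composition of per-object renders is mirrored here.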
Related papers
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- Disentangled 3D Scene Generation with Layout Learning [109.03233745767062]
We introduce a method to generate 3D scenes that are disentangled into their component objects.
Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene.
We show that despite its simplicity, our approach successfully generates 3D scenes disentangled into individual objects.
arXiv Detail & Related papers (2024-02-26T18:54:15Z)
- Unsupervised Discovery of Object-Centric Neural Fields [21.223170092979498]
We study inferring 3D object-centric scene representations from a single image.
We propose Unsupervised discovery of Object-Centric neural Fields (uOCF).
arXiv Detail & Related papers (2024-02-12T02:16:59Z)
- Variational Inference for Scalable 3D Object-centric Learning [19.445804699433353]
We tackle the task of scalable unsupervised object-centric representation learning on 3D scenes.
Existing approaches to object-centric representation learning show limitations in generalizing to larger scenes.
We propose to learn view-invariant 3D object representations in localized object coordinate systems.
arXiv Detail & Related papers (2023-09-25T10:23:40Z)
- gCoRF: Generative Compositional Radiance Fields [80.45269080324677]
3D generative models of objects enable photorealistic image synthesis with 3D control.
Existing methods model the scene as a global scene representation, ignoring the compositional aspect of the scene.
We present a compositional generative model, where each semantic part of the object is represented as an independent 3D representation.
arXiv Detail & Related papers (2022-10-31T14:10:44Z)
- Object Scene Representation Transformer [56.40544849442227]
We introduce Object Scene Representation Transformer (OSRT), a 3D-centric model in which individual object representations naturally emerge through novel view synthesis.
OSRT scales to significantly more complex scenes with larger diversity of objects and backgrounds than existing methods.
It is multiple orders of magnitude faster at compositional rendering thanks to its light field parametrization and the novel Slot Mixer decoder.
arXiv Detail & Related papers (2022-06-14T15:40:47Z)
- LanguageRefer: Spatial-Language Model for 3D Visual Grounding [72.7618059299306]
We develop a spatial-language model for a 3D visual grounding problem.
We show that our model performs competitively on visio-linguistic datasets proposed by ReferIt3D.
arXiv Detail & Related papers (2021-07-07T18:55:03Z)
- Disentangling 3D Prototypical Networks For Few-Shot Concept Learning [29.02523358573336]
We present neural architectures that disentangle RGB-D images into objects' shapes and styles and a map of the background scene.
Our networks incorporate architectural biases that reflect the image formation process, 3D geometry of the world scene, and shape-style interplay.
arXiv Detail & Related papers (2020-11-06T14:08:27Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning multi-object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.