Slot-guided Volumetric Object Radiance Fields
- URL: http://arxiv.org/abs/2401.02241v1
- Date: Thu, 4 Jan 2024 12:52:48 GMT
- Authors: Di Qi, Tong Yang, Xiangyu Zhang
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel framework for 3D object-centric representation learning.
Our approach effectively decomposes complex scenes into individual objects from
a single image in an unsupervised fashion. This method, called slot-guided
Volumetric Object Radiance Fields (sVORF), composes volumetric object radiance
fields with object slots as a guidance to implement unsupervised 3D scene
decomposition. Specifically, sVORF obtains object slots from a single image via
a transformer module, maps these slots to volumetric object radiance fields
with a hypernetwork and composes object radiance fields with the guidance of
object slots at a 3D location. Moreover, sVORF significantly reduces memory
requirement due to small-sized pixel rendering during training. We demonstrate
the effectiveness of our approach by showing top results in scene decomposition
and generation tasks of complex synthetic datasets (e.g., Room-Diverse).
Furthermore, we also confirm the potential of sVORF to segment objects in
real-world scenes (e.g., the LLFF dataset). We hope our approach can provide
preliminary understanding of the physical world and help ease future research
in 3D object-centric representation learning.
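The composition step described in the abstract can be sketched in a few lines. This is a minimal illustrative NumPy sketch, not the authors' implementation: the fixed random matrices `H1`/`H2` stand in for the learned hypernetwork, the slots are sampled rather than predicted by a transformer, and the softmax-over-density guidance weights in `compose` are an assumption about how slot guidance could combine per-object fields at a 3D location.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SLOTS, D_SLOT, D_HID = 3, 8, 16

# Hypothetical stand-in for the learned hypernetwork: fixed linear maps
# from a slot vector to the parameters of a tiny per-object radiance field.
H1 = rng.normal(scale=0.1, size=(D_SLOT, 3 * D_HID))   # 3D point -> hidden
H2 = rng.normal(scale=0.1, size=(D_SLOT, D_HID * 4))   # hidden -> (sigma, rgb)

def object_field(slot, x):
    """Evaluate one slot's volumetric object radiance field at 3D point x."""
    W1 = (slot @ H1).reshape(3, D_HID)
    W2 = (slot @ H2).reshape(D_HID, 4)
    h = np.tanh(x @ W1)
    out = h @ W2
    sigma = np.log1p(np.exp(out[0]))       # softplus -> non-negative density
    rgb = 1.0 / (1.0 + np.exp(-out[1:]))   # sigmoid -> colors in [0, 1]
    return sigma, rgb

def compose(slots, x):
    """Compose the per-object fields at x, weighted by slot guidance.

    Assumption: guidance weights are a softmax over per-slot densities,
    so the densest object dominates the composite at each 3D location.
    """
    sigmas, rgbs = zip(*(object_field(s, x) for s in slots))
    sigmas, rgbs = np.array(sigmas), np.array(rgbs)
    w = np.exp(sigmas) / np.exp(sigmas).sum()
    return (w * sigmas).sum(), (w[:, None] * rgbs).sum(axis=0)

slots = rng.normal(size=(N_SLOTS, D_SLOT))  # stand-in for transformer slots
sigma, rgb = compose(slots, np.array([0.1, -0.2, 0.3]))
```

Because the composite is evaluated pointwise, a training step only needs to render a small batch of pixels rather than full images, which is what the abstract credits for the reduced memory footprint.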
Related papers
- Variational Inference for Scalable 3D Object-centric Learning [19.445804699433353]
We tackle the task of scalable unsupervised object-centric representation learning on 3D scenes.
Existing approaches to object-centric representation learning show limitations in generalizing to larger scenes.
We propose to learn view-invariant 3D object representations in localized object coordinate systems.
arXiv Detail & Related papers (2023-09-25T10:23:40Z)
- Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation [55.9577535403381]
We present a novel approach to segmenting objects in 3D during reconstruction given only unlabeled multi-view images of a scene.
The core of our method is a novel propagation strategy for individual objects' radiance fields with a bidirectional photometric loss.
To the best of our knowledge, RFP (Radiance Field Propagation) is the first unsupervised approach to 3D scene object segmentation for neural radiance fields (NeRFs).
arXiv Detail & Related papers (2022-10-02T11:14:23Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- LaTeRF: Label and Text Driven Object Radiance Fields [8.191404990730236]
We introduce LaTeRF, a method for extracting an object of interest from a scene given 2D images of the entire scene and known camera poses.
To faithfully extract the object from the scene, LaTeRF extends the NeRF formulation with an additional "objectness" probability at each 3D point.
We demonstrate high-fidelity object extraction on both synthetic and real datasets.
arXiv Detail & Related papers (2022-07-04T17:07:57Z)
- Object Scene Representation Transformer [56.40544849442227]
We introduce Object Scene Representation Transformer (OSRT), a 3D-centric model in which individual object representations naturally emerge through novel view synthesis.
OSRT scales to significantly more complex scenes with larger diversity of objects and backgrounds than existing methods.
It is multiple orders of magnitude faster at compositional rendering thanks to its light field parametrization and the novel Slot Mixer decoder.
arXiv Detail & Related papers (2022-06-14T15:40:47Z)
- Unsupervised Discovery and Composition of Object Light Fields [57.198174741004095]
We propose to represent objects in an object-centric, compositional scene representation as light fields.
We propose a novel light field compositor module that enables reconstructing the global light field from a set of object-centric light fields.
arXiv Detail & Related papers (2022-05-08T17:50:35Z)
- Unsupervised Discovery of Object Radiance Fields [86.20162437780671]
uORF (unsupervised discovery of Object Radiance Fields) learns to decompose complex scenes with diverse, textured backgrounds from a single image.
We show uORF performs well on unsupervised 3D scene segmentation, novel view synthesis, and scene editing on three datasets.
arXiv Detail & Related papers (2021-07-16T13:53:36Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.