Unsupervised Discovery of Object-Centric Neural Fields
- URL: http://arxiv.org/abs/2402.07376v1
- Date: Mon, 12 Feb 2024 02:16:59 GMT
- Title: Unsupervised Discovery of Object-Centric Neural Fields
- Authors: Rundong Luo, Hong-Xing Yu, Jiajun Wu
- Abstract summary: We study inferring 3D object-centric scene representations from a single image.
We propose Unsupervised discovery of Object-Centric neural Fields (uOCF).
- Score: 21.223170092979498
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study inferring 3D object-centric scene representations from a single
image. While recent methods have shown potential in unsupervised 3D object
discovery from simple synthetic images, they fail to generalize to real-world
scenes with visually rich and diverse objects. This limitation stems from their
object representations, which entangle objects' intrinsic attributes like shape
and appearance with extrinsic, viewer-centric properties such as their 3D
location. To address this bottleneck, we propose Unsupervised discovery of
Object-Centric neural Fields (uOCF). uOCF focuses on learning the intrinsics of
objects and models the extrinsics separately. Our approach significantly
improves systematic generalization, thus enabling unsupervised learning of
high-fidelity object-centric scene representations from sparse real-world
images. To evaluate our approach, we collect three new datasets, including two
real kitchen environments. Extensive experiments show that uOCF enables
unsupervised discovery of visually rich objects from a single real image,
allowing applications such as 3D object segmentation and scene manipulation.
Notably, uOCF demonstrates zero-shot generalization to unseen objects from a
single real image. Project page: https://red-fairy.github.io/uOCF/
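
To make the intrinsic/extrinsic separation concrete, below is a minimal PyTorch sketch of the idea: a conditional neural field that is always queried in object-local coordinates, so its latent code only has to explain shape and appearance, while the object's 3D location enters as an explicit, separately modeled quantity. All names here (ObjectField, intrinsic_code, object_position) are hypothetical illustrations of the concept, not uOCF's actual implementation.

```python
# Minimal sketch of the intrinsic/extrinsic factorization described in the
# abstract. Assumed, illustrative design -- not uOCF's actual code.
import torch
import torch.nn as nn

class ObjectField(nn.Module):
    """A conditional neural field queried in object-local coordinates.

    The latent `intrinsic_code` captures shape/appearance only; the
    object's 3D location is handled outside the field, so the same
    code renders identically wherever the object is placed.
    """
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB + density
        )

    def forward(self, world_points, intrinsic_code, object_position):
        # Extrinsics handled explicitly: shift query points into the
        # object's local frame before evaluating the field.
        local_points = world_points - object_position
        code = intrinsic_code.expand(local_points.shape[0], -1)
        out = self.mlp(torch.cat([local_points, code], dim=-1))
        rgb, density = out[..., :3].sigmoid(), out[..., 3:].relu()
        return rgb, density

field = ObjectField()
pts = torch.randn(1024, 3)      # sampled ray points in the world frame
z = torch.randn(1, 64)          # inferred intrinsic latent for one object
rgb, sigma = field(pts, z, torch.tensor([0.5, 0.0, 1.0]))
```

Because the field never sees world coordinates directly, moving the object only changes `object_position`, which is the kind of disentanglement the abstract argues is needed for systematic generalization.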
Related papers
- Variational Inference for Scalable 3D Object-centric Learning [19.445804699433353]
We tackle the task of scalable unsupervised object-centric representation learning on 3D scenes.
Existing approaches to object-centric representation learning show limitations in generalizing to larger scenes.
We propose to learn view-invariant 3D object representations in localized object coordinate systems.
arXiv Detail & Related papers (2023-09-25T10:23:40Z)
- NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization [80.3424839706698]
We present NeurOCS, a framework that uses instance masks and 3D boxes as input to learn 3D object shapes by means of differentiable rendering.
Our approach rests on insights in learning a category-level shape prior directly from real driving scenes.
We make critical design choices to learn object coordinates more effectively from an object-centric view.
arXiv Detail & Related papers (2023-05-28T16:18:41Z)
- Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model for the effective extraction of objects of interest, and leverages a text-to-image diffusion model to lift the object into a neural radiance field.
arXiv Detail & Related papers (2023-04-19T16:39:51Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- LaTeRF: Label and Text Driven Object Radiance Fields [8.191404990730236]
We introduce LaTeRF, a method for extracting an object of interest from a scene given 2D images of the entire scene and known camera poses.
To faithfully extract the object from the scene, LaTeRF extends the NeRF formulation with an additional 'objectness' probability at each 3D point (see the sketch after this list).
We demonstrate high-fidelity object extraction on both synthetic and real datasets.
arXiv Detail & Related papers (2022-07-04T17:07:57Z)
- Object Scene Representation Transformer [56.40544849442227]
We introduce Object Scene Representation Transformer (OSRT), a 3D-centric model in which individual object representations naturally emerge through novel view synthesis.
OSRT scales to significantly more complex scenes with larger diversity of objects and backgrounds than existing methods.
It is multiple orders of magnitude faster at compositional rendering thanks to its light field parametrization and the novel Slot Mixer decoder.
arXiv Detail & Related papers (2022-06-14T15:40:47Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- De-rendering 3D Objects in the Wild [21.16153549406485]
We present a weakly supervised method that is able to decompose a single image of an object into shape, material, and global lighting.
For training, the method only relies on a rough initial shape estimate of the training objects to bootstrap the learning process.
In our experiments, we show that the method can successfully de-render 2D images into a 3D representation and generalizes to unseen object categories.
arXiv Detail & Related papers (2022-01-06T23:50:09Z)
- Single View Metrology in the Wild [94.7005246862618]
We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by 3D heights of objects or camera height above the ground.
Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights.
We demonstrate state-of-the-art qualitative and quantitative results on several datasets as well as applications including virtual object insertion.
arXiv Detail & Related papers (2020-07-18T22:31:33Z)
- ROOTS: Object-Centric Representation and Rendering of 3D Scenes [28.24758046060324]
A crucial ability of human intelligence is to build up models of individual 3D objects from partial scene observations.
Recent works achieve object-centric generation but lack the ability to infer the representation from observations.
We propose a probabilistic generative model for learning to build modular and compositional 3D object models.
arXiv Detail & Related papers (2020-06-11T00:42:56Z)
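
As referenced in the LaTeRF entry above, here is a minimal sketch of what extending a NeRF-style MLP with a per-point objectness probability could look like. The class name, the shared-trunk layout, and the single-logit head are assumptions made for illustration, not LaTeRF's actual architecture.

```python
# Illustrative sketch of a NeRF head augmented with a per-point
# 'objectness' probability, as in the LaTeRF entry above.
# Assumed design, not the paper's actual implementation.
import torch
import torch.nn as nn

class NeRFWithObjectness(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.rgb_sigma = nn.Linear(hidden, 4)   # RGB + density
        self.objectness = nn.Linear(hidden, 1)  # one logit per 3D point

    def forward(self, points):
        h = self.trunk(points)
        out = self.rgb_sigma(h)
        rgb, sigma = out[..., :3].sigmoid(), out[..., 3:].relu()
        # Probability that this point belongs to the object of interest;
        # could be supervised from sparse 2D pixel labels and text cues.
        p_obj = self.objectness(h).sigmoid()
        return rgb, sigma, p_obj
```

One plausible way to use such a head at render time is to weight each point's density by `p_obj` (or threshold it), so that volume rendering produces a radiance field containing only the labeled object.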