LaTeRF: Label and Text Driven Object Radiance Fields
- URL: http://arxiv.org/abs/2207.01583v2
- Date: Tue, 5 Jul 2022 14:32:57 GMT
- Title: LaTeRF: Label and Text Driven Object Radiance Fields
- Authors: Ashkan Mirzaei, Yash Kant, Jonathan Kelly, and Igor Gilitschenski
- Abstract summary: We introduce LaTeRF, a method for extracting an object of interest from a scene given 2D images of the entire scene and known camera poses.
To faithfully extract the object from the scene, LaTeRF extends the NeRF formulation with an additional `objectness' probability at each 3D point.
We demonstrate high-fidelity object extraction on both synthetic and real datasets.
- Score: 8.191404990730236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Obtaining 3D object representations is important for creating photo-realistic
simulators and collecting assets for AR/VR applications. Neural fields have
shown their effectiveness in learning a continuous volumetric representation of
a scene from 2D images, but acquiring object representations from these models
with weak supervision remains an open challenge. In this paper we introduce
LaTeRF, a method for extracting an object of interest from a scene given 2D
images of the entire scene and known camera poses, a natural language
description of the object, and a small number of point-labels of object and
non-object points in the input images. To faithfully extract the object from
the scene, LaTeRF extends the NeRF formulation with an additional `objectness'
probability at each 3D point. Additionally, we leverage the rich latent space
of a pre-trained CLIP model combined with our differentiable object renderer,
to inpaint the occluded parts of the object. We demonstrate high-fidelity
object extraction on both synthetic and real datasets and justify our design
choices through an extensive ablation study.
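The abstract's core idea, an extra per-point `objectness' probability that gates standard NeRF volume rendering so that only object points contribute to the rendered pixel, can be illustrated with a short sketch. The snippet below is an illustrative assumption, not the authors' implementation: the function and parameter names (render_object_ray, objectness_logit, and so on) are hypothetical, and the exact way LaTeRF combines objectness with density may differ from this simple density-gating scheme.

```python
# Minimal sketch (not the authors' code) of objectness-gated NeRF rendering.
# All names here are illustrative assumptions based on the abstract.
import torch
import torch.nn.functional as F


def render_object_ray(sigma, rgb, objectness_logit, deltas):
    """Composite one ray, keeping only points believed to belong to the object.

    sigma:            (N,) raw density at each sample along the ray
    rgb:              (N, 3) predicted color at each sample
    objectness_logit: (N,) extra head output; sigmoid gives P(point on object)
    deltas:           (N,) distances between consecutive samples
    """
    objectness = torch.sigmoid(objectness_logit)      # 'objectness' probability
    # Down-weight the density of non-object points so they become transparent.
    sigma_obj = F.relu(sigma) * objectness
    alpha = 1.0 - torch.exp(-sigma_obj * deltas)       # per-sample opacity
    # Accumulated transmittance along the ray (standard volume rendering).
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:1]), 1.0 - alpha + 1e-10])[:-1], dim=0
    )
    weights = alpha * trans
    color = (weights[:, None] * rgb).sum(dim=0)        # object-only pixel color
    acc = weights.sum()                                 # soft object mask for this pixel
    return color, acc


if __name__ == "__main__":
    n = 64
    color, mask = render_object_ray(
        sigma=torch.rand(n),
        rgb=torch.rand(n, 3),
        objectness_logit=torch.randn(n),
        deltas=torch.full((n,), 0.01),
    )
    print(color.shape, float(mask))  # torch.Size([3]) and a mask value in [0, 1]
```

Following the training setup described in the abstract, such a renderer would be supervised with the sparse object/non-object point labels and with a CLIP similarity term between renderings of the extracted object and the natural-language description; the CLIP term is what encourages plausible inpainting of the occluded parts of the object.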
Related papers
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- Unsupervised Discovery of Object-Centric Neural Fields [21.223170092979498]
We study inferring 3D object-centric scene representations from a single image.
We propose Unsupervised discovery of Object-Centric neural Fields (uOCF)
arXiv Detail & Related papers (2024-02-12T02:16:59Z)
- Slot-guided Volumetric Object Radiance Fields [13.996432950674045]
We present a novel framework for 3D object-centric representation learning.
Our approach effectively decomposes complex scenes into individual objects from a single image in an unsupervised fashion.
arXiv Detail & Related papers (2024-01-04T12:52:48Z)
- Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model to extract the object of interest, and leverages a text-to-image diffusion model to lift the object into a neural radiance field.
arXiv Detail & Related papers (2023-04-19T16:39:51Z)
- Object Wake-up: 3-D Object Reconstruction, Animation, and in-situ Rendering from a Single Image [58.69732754597448]
Given a picture of a chair, could we extract the 3-D shape of the chair, animate its plausible articulations and motions, and render in-situ in its original image space?
We devise an automated approach to extract and manipulate articulated objects in single images.
arXiv Detail & Related papers (2021-08-05T16:20:12Z)
- Sparse Pose Trajectory Completion [87.31270669154452]
We propose a method that can learn even from a dataset where objects appear only in sparsely sampled views.
This is achieved with a cross-modal pose trajectory transfer mechanism.
Our method is evaluated on the Pix3D and ShapeNet datasets.
arXiv Detail & Related papers (2021-05-01T00:07:21Z)
- FiG-NeRF: Figure-Ground Neural Radiance Fields for 3D Object Category Modelling [11.432178728985956]
We use Neural Radiance Fields (NeRF) to learn high quality 3D object category models from collections of input images.
We show that this method can learn accurate 3D object category models using only photometric supervision and casually captured images.
arXiv Detail & Related papers (2021-04-17T01:38:54Z)
- Supervised Training of Dense Object Nets using Optimal Descriptors for Industrial Robotic Applications [57.87136703404356]
Dense Object Nets (DONs) by Florence, Manuelli and Tedrake introduced dense object descriptors as a novel visual object representation for the robotics community.
In this paper we show that given a 3D model of an object, we can generate its descriptor space image, which allows for supervised training of DONs.
We compare the training methods on generating 6D grasps for industrial objects and show that our novel supervised training approach improves the pick-and-place performance in industry-relevant tasks.
arXiv Detail & Related papers (2021-02-16T11:40:12Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
- ROOTS: Object-Centric Representation and Rendering of 3D Scenes [28.24758046060324]
A crucial ability of human intelligence is to build up models of individual 3D objects from partial scene observations.
Recent works achieve object-centric generation but without the ability to infer the representation.
We propose a probabilistic generative model for learning to build modular and compositional 3D object models.
arXiv Detail & Related papers (2020-06-11T00:42:56Z)