Factored Neural Representation for Scene Understanding
- URL: http://arxiv.org/abs/2304.10950v3
- Date: Wed, 21 Jun 2023 03:37:26 GMT
- Title: Factored Neural Representation for Scene Understanding
- Authors: Yu-Shiang Wong, Niloy J. Mitra
- Abstract summary: We introduce a factored neural scene representation that can directly be learned from a monocular RGB-D video to produce object-level neural representations.
We evaluate ours against a set of neural approaches on both synthetic and real data to demonstrate that the representation is efficient, interpretable, and editable.
- Score: 39.66967677639173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A long-standing goal in scene understanding is to obtain interpretable and
editable representations that can be directly constructed from a raw monocular
RGB-D video, without requiring specialized hardware setup or priors. The
problem is significantly more challenging in the presence of multiple moving
and/or deforming objects. Traditional methods have approached the setup with a
mix of simplifications, scene priors, pretrained templates, or known
deformation models. The advent of neural representations, especially neural
implicit representations and radiance fields, opens the possibility of
end-to-end optimization to collectively capture geometry, appearance, and
object motion. However, current approaches produce a global scene encoding,
assume multiview capture with limited or no motion in the scenes, and do not
facilitate easy manipulation beyond novel view synthesis. In this work, we
introduce a factored neural scene representation that can directly be learned
from a monocular RGB-D video to produce object-level neural representations with
an explicit encoding of object movement (e.g., rigid trajectory) and/or
deformations (e.g., nonrigid movement). We evaluate ours against a set of
neural approaches on both synthetic and real data to demonstrate that the
representation is efficient, interpretable, and editable (e.g., change object
trajectory). Code and data are available at
http://geometry.cs.ucl.ac.uk/projects/2023/factorednerf .
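The abstract describes the factorization only at a high level. As a reading aid, the following is a minimal, hypothetical PyTorch sketch (not the authors' code) of what an object-level representation with an explicit per-frame rigid trajectory and a per-frame deformation code could look like; the class names, network sizes, and axis-angle pose parameterization are illustrative assumptions.
```python
import torch
import torch.nn as nn


def so3_exp(omega: torch.Tensor) -> torch.Tensor:
    """Axis-angle vector (3,) -> rotation matrix, via the exponential of a skew-symmetric matrix."""
    zero = omega.new_zeros(())
    wx, wy, wz = omega
    skew = torch.stack([
        torch.stack([zero, -wz, wy]),
        torch.stack([wz, zero, -wx]),
        torch.stack([-wy, wx, zero]),
    ])
    return torch.linalg.matrix_exp(skew)


class ObjectField(nn.Module):
    """Per-object neural field: canonical 3D point (+ deformation code) -> (density, RGB)."""

    def __init__(self, deform_dim: int = 8, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + deform_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 1 density channel + 3 color channels
        )

    def forward(self, x_canonical: torch.Tensor, deform_code: torch.Tensor):
        h = torch.cat([x_canonical, deform_code.expand(x_canonical.shape[0], -1)], dim=-1)
        out = self.mlp(h)
        return out[..., :1], torch.sigmoid(out[..., 1:])  # raw density, RGB in [0, 1]


class FactoredScene(nn.Module):
    """Scene = per-object fields plus explicit, per-frame rigid poses and deformation codes."""

    def __init__(self, num_objects: int, num_frames: int, deform_dim: int = 8):
        super().__init__()
        self.objects = nn.ModuleList([ObjectField(deform_dim) for _ in range(num_objects)])
        # Explicit, editable motion parameters: one (rotation, translation) per object and frame.
        self.rotvecs = nn.Parameter(torch.zeros(num_objects, num_frames, 3))
        self.translations = nn.Parameter(torch.zeros(num_objects, num_frames, 3))
        # Per-object, per-frame codes for non-rigid deformation.
        self.deform_codes = nn.Parameter(torch.zeros(num_objects, num_frames, deform_dim))

    def query(self, x_world: torch.Tensor, frame: int):
        """Evaluate every object field at world-space points for one frame."""
        results = []
        for i, field in enumerate(self.objects):
            R = so3_exp(self.rotvecs[i, frame])
            t = self.translations[i, frame]
            x_canonical = (x_world - t) @ R  # apply R^T to (x - t): world -> canonical frame
            results.append(field(x_canonical, self.deform_codes[i, frame]))
        return results  # to be composited by volume rendering (omitted here)


scene = FactoredScene(num_objects=2, num_frames=10)
per_object_density_and_color = scene.query(torch.rand(4096, 3), frame=3)
```
Because the motion parameters are explicit rather than baked into a global network, an edit such as changing an object's trajectory amounts to rewriting the corresponding pose entries, which is the kind of editability the abstract claims.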
Related papers
- MoST: Multi-modality Scene Tokenization for Motion Prediction [39.97334929667033]
We propose tokenizing the visual world into a compact set of scene elements.
We then leverage pre-trained image foundation models and LiDAR neural networks to encode all the scene elements in an open-vocabulary manner.
Our proposed representation can efficiently encode the multi-frame multi-modality observations with a few hundred tokens.
arXiv Detail & Related papers (2024-04-30T13:09:41Z)
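A minimal sketch of the tokenization idea summarized above, assuming (hypothetically) that per-element embeddings from a frozen image foundation model and pooled LiDAR features are fused into a fixed budget of scene tokens; the dimensions, fusion MLP, and truncation step are illustrative assumptions rather than MoST's actual design.
```python
import torch
import torch.nn as nn


class SceneTokenizer(nn.Module):
    """Fuse per-element image and LiDAR embeddings into a compact set of scene tokens."""

    def __init__(self, img_dim: int = 768, lidar_dim: int = 128,
                 token_dim: int = 256, num_tokens: int = 512):
        super().__init__()
        self.num_tokens = num_tokens
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + lidar_dim, token_dim), nn.ReLU(),
            nn.Linear(token_dim, token_dim),
        )

    def forward(self, img_feats: torch.Tensor, lidar_feats: torch.Tensor) -> torch.Tensor:
        # img_feats:   (num_elements, img_dim)   e.g. pooled features from a frozen image foundation model
        # lidar_feats: (num_elements, lidar_dim) e.g. pooled features from a LiDAR point-cloud network
        tokens = self.fuse(torch.cat([img_feats, lidar_feats], dim=-1))
        # Keep only a compact token budget (here: simple truncation; a real system would select or pool).
        return tokens[: self.num_tokens]


tokenizer = SceneTokenizer()
scene_tokens = tokenizer(torch.randn(600, 768), torch.randn(600, 128))  # at most 512 tokens
```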
- RUST: Latent Neural Scene Representations from Unposed Imagery [21.433079925439234]
Inferring structure of 3D scenes from 2D observations is a fundamental challenge in computer vision.
Recently popularized approaches based on neural scene representations have achieved tremendous impact.
RUST (Really Unposed Scene representation Transformer) is a pose-free approach to novel view synthesis trained on RGB images alone.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos [82.74918564737591]
We present a method for learning 3D geometry and physics parameters of a dynamic scene from only a monocular RGB video input.
Experiments show that our method achieves superior mesh and video reconstruction of dynamic scenes compared to competing Neural Field approaches.
arXiv Detail & Related papers (2022-10-22T04:57:55Z)
- One-Shot Neural Fields for 3D Object Understanding [112.32255680399399]
We present a unified and compact scene representation for robotics.
Each object in the scene is depicted by a latent code capturing geometry and appearance.
This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction, and stable grasp prediction.
arXiv Detail & Related papers (2022-10-21T17:33:14Z)
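A hypothetical sketch of the pattern described in the One-Shot Neural Fields summary above, where a single per-object latent code is decoded by task-specific heads; the particular heads (occupancy, color, grasp score), the 7-D grasp-pose input, and all sizes are assumptions for illustration.
```python
import torch
import torch.nn as nn


class ObjectCodeDecoder(nn.Module):
    """One latent code per object, decoded by task-specific heads (illustrative stand-ins)."""

    def __init__(self, code_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.geometry_head = nn.Sequential(  # (point, code) -> occupancy, for 3D reconstruction
            nn.Linear(3 + code_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.color_head = nn.Sequential(     # (point, code) -> RGB, for novel view rendering
            nn.Linear(3 + code_dim, hidden), nn.ReLU(), nn.Linear(hidden, 3))
        self.grasp_head = nn.Sequential(     # (7-D grasp pose, code) -> stability score
            nn.Linear(7 + code_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, code: torch.Tensor, points: torch.Tensor, grasp_poses: torch.Tensor):
        p = torch.cat([points, code.expand(points.shape[0], -1)], dim=-1)
        g = torch.cat([grasp_poses, code.expand(grasp_poses.shape[0], -1)], dim=-1)
        return {
            "occupancy": torch.sigmoid(self.geometry_head(p)),
            "rgb": torch.sigmoid(self.color_head(p)),
            "grasp_score": torch.sigmoid(self.grasp_head(g)),
        }


decoder = ObjectCodeDecoder()
outputs = decoder(torch.randn(64), torch.rand(1024, 3), torch.rand(32, 7))
```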
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- STaR: Self-supervised Tracking and Reconstruction of Rigid Objects in Motion with Neural Rendering [9.600908665766465]
We present STaR, a novel method that performs Self-supervised Tracking and Reconstruction of dynamic scenes with rigid motion from multi-view RGB videos without any manual annotation.
We show that our method can render photorealistic novel views, where novelty is measured on both spatial and temporal axes.
arXiv Detail & Related papers (2020-12-22T23:45:28Z)
- Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video [76.19076002661157]
Non-Rigid Neural Radiance Fields (NR-NeRF) is a reconstruction and novel view synthesis approach for general non-rigid dynamic scenes.
We show that even a single consumer-grade camera is sufficient to synthesize sophisticated renderings of a dynamic scene from novel virtual camera views.
arXiv Detail & Related papers (2020-12-22T18:46:12Z)
- Neural Scene Graphs for Dynamic Scenes [57.65413768984925]
We present the first neural rendering method that decomposes dynamic scenes into scene graphs.
We learn implicitly encoded scenes, combined with a jointly learned latent representation to describe objects with a single implicit function.
arXiv Detail & Related papers (2020-11-20T12:37:10Z)
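A hypothetical sketch of the scene-graph factorization summarized above: in contrast to the per-object MLPs in the earlier sketch, a single implicit function is shared across all objects, and each graph node contributes only a latent code and a per-frame transform (rotation omitted for brevity). Names and sizes are illustrative assumptions, not the paper's architecture.
```python
import torch
import torch.nn as nn


class SharedObjectFunction(nn.Module):
    """A single implicit function shared by all objects; each object is selected by its latent code."""

    def __init__(self, code_dim: int = 32, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # density + RGB
        )

    def forward(self, x_local: torch.Tensor, code: torch.Tensor) -> torch.Tensor:
        h = torch.cat([x_local, code.expand(x_local.shape[0], -1)], dim=-1)
        return self.mlp(h)


class SceneGraph(nn.Module):
    """Graph nodes: per-object latent code plus a per-frame translation."""

    def __init__(self, num_objects: int, num_frames: int, code_dim: int = 32):
        super().__init__()
        self.shared_fn = SharedObjectFunction(code_dim)
        self.codes = nn.Parameter(torch.randn(num_objects, code_dim) * 0.01)
        self.translations = nn.Parameter(torch.zeros(num_objects, num_frames, 3))

    def query(self, x_world: torch.Tensor, frame: int):
        # Evaluate every node by moving world-space points into the node's local frame.
        return [self.shared_fn(x_world - self.translations[i, frame], self.codes[i])
                for i in range(self.codes.shape[0])]


graph = SceneGraph(num_objects=3, num_frames=20)
per_object_outputs = graph.query(torch.rand(2048, 3), frame=5)
```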
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including its accuracy or completeness) and is not responsible for any consequences arising from its use.