Factored Neural Representation for Scene Understanding
- URL: http://arxiv.org/abs/2304.10950v3
- Date: Wed, 21 Jun 2023 03:37:26 GMT
- Title: Factored Neural Representation for Scene Understanding
- Authors: Yu-Shiang Wong, Niloy J. Mitra
- Abstract summary: We introduce a factored neural scene representation that can directly be learned from a monocular RGB-D video to produce object-level neural representations.
We evaluate ours against a set of neural approaches on both synthetic and real data to demonstrate that the representation is efficient, interpretable, and editable.
- Score: 39.66967677639173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A long-standing goal in scene understanding is to obtain interpretable and
editable representations that can be directly constructed from a raw monocular
RGB-D video, without requiring specialized hardware setup or priors. The
problem is significantly more challenging in the presence of multiple moving
and/or deforming objects. Traditional methods have approached the setup with a
mix of simplifications, scene priors, pretrained templates, or known
deformation models. The advent of neural representations, especially neural
implicit representations and radiance fields, opens the possibility of
end-to-end optimization to collectively capture geometry, appearance, and
object motion. However, current approaches produce a global scene encoding,
assume multiview capture with limited or no motion in the scenes, and do not
facilitate easy manipulation beyond novel view synthesis. In this work, we
introduce a factored neural scene representation that can directly be learned
from a monocular RGB-D video to produce object-level neural representations with
an explicit encoding of object movement (e.g., rigid trajectory) and/or
deformations (e.g., nonrigid movement). We evaluate ours against a set of
neural approaches on both synthetic and real data to demonstrate that the
representation is efficient, interpretable, and editable (e.g., change object
trajectory). Code and data are available at
http://geometry.cs.ucl.ac.uk/projects/2023/factorednerf .
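The abstract describes the factorization only at a high level. As a reading aid, the following is a minimal, hypothetical PyTorch sketch (not the authors' code) of what an object-level representation with an explicit per-frame rigid trajectory and a per-frame deformation code could look like; the class names, network sizes, and axis-angle pose parameterization are illustrative assumptions.
```python
import torch
import torch.nn as nn


def so3_exp(omega: torch.Tensor) -> torch.Tensor:
    """Axis-angle vector (3,) -> rotation matrix, via the exponential of a skew-symmetric matrix."""
    zero = omega.new_zeros(())
    wx, wy, wz = omega
    skew = torch.stack([
        torch.stack([zero, -wz, wy]),
        torch.stack([wz, zero, -wx]),
        torch.stack([-wy, wx, zero]),
    ])
    return torch.linalg.matrix_exp(skew)


class ObjectField(nn.Module):
    """Per-object neural field: canonical 3D point (+ deformation code) -> (density, RGB)."""

    def __init__(self, deform_dim: int = 8, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + deform_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 1 density channel + 3 color channels
        )

    def forward(self, x_canonical: torch.Tensor, deform_code: torch.Tensor):
        h = torch.cat([x_canonical, deform_code.expand(x_canonical.shape[0], -1)], dim=-1)
        out = self.mlp(h)
        return out[..., :1], torch.sigmoid(out[..., 1:])  # raw density, RGB in [0, 1]


class FactoredScene(nn.Module):
    """Scene = per-object fields plus explicit, per-frame rigid poses and deformation codes."""

    def __init__(self, num_objects: int, num_frames: int, deform_dim: int = 8):
        super().__init__()
        self.objects = nn.ModuleList([ObjectField(deform_dim) for _ in range(num_objects)])
        # Explicit, editable motion parameters: one (rotation, translation) per object and frame.
        self.rotvecs = nn.Parameter(torch.zeros(num_objects, num_frames, 3))
        self.translations = nn.Parameter(torch.zeros(num_objects, num_frames, 3))
        # Per-object, per-frame codes for non-rigid deformation.
        self.deform_codes = nn.Parameter(torch.zeros(num_objects, num_frames, deform_dim))

    def query(self, x_world: torch.Tensor, frame: int):
        """Evaluate every object field at world-space points for one frame."""
        results = []
        for i, field in enumerate(self.objects):
            R = so3_exp(self.rotvecs[i, frame])
            t = self.translations[i, frame]
            x_canonical = (x_world - t) @ R  # apply R^T to (x - t): world -> canonical frame
            results.append(field(x_canonical, self.deform_codes[i, frame]))
        return results  # to be composited by volume rendering (omitted here)


scene = FactoredScene(num_objects=2, num_frames=10)
per_object_density_and_color = scene.query(torch.rand(4096, 3), frame=3)
```
Because the motion parameters are explicit rather than baked into a global network, an edit such as changing an object's trajectory amounts to rewriting the corresponding pose entries, which is the kind of editability the abstract claims.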
Related papers
- MoST: Multi-modality Scene Tokenization for Motion Prediction [39.97334929667033]
We propose tokenizing the visual world into a compact set of scene elements.
We then leverage pre-trained image foundation models and LiDAR neural networks to encode all the scene elements in an open-vocabulary manner.
Our proposed representation can efficiently encode the multi-frame multi-modality observations with a few hundred tokens.
arXiv Detail & Related papers (2024-04-30T13:09:41Z)
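A minimal sketch of the tokenization idea summarized above, assuming (hypothetically) that per-element embeddings from a frozen image foundation model and pooled LiDAR features are fused into a fixed budget of scene tokens; the dimensions, fusion MLP, and truncation step are illustrative assumptions rather than MoST's actual design.
```python
import torch
import torch.nn as nn


class SceneTokenizer(nn.Module):
    """Fuse per-element image and LiDAR embeddings into a compact set of scene tokens."""

    def __init__(self, img_dim: int = 768, lidar_dim: int = 128,
                 token_dim: int = 256, num_tokens: int = 512):
        super().__init__()
        self.num_tokens = num_tokens
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + lidar_dim, token_dim), nn.ReLU(),
            nn.Linear(token_dim, token_dim),
        )

    def forward(self, img_feats: torch.Tensor, lidar_feats: torch.Tensor) -> torch.Tensor:
        # img_feats:   (num_elements, img_dim)   e.g. pooled features from a frozen image foundation model
        # lidar_feats: (num_elements, lidar_dim) e.g. pooled features from a LiDAR point-cloud network
        tokens = self.fuse(torch.cat([img_feats, lidar_feats], dim=-1))
        # Keep only a compact token budget (here: simple truncation; a real system would select or pool).
        return tokens[: self.num_tokens]


tokenizer = SceneTokenizer()
scene_tokens = tokenizer(torch.randn(600, 768), torch.randn(600, 128))  # at most 512 tokens
```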
- RUST: Latent Neural Scene Representations from Unposed Imagery [21.433079925439234]
Inferring structure of 3D scenes from 2D observations is a fundamental challenge in computer vision.
Recently popularized approaches based on neural scene representations have achieved tremendous impact.
RUST (Really Unposed Scene representation Transformer) is a pose-free approach to novel view synthesis trained on RGB images alone.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos [82.74918564737591]
We present a method for learning 3D geometry and physics parameters of a dynamic scene from only a monocular RGB video input.
Experiments show that our method achieves superior mesh and video reconstruction of dynamic scenes compared to competing Neural Field approaches.
arXiv Detail & Related papers (2022-10-22T04:57:55Z)
- One-Shot Neural Fields for 3D Object Understanding [112.32255680399399]
We present a unified and compact scene representation for robotics.
Each object in the scene is depicted by a latent code capturing geometry and appearance.
This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction, and stable grasp prediction.
arXiv Detail & Related papers (2022-10-21T17:33:14Z)
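A hypothetical sketch of the pattern described in the One-Shot Neural Fields summary above, where a single per-object latent code is decoded by task-specific heads; the particular heads (occupancy, color, grasp score), the 7-D grasp-pose input, and all sizes are assumptions for illustration.
```python
import torch
import torch.nn as nn


class ObjectCodeDecoder(nn.Module):
    """One latent code per object, decoded by task-specific heads (illustrative stand-ins)."""

    def __init__(self, code_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.geometry_head = nn.Sequential(  # (point, code) -> occupancy, for 3D reconstruction
            nn.Linear(3 + code_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.color_head = nn.Sequential(     # (point, code) -> RGB, for novel view rendering
            nn.Linear(3 + code_dim, hidden), nn.ReLU(), nn.Linear(hidden, 3))
        self.grasp_head = nn.Sequential(     # (7-D grasp pose, code) -> stability score
            nn.Linear(7 + code_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, code: torch.Tensor, points: torch.Tensor, grasp_poses: torch.Tensor):
        p = torch.cat([points, code.expand(points.shape[0], -1)], dim=-1)
        g = torch.cat([grasp_poses, code.expand(grasp_poses.shape[0], -1)], dim=-1)
        return {
            "occupancy": torch.sigmoid(self.geometry_head(p)),
            "rgb": torch.sigmoid(self.color_head(p)),
            "grasp_score": torch.sigmoid(self.grasp_head(g)),
        }


decoder = ObjectCodeDecoder()
outputs = decoder(torch.randn(64), torch.rand(1024, 3), torch.rand(32, 7))
```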
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- STaR: Self-supervised Tracking and Reconstruction of Rigid Objects in Motion with Neural Rendering [9.600908665766465]
We present STaR, a novel method that performs Self-supervised Tracking and Reconstruction of dynamic scenes with rigid motion from multi-view RGB videos without any manual annotation.
We show that our method can render photorealistic novel views, where novelty is measured on both spatial and temporal axes.
arXiv Detail & Related papers (2020-12-22T23:45:28Z)
- Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video [76.19076002661157]
Non-Rigid Neural Radiance Fields (NR-NeRF) is a reconstruction and novel view synthesis approach for general non-rigid dynamic scenes.
We show that even a single consumer-grade camera is sufficient to synthesize sophisticated renderings of a dynamic scene from novel virtual camera views.
arXiv Detail & Related papers (2020-12-22T18:46:12Z)
- Neural Scene Graphs for Dynamic Scenes [57.65413768984925]
We present the first neural rendering method that decomposes dynamic scenes into scene graphs.
We learn implicitly encoded scenes, combined with a jointly learned latent representation to describe objects with a single implicit function.
arXiv Detail & Related papers (2020-11-20T12:37:10Z)
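A hypothetical sketch of the scene-graph factorization summarized above: in contrast to the per-object MLPs in the earlier sketch, a single implicit function is shared across all objects, and each graph node contributes only a latent code and a per-frame transform (rotation omitted for brevity). Names and sizes are illustrative assumptions, not the paper's architecture.
```python
import torch
import torch.nn as nn


class SharedObjectFunction(nn.Module):
    """A single implicit function shared by all objects; each object is selected by its latent code."""

    def __init__(self, code_dim: int = 32, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # density + RGB
        )

    def forward(self, x_local: torch.Tensor, code: torch.Tensor) -> torch.Tensor:
        h = torch.cat([x_local, code.expand(x_local.shape[0], -1)], dim=-1)
        return self.mlp(h)


class SceneGraph(nn.Module):
    """Graph nodes: per-object latent code plus a per-frame translation."""

    def __init__(self, num_objects: int, num_frames: int, code_dim: int = 32):
        super().__init__()
        self.shared_fn = SharedObjectFunction(code_dim)
        self.codes = nn.Parameter(torch.randn(num_objects, code_dim) * 0.01)
        self.translations = nn.Parameter(torch.zeros(num_objects, num_frames, 3))

    def query(self, x_world: torch.Tensor, frame: int):
        # Evaluate every node by moving world-space points into the node's local frame.
        return [self.shared_fn(x_world - self.translations[i, frame], self.codes[i])
                for i in range(self.codes.shape[0])]


graph = SceneGraph(num_objects=3, num_frames=20)
per_object_outputs = graph.query(torch.rand(2048, 3), frame=5)
```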
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including its accuracy or completeness) and is not responsible for any consequences arising from its use.