NVFi: Neural Velocity Fields for 3D Physics Learning from Dynamic Videos
- URL: http://arxiv.org/abs/2312.06398v1
- Date: Mon, 11 Dec 2023 14:07:31 GMT
- Title: NVFi: Neural Velocity Fields for 3D Physics Learning from Dynamic Videos
- Authors: Jinxi Li, Ziyang Song, Bo Yang
- Abstract summary: We propose to simultaneously learn the geometry, appearance, and physical velocity of 3D scenes only from video frames.
We conduct extensive experiments on multiple datasets, demonstrating the superior performance of our method over all baselines.
- Score: 8.559809421797784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we aim to model 3D scene dynamics from multi-view videos.
Unlike the majority of existing works which usually focus on the common task of
novel view synthesis within the training time period, we propose to
simultaneously learn the geometry, appearance, and physical velocity of 3D
scenes only from video frames, such that multiple desirable applications can be
supported, including future frame extrapolation, unsupervised 3D semantic scene
decomposition, and dynamic motion transfer. Our method consists of three major
components: 1) the keyframe dynamic radiance field, 2) the interframe velocity
field, and 3) a joint keyframe and interframe optimization module, which is the
core of our framework for effectively training both networks. To validate our
method, we further introduce two dynamic 3D datasets: 1) Dynamic Object
dataset, and 2) Dynamic Indoor Scene dataset. We conduct extensive experiments
on multiple datasets, demonstrating the superior performance of our method over
all baselines, particularly in the critical tasks of future frame extrapolation
and unsupervised 3D semantic scene decomposition.
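To make the abstract's two-network design concrete, the sketch below illustrates one way a keyframe radiance field could be paired with an interframe velocity field: points sampled at a query time are advected to the nearest keyframe time by Euler-integrating a learned velocity MLP, and the keyframe radiance field would then be evaluated at the warped locations. This is a minimal conceptual sketch, not the authors' implementation; the names `VelocityField` and `warp_to_keyframe` and all hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

class VelocityField(nn.Module):
    """Toy MLP mapping a spatio-temporal point (x, y, z, t) to a 3D velocity."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, xyz: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3), t: (N, 1)  ->  velocity: (N, 3)
        return self.net(torch.cat([xyz, t], dim=-1))

def warp_to_keyframe(xyz, t_query, t_key, velocity_field, n_steps=8):
    """Advect points from t_query to the keyframe time t_key by Euler-integrating
    the velocity field; a keyframe radiance field could then be queried at the
    warped locations (illustrative only)."""
    dt = (t_key - t_query) / n_steps
    x = xyz
    t = torch.full((xyz.shape[0], 1), float(t_query))
    for _ in range(n_steps):
        x = x + velocity_field(x, t) * dt
        t = t + dt
    return x

# Example: warp 1024 random points from t=0.37 back to a keyframe at t=0.25.
vel = VelocityField()
pts = torch.rand(1024, 3)
warped = warp_to_keyframe(pts, t_query=0.37, t_key=0.25, velocity_field=vel)
```

Because the velocity field models motion rather than per-frame appearance, this kind of warping is also what allows extrapolation beyond the training time period, as the abstract highlights for future frame prediction.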
Related papers
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes.
By simply estimating a pointmap for each timestep, we can effectively adapt DUSt3R's representation, previously used only for static scenes, to dynamic scenes.
We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z)
- Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.
The object-centric voxelization infers per-object occupancy probabilities at individual spatial locations.
Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z)
- Shape of Motion: 4D Reconstruction from a Single Video [51.04575075620677]
We introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion.
We exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE(3) motion bases.
Our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
arXiv Detail & Related papers (2024-07-18T17:59:08Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- OD-NeRF: Efficient Training of On-the-Fly Dynamic Neural Radiance Fields [63.04781030984006]
Dynamic neural radiance fields (dynamic NeRFs) have demonstrated impressive results in novel view synthesis on 3D dynamic scenes.
We propose OD-NeRF, which efficiently trains and renders dynamic NeRFs on-the-fly and is capable of streaming the dynamic scene.
Our algorithm achieves an interactive speed of 6 FPS for on-the-fly training and rendering on synthetic dynamic scenes, and a significant speed-up over the state-of-the-art on real-world dynamic scenes.
arXiv Detail & Related papers (2023-05-24T07:36:47Z)
- SUDS: Scalable Urban Dynamic Scenes [46.965165390077146]
We extend neural radiance fields (NeRFs) to dynamic large-scale urban scenes.
We factorize the scene into three separate hash table data structures to efficiently encode static, dynamic, and far-field radiance fields.
Our reconstructions can be scaled to tens of thousands of objects across 1.2 million frames from 1700 videos spanning geospatial footprints of hundreds of kilometers.
arXiv Detail & Related papers (2023-03-25T18:55:09Z)
- Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding [61.57847727651068]
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.
Most previous works focus on learning frame-level features of each whole frame in the entire video, and directly match them with the textual information.
We propose a novel Motion- and Appearance-guided 3D Semantic Reasoning Network (MA3SRN), which incorporates optical-flow-guided motion-aware, detection-based appearance-aware, and 3D-aware object-level features.
arXiv Detail & Related papers (2022-03-06T13:57:09Z)