Class-agnostic Reconstruction of Dynamic Objects from Videos
- URL: http://arxiv.org/abs/2112.02091v1
- Date: Fri, 3 Dec 2021 18:57:47 GMT
- Title: Class-agnostic Reconstruction of Dynamic Objects from Videos
- Authors: Zhongzheng Ren, Xiaoming Zhao, Alexander G. Schwing
- Abstract summary: We introduce REDO, a class-agnostic framework to REconstruct the Dynamic Objects from RGBD or calibrated videos.
We develop two novel modules. First, we introduce a canonical 4D implicit function which is pixel-aligned with aggregated temporal visual cues.
Second, we develop a 4D transformation module which captures object dynamics to support temporal propagation and aggregation.
- Score: 127.41336060616214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce REDO, a class-agnostic framework to REconstruct the Dynamic
Objects from RGBD or calibrated videos. Compared to prior work, our problem
setting is more realistic yet more challenging for three reasons: 1) due to
occlusion or camera settings an object of interest may never be entirely
visible, but we aim to reconstruct the complete shape; 2) we aim to handle
different object dynamics including rigid motion, non-rigid motion, and
articulation; 3) we aim to reconstruct different categories of objects with one
unified framework. To address these challenges, we develop two novel modules.
First, we introduce a canonical 4D implicit function which is pixel-aligned
with aggregated temporal visual cues. Second, we develop a 4D transformation
module which captures object dynamics to support temporal propagation and
aggregation. We study the efficacy of REDO in extensive experiments on the
synthetic RGBD video datasets SAIL-VOS 3D and DeformingThings4D++, and on the
real-world video dataset 3DPW. We find REDO outperforms state-of-the-art dynamic
reconstruction methods by a clear margin. In ablation studies we validate each
developed component.
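To make the two modules above more concrete, the following is a minimal sketch of a pixel-aligned 4D implicit query, written in PyTorch under our own assumptions (it is not the authors' code): a 3D query point and a time value are projected into each calibrated frame, per-frame image features are sampled at the projected pixels, aggregated over time, and decoded into an occupancy probability. The class name, the simple mean aggregation, and the feature dimensions are illustrative choices.

```python
# Hypothetical sketch of a pixel-aligned 4D implicit query (not the official REDO code).
# A 3D query point and a time value are projected into each frame, per-frame image
# features are sampled at the projected pixel, aggregated over time, and decoded into
# an occupancy probability by a small MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAligned4DImplicit(nn.Module):
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        # Decoder maps (aggregated pixel feature, xyz, t) -> occupancy logit.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + 4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, points, t, feat_maps, intrinsics, extrinsics):
        # points:     (N, 3) query points in a canonical/world frame
        # t:          scalar time value, normalized to [0, 1]
        # feat_maps:  (T, C, H, W) per-frame image features from a 2D encoder
        # intrinsics: (T, 3, 3); extrinsics: (T, 4, 4) world-to-camera transforms
        T, _, H, W = feat_maps.shape
        N = points.shape[0]
        homog = torch.cat([points, torch.ones(N, 1)], dim=-1)           # (N, 4)
        feats = []
        for i in range(T):
            cam = (extrinsics[i] @ homog.T).T[:, :3]                     # camera coords (N, 3)
            pix = (intrinsics[i] @ cam.T).T                              # (N, 3)
            uv = pix[:, :2] / pix[:, 2:].clamp(min=1e-6)                 # pixel coords (N, 2)
            # Normalize pixel coords to [-1, 1] for grid_sample.
            grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,
                                2 * uv[:, 1] / (H - 1) - 1], dim=-1)
            sampled = F.grid_sample(feat_maps[i:i + 1], grid.view(1, N, 1, 2),
                                    align_corners=True)                  # (1, C, N, 1)
            feats.append(sampled[0, :, :, 0].T)                          # (N, C)
        agg = torch.stack(feats, dim=0).mean(dim=0)                      # naive temporal aggregation
        t_col = torch.full((N, 1), float(t))
        return torch.sigmoid(self.decoder(torch.cat([agg, points, t_col], dim=-1)))

# Toy query: 8 frames of 64-channel features at 60x80 resolution, 100 query points.
model = PixelAligned4DImplicit()
occ = model(torch.rand(100, 3), 0.5, torch.rand(8, 64, 60, 80),
            torch.eye(3).repeat(8, 1, 1), torch.eye(4).repeat(8, 1, 1))
```

In REDO itself, temporal propagation and aggregation are supported by the learned 4D transformation module described in the abstract; the mean pooling above merely stands in for that step.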
Related papers
- LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation [32.27869897947267]
We introduce LEIA, a novel approach for representing dynamic 3D objects.
Our method involves observing the object at distinct time steps or "states" and conditioning a hypernetwork on the current state.
By interpolating between these states, we can generate novel articulation configurations in 3D space that were previously unseen.
arXiv Detail & Related papers (2024-09-10T17:59:53Z)
- Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.
Object-centric voxelization infers per-object occupancy probabilities at individual spatial locations.
Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z)
- Shape of Motion: 4D Reconstruction from a Single Video [51.04575075620677]
We introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion.
We exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE(3) motion bases (a minimal sketch of this idea appears after this list).
Our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
arXiv Detail & Related papers (2024-07-18T17:59:08Z)
- DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos [21.93514516437402]
We present DreamScene4D, the first approach to generate 3D dynamic scenes of multiple objects from monocular videos via novel view synthesis.
Our key insight is a "decompose-recompose" approach that factorizes the video scene into the background and object tracks.
We show extensive results on challenging DAVIS, Kubric, and self-captured videos with quantitative comparisons and a user preference study.
arXiv Detail & Related papers (2024-05-03T17:55:34Z)
- REACTO: Reconstructing Articulated Objects from a Single Video [64.89760223391573]
We propose a novel deformation model that enhances the rigidity of each part while maintaining flexible deformation of the joints.
Our method outperforms previous works in producing higher-fidelity 3D reconstructions of general articulated objects.
arXiv Detail & Related papers (2024-04-17T08:01:55Z)
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z)
- Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames and then render them into free-viewpoint videos via neural feature rendering.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
arXiv Detail & Related papers (2022-04-22T03:17:35Z)
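As referenced in the Shape of Motion entry above, per-point scene motion can be represented as a blend of a compact, shared set of SE(3) motion bases. The NumPy sketch below is a toy illustration under our own assumptions (skinning-style linear blending of per-frame rigid transforms), not the paper's implementation.

```python
# Hypothetical illustration of the "compact SE(3) motion bases" idea: each point carries
# a small weight vector over B shared rigid transforms per frame, and its position at
# time t is the weighted blend of those transforms applied to its canonical position.
import numpy as np

def se3_blend(points, weights, rotations, translations):
    # points:       (N, 3) canonical point positions
    # weights:      (N, B) per-point blending weights (rows sum to 1)
    # rotations:    (B, 3, 3) rotation part of each basis transform at time t
    # translations: (B, 3)   translation part of each basis transform at time t
    # Apply every basis transform to every point: (B, N, 3)
    transformed = np.einsum('bij,nj->bni', rotations, points) + translations[:, None, :]
    # Blend per point with its weights: (N, 3)
    return np.einsum('nb,bni->ni', weights, transformed)

# Toy usage: two bases, one static and one translating along x.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
w = np.array([[1.0, 0.0], [0.2, 0.8]])
R = np.stack([np.eye(3), np.eye(3)])
tr = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
print(se3_blend(pts, w, R, tr))
```

The appeal of such a representation is that the number of learned trajectories grows with the number of bases rather than with the number of points, which is what the entry above means by exploiting the low-dimensional structure of 3D motion.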
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.