Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video
- URL: http://arxiv.org/abs/2311.02848v1
- Date: Mon, 6 Nov 2023 03:26:43 GMT
- Title: Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video
- Authors: Yanqin Jiang, Li Zhang, Jin Gao, Weiming Hu, Yao Yao
- Abstract summary: Consistent4D is a novel approach for generating 4D dynamic objects from uncalibrated monocular videos.
We cast the 360-degree dynamic object reconstruction as a 4D generation problem, eliminating the need for tedious multi-view data collection and camera calibration.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present Consistent4D, a novel approach for generating 4D
dynamic objects from uncalibrated monocular videos. Uniquely, we cast the
360-degree dynamic object reconstruction as a 4D generation problem,
eliminating the need for tedious multi-view data collection and camera
calibration. This is achieved by leveraging the object-level 3D-aware image
diffusion model as the primary supervision signal for training Dynamic Neural
Radiance Fields (DyNeRF). Specifically, we propose a Cascade DyNeRF to
facilitate stable convergence and temporal continuity under a supervision
signal that is discrete along the time axis. To achieve spatial and temporal
consistency, we further introduce an Interpolation-driven Consistency Loss. It
is optimized by minimizing the discrepancy between rendered frames from DyNeRF
and interpolated frames from a pre-trained video interpolation model. Extensive
experiments show that our Consistent4D performs competitively with prior-art
alternatives, opening up new possibilities for 4D dynamic object generation
from monocular videos, while also demonstrating advantages for conventional
text-to-3D generation tasks. Our project page is
https://consistent4d.github.io/.
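To make the Interpolation-driven Consistency Loss concrete, below is a minimal PyTorch-style sketch based only on the abstract. `render_fn` (rendering a frame from the dynamic NeRF at a given time and camera), `interp_fn` (a frozen pre-trained video interpolation model, e.g. something like RIFE), the MSE discrepancy, and the detached target are all assumptions of this sketch, not details confirmed by the paper.

```python
import torch
import torch.nn.functional as F

def interpolation_consistency_loss(render_fn, interp_fn, t0, t1, cam, num_samples=2):
    """Hypothetical sketch: penalize disagreement between DyNeRF renderings
    and frames produced by a frozen video interpolation model."""
    # Render anchor frames at the endpoints of the time interval.
    f0 = render_fn(t0, cam)
    f1 = render_fn(t1, cam)

    loss = f0.new_zeros(())
    for _ in range(num_samples):
        alpha = torch.rand(()).item()          # random intermediate time in (0, 1)
        t = t0 + alpha * (t1 - t0)
        rendered = render_fn(t, cam)           # DyNeRF prediction at time t
        with torch.no_grad():                  # interpolation model stays frozen
            target = interp_fn(f0, f1, alpha)  # pseudo ground-truth frame (assumption: detached)
        loss = loss + F.mse_loss(rendered, target)
    return loss / num_samples
```

In this sketch the interpolated frame acts as a pseudo ground truth: gradients flow only into the DyNeRF rendering, while the interpolation network is never updated.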
Related papers
- 4Diffusion: Multi-view Video Diffusion Model for 4D Generation [55.82208863521353]
Current 4D generation methods have achieved noteworthy efficacy with the aid of advanced diffusion generative models.
We propose a novel 4D generation pipeline, namely 4Diffusion, aimed at generating spatial-temporally consistent 4D content from a monocular video.
arXiv Detail & Related papers (2024-05-31T08:18:39Z)
- EG4D: Explicit Generation of 4D Object without Score Distillation [105.63506584772331]
EG4D is a novel framework that generates high-quality and consistent 4D assets without score distillation.
Our framework outperforms the baselines in generation quality by a considerable margin.
arXiv Detail & Related papers (2024-05-28T12:47:22Z)
- Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models [116.31344506738816]
We present a novel framework, Diffusion4D, for efficient and scalable 4D content generation.
We develop a 4D-aware video diffusion model capable of synthesizing orbital views of dynamic 3D assets.
Our method surpasses prior state-of-the-art techniques in terms of generation efficiency and 4D geometry consistency.
arXiv Detail & Related papers (2024-05-26T17:47:34Z)
- SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer [57.506654943449796]
We propose an efficient, sparse-controlled video-to-4D framework named SC4D that decouples motion and appearance.
Our method surpasses existing methods in both quality and efficiency.
We devise a novel application that seamlessly transfers motion onto a diverse array of 4D entities.
arXiv Detail & Related papers (2024-04-04T18:05:18Z)
- Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models [6.738732514502613]
Diffusion$^2$ is a novel framework for dynamic 3D content creation.
It reconciles knowledge about geometric consistency from 3D models with temporal smoothness from video models to directly sample dense multi-view images.
Experiments demonstrate the efficacy of our proposed framework in generating highly seamless and consistent 4D assets.
arXiv Detail & Related papers (2024-04-02T17:58:03Z)
- STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians [36.83603109001298]
STAG4D is a novel framework that combines pre-trained diffusion models with dynamic 3D Gaussian splatting for high-fidelity 4D generation.
We show that our method outperforms prior 4D generation works in rendering quality, spatial-temporal consistency, and generation robustness.
arXiv Detail & Related papers (2024-03-22T04:16:33Z)
- Efficient4D: Fast Dynamic 3D Object Generation from a Single-view Video [42.10482273572879]
We propose an efficient video-to-4D object generation framework called Efficient4D.
It generates high-quality, spacetime-consistent images under different camera views, and then uses them as labeled data for 4D reconstruction.
Experiments on both synthetic and real videos show that Efficient4D offers a remarkable 10-fold increase in speed.
arXiv Detail & Related papers (2024-01-16T18:58:36Z)
- 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
This work introduces 4DGen, a novel framework for grounded 4D content creation.
We identify static 3D assets and monocular video sequences as key components in constructing the 4D content.
Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos).
arXiv Detail & Related papers (2023-12-28T18:53:39Z)
- Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models [94.07744207257653]
We focus on the underexplored text-to-4D setting and synthesize dynamic, animated 3D objects.
We combine text-to-image, text-to-video, and 3D-aware multiview diffusion models to provide feedback during 4D object optimization.
arXiv Detail & Related papers (2023-12-21T11:41:02Z)