AR4D: Autoregressive 4D Generation from Monocular Videos
- URL: http://arxiv.org/abs/2501.01722v1
- Date: Fri, 03 Jan 2025 09:27:36 GMT
- Title: AR4D: Autoregressive 4D Generation from Monocular Videos
- Authors: Hanxin Zhu, Tianyu He, Xiqian Yu, Junliang Guo, Zhibo Chen, Jiang Bian,
- Abstract summary: Existing approaches primarily rely on Score Distillation Sampling to infer novel-view videos.
We propose AR4D, a novel paradigm for SDS-free 4D generation.
We show that AR4D can achieve state-of-the-art 4D generation without SDS, delivering greater diversity, improved spatial-temporal consistency, and better alignment with input prompts.
- Score: 27.61057927559143
- License:
- Abstract: Recent advancements in generative models have ignited substantial interest in dynamic 3D content creation (\ie, 4D generation). Existing approaches primarily rely on Score Distillation Sampling (SDS) to infer novel-view videos, typically leading to issues such as limited diversity, spatial-temporal inconsistency and poor prompt alignment, due to the inherent randomness of SDS. To tackle these problems, we propose AR4D, a novel paradigm for SDS-free 4D generation. Specifically, our paradigm consists of three stages. To begin with, for a monocular video that is either generated or captured, we first utilize pre-trained expert models to create a 3D representation of the first frame, which is further fine-tuned to serve as the canonical space. Subsequently, motivated by the fact that videos happen naturally in an autoregressive manner, we propose to generate each frame's 3D representation based on its previous frame's representation, as this autoregressive generation manner can facilitate more accurate geometry and motion estimation. Meanwhile, to prevent overfitting during this process, we introduce a progressive view sampling strategy, utilizing priors from pre-trained large-scale 3D reconstruction models. To avoid appearance drift introduced by autoregressive generation, we further incorporate a refinement stage based on a global deformation field and the geometry of each frame's 3D representation. Extensive experiments have demonstrated that AR4D can achieve state-of-the-art 4D generation without SDS, delivering greater diversity, improved spatial-temporal consistency, and better alignment with input prompts.
Related papers
- 4Dynamic: Text-to-4D Generation with Hybrid Priors [56.918589589853184]
We propose a novel method for text-to-4D generation, which ensures the dynamic amplitude and authenticity through direct supervision provided by a video prior.
Our method not only supports text-to-4D generation but also enables 4D generation from monocular videos.
arXiv Detail & Related papers (2024-07-17T16:02:55Z) - 4Diffusion: Multi-view Video Diffusion Model for 4D Generation [55.82208863521353]
Current 4D generation methods have achieved noteworthy efficacy with the aid of advanced diffusion generative models.
We propose a novel 4D generation pipeline, namely 4Diffusion, aimed at generating spatial-temporally consistent 4D content from a monocular video.
arXiv Detail & Related papers (2024-05-31T08:18:39Z) - EG4D: Explicit Generation of 4D Object without Score Distillation [105.63506584772331]
DG4D is a novel framework that generates high-quality and consistent 4D assets without score distillation.
Our framework outperforms the baselines in generation quality by a considerable margin.
arXiv Detail & Related papers (2024-05-28T12:47:22Z) - Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models [116.31344506738816]
We present a novel framework, textbfDiffusion4D, for efficient and scalable 4D content generation.
We develop a 4D-aware video diffusion model capable of synthesizing orbital views of dynamic 3D assets.
Our method surpasses prior state-of-the-art techniques in terms of generation efficiency and 4D geometry consistency.
arXiv Detail & Related papers (2024-05-26T17:47:34Z) - SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer [57.506654943449796]
We propose an efficient, sparse-controlled video-to-4D framework named SC4D that decouples motion and appearance.
Our method surpasses existing methods in both quality and efficiency.
We devise a novel application that seamlessly transfers motion onto a diverse array of 4D entities.
arXiv Detail & Related papers (2024-04-04T18:05:18Z) - STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians [36.83603109001298]
STAG4D is a novel framework that combines pre-trained diffusion models with dynamic 3D Gaussian splatting for high-fidelity 4D generation.
We show that our method outperforms prior 4D generation works in rendering quality, spatial-temporal consistency, and generation robustness.
arXiv Detail & Related papers (2024-03-22T04:16:33Z) - DreamGaussian4D: Generative 4D Gaussian Splatting [56.49043443452339]
We introduce DreamGaussian4D (DG4D), an efficient 4D generation framework that builds on Gaussian Splatting (GS)
Our key insight is that combining explicit modeling of spatial transformations with static GS makes an efficient and powerful representation for 4D generation.
Video generation methods have the potential to offer valuable spatial-temporal priors, enhancing the high-quality 4D generation.
arXiv Detail & Related papers (2023-12-28T17:16:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.