Related papers: 4Dynamic: Text-to-4D Generation with Hybrid Priors

URL: http://arxiv.org/abs/2407.12684v1
Date: Wed, 17 Jul 2024 16:02:55 GMT
Title: 4Dynamic: Text-to-4D Generation with Hybrid Priors
Authors: Yu-Jie Yuan, Leif Kobbelt, Jiwen Liu, Yuan Zhang, Pengfei Wan, Yu-Kun Lai, Lin Gao
Abstract summary: We propose a novel method for text-to-4D generation, which ensures the dynamic amplitude and authenticity through direct supervision provided by a video prior. Our method not only supports text-to-4D generation but also enables 4D generation from monocular videos.
Score: 56.918589589853184
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Owing to the impressive generative performance of text-to-image diffusion models, a growing number of text-to-3D generation works distill 2D generative priors into 3D using the score distillation sampling (SDS) loss, bypassing the problem of 3D data scarcity. Existing text-to-3D methods have achieved promising results in realism and 3D consistency, but text-to-4D generation still faces challenges, including a lack of realism and insufficient dynamic motion. In this paper, we propose a novel method for text-to-4D generation that ensures dynamic amplitude and authenticity through direct supervision provided by a video prior. Specifically, we adopt a text-to-video diffusion model to generate a reference video and divide 4D generation into two stages: static generation and dynamic generation. The static 3D generation is guided by the input text and the first frame of the reference video, while in the dynamic generation stage, we introduce a customized SDS loss to ensure multi-view consistency, a video-based SDS loss to improve temporal consistency, and, most importantly, direct priors from the reference video to ensure the quality of geometry and texture. Moreover, we design a prior-switching training strategy to avoid conflicts between different priors and fully leverage the benefits of each prior. In addition, to enrich the generated motion, we further introduce a dynamic modeling representation composed of a deformation network and a topology network, which ensures dynamic continuity while modeling topological changes. Our method not only supports text-to-4D generation but also enables 4D generation from monocular videos. Comparison experiments demonstrate the superiority of our method over existing methods.
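The abstract builds on the score distillation sampling (SDS) loss but does not spell out its customized or video-based variants. For orientation, the standard SDS gradient (as introduced for text-to-3D) is grad_theta L_SDS = E_{t,eps}[ w(t) * (eps_hat(x_t; y, t) - eps) * dx/dtheta ], where x is the differentiable rendering, x_t its noised version, and eps_hat the frozen diffusion model's noise prediction. The following is a minimal toy sketch under loudly stated assumptions, not 4Dynamic's implementation: the "rendering" is a single scalar parameter, the frozen prior is a point mass at a hypothetical target value `mu` with an analytic noise predictor, and w(t) = 1. The names `toy_sds_step` and `run` are illustrative only.

```python
import random


def toy_sds_step(theta, mu, lr, rng):
    """One SDS update on a scalar "rendering" x = theta.

    Hypothetical toy assumptions (not 4Dynamic's code):
      * the differentiable generator is the identity, so dx/dtheta = 1;
      * the frozen diffusion prior is a point mass at mu, whose exact noise
        predictor under x_t = a*x + s*eps is eps_hat = (x_t - a*mu) / s;
      * the timestep weighting w(t) is 1.
    """
    t = rng.uniform(0.02, 0.98)      # sample a diffusion timestep
    a = (1.0 - t) ** 0.5             # signal scale (variance-preserving: a^2 + s^2 = 1)
    s = t ** 0.5                     # noise scale
    eps = rng.gauss(0.0, 1.0)        # injected Gaussian noise
    x = theta                        # "render" the parameter
    x_t = a * x + s * eps            # forward-diffuse the rendering
    eps_hat = (x_t - a * mu) / s     # frozen prior's noise prediction
    w, dx_dtheta = 1.0, 1.0
    grad = w * (eps_hat - eps) * dx_dtheta  # SDS gradient: no backprop through the prior
    return theta - lr * grad


def run(theta0=5.0, mu=2.0, steps=500, lr=0.1, seed=0):
    """Optimize theta with repeated SDS steps toward the prior's mode."""
    rng = random.Random(seed)
    theta = theta0
    for _ in range(steps):
        theta = toy_sds_step(theta, mu, lr, rng)
    return theta


if __name__ == "__main__":
    print(run())  # should print a value very close to mu = 2.0
```

In this toy the noise cancels exactly, since eps_hat - eps = a * (x - mu) / s, so every step pulls theta toward mu. With a real diffusion network the residual eps_hat - eps is noisy and the rendering passes through a 3D or 4D representation, which is why methods such as the one above combine several priors and carefully schedule them.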

Related papers

TextMesh4D: High-Quality Text-to-4D Mesh Generation [13.069414103080447]
We introduce TextMesh4D, a novel framework for high-quality text-to-4D generation. Our approach leverages per-face Jacobians as a differentiable mesh representation and decomposes 4D generation into two stages: static object creation and dynamic motion synthesis. Experiments demonstrate that TextMesh4D achieves state-of-the-art results in terms of temporal consistency, structural fidelity, and visual realism.
arXiv (2025-06-30T17:58:34Z)
AR4D: Autoregressive 4D Generation from Monocular Videos [27.61057927559143]
Existing approaches primarily rely on Score Distillation Sampling to infer novel-view videos. We propose AR4D, a novel paradigm for SDS-free 4D generation. We show that AR4D can achieve state-of-the-art 4D generation without SDS, delivering greater diversity, improved spatial-temporal consistency, and better alignment with input prompts.
arXiv (2025-01-03T09:27:36Z)
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency [37.96042037188354]
We present Stable Video 4D (SV4D), a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation.
arXiv (2024-07-24T17:59:43Z)
Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models [116.31344506738816]
We present a novel framework, Diffusion4D, for efficient and scalable 4D content generation. We develop a 4D-aware video diffusion model capable of synthesizing orbital views of dynamic 3D assets. Our method surpasses prior state-of-the-art techniques in terms of generation efficiency and 4D geometry consistency.
arXiv (2024-05-26T17:47:34Z)
SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer [57.506654943449796]
We propose an efficient, sparse-controlled video-to-4D framework named SC4D that decouples motion and appearance. Our method surpasses existing methods in both quality and efficiency. We devise a novel application that seamlessly transfers motion onto a diverse array of 4D entities.
arXiv (2024-04-04T18:05:18Z)
4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
We present 4DGen, a novel framework for grounded 4D content creation. Our pipeline facilitates controllable 4D generation, enabling users to specify the motion via a monocular video or through image-to-video generation. Compared to existing video-to-4D baselines, our approach yields superior results in faithfully reconstructing input signals.
arXiv (2023-12-28T18:53:39Z)
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models [94.07744207257653]
We focus on the underexplored text-to-4D setting and synthesize dynamic, animated 3D objects. We combine text-to-image, text-to-video, and 3D-aware multiview diffusion models to provide feedback during 4D object optimization.
arXiv (2023-12-21T11:41:02Z)
4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling [91.99172731031206]
Current text-to-4D methods face a three-way tradeoff between quality of scene appearance, 3D structure, and motion. We introduce hybrid score distillation sampling, an alternating optimization procedure that blends supervision signals from multiple pre-trained diffusion models.
arXiv (2023-11-29T18:58:05Z)
A Unified Approach for Text- and Image-guided 4D Scene Generation [58.658768832653834]
We propose Dream-in-4D, which features a novel two-stage approach for text-to-4D synthesis. We show that our approach significantly advances image and motion quality, 3D consistency and text fidelity for text-to-4D generation. Our method offers, for the first time, a unified approach for text-to-4D, image-to-4D and personalized 4D generation tasks.
arXiv (2023-11-28T15:03:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.