Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency
- URL: http://arxiv.org/abs/2503.20785v1
- Date: Wed, 26 Mar 2025 17:59:44 GMT
- Title: Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency
- Authors: Tianqi Liu, Zihao Huang, Zhaoxi Chen, Guangcong Wang, Shoukang Hu, Liao Shen, Huiqiang Sun, Zhiguo Cao, Wei Li, Ziwei Liu,
- Abstract summary: Free4D is a tuning-free framework for 4D scene generation from a single image.<n>Our key insight is to distill pre-trained foundation models for consistent 4D scene representation.<n>The resulting 4D representation enables real-time, controllable rendering.
- Score: 49.875459658889355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Free4D, a novel tuning-free framework for 4D scene generation from a single image. Existing methods either focus on object-level generation, making scene-level generation infeasible, or rely on large-scale multi-view video datasets for expensive training, with limited generalization ability due to the scarcity of 4D scene data. In contrast, our key insight is to distill pre-trained foundation models for consistent 4D scene representation, which offers promising advantages such as efficiency and generalizability. 1) To achieve this, we first animate the input image using image-to-video diffusion models followed by 4D geometric structure initialization. 2) To turn this coarse structure into spatial-temporal consistent multiview videos, we design an adaptive guidance mechanism with a point-guided denoising strategy for spatial consistency and a novel latent replacement strategy for temporal coherence. 3) To lift these generated observations into consistent 4D representation, we propose a modulation-based refinement to mitigate inconsistencies while fully leveraging the generated information. The resulting 4D representation enables real-time, controllable rendering, marking a significant advancement in single-image-based 4D scene generation.
Related papers
- Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene [122.42861221739123]
This paper investigates a novel framework for 4D-PSG generation that leverages rich 2D visual scene annotations to enhance 4D scene learning.
We propose a 2D-to-4D visual scene transfer learning framework, where a spatial-temporal scene strategy effectively transfers dimension-invariant features from abundant 2D SG annotations to 4D scenes.
arXiv Detail & Related papers (2025-03-19T09:16:08Z) - 4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives [116.2042238179433]
In this paper, we frame dynamic scenes as unconstrained 4D volume learning problems.
We represent a target dynamic scene using a collection of 4D Gaussian primitives with explicit geometry and appearance features.
This approach can capture relevant information in space and time by fitting the underlying photorealistic-temporal volume.
Notably, our 4DGS model is the first solution that supports real-time rendering of high-resolution, novel views for complex dynamic scenes.
arXiv Detail & Related papers (2024-12-30T05:30:26Z) - Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models [116.31344506738816]
We present a novel framework, textbfDiffusion4D, for efficient and scalable 4D content generation.
We develop a 4D-aware video diffusion model capable of synthesizing orbital views of dynamic 3D assets.
Our method surpasses prior state-of-the-art techniques in terms of generation efficiency and 4D geometry consistency.
arXiv Detail & Related papers (2024-05-26T17:47:34Z) - Comp4D: LLM-Guided Compositional 4D Scene Generation [65.5810466788355]
We present Comp4D, a novel framework for Compositional 4D Generation.
Unlike conventional methods that generate a singular 4D representation of the entire scene, Comp4D innovatively constructs each 4D object within the scene separately.
Our method employs a compositional score distillation technique guided by the pre-defined trajectories.
arXiv Detail & Related papers (2024-03-25T17:55:52Z) - STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians [36.83603109001298]
STAG4D is a novel framework that combines pre-trained diffusion models with dynamic 3D Gaussian splatting for high-fidelity 4D generation.
We show that our method outperforms prior 4D generation works in rendering quality, spatial-temporal consistency, and generation robustness.
arXiv Detail & Related papers (2024-03-22T04:16:33Z) - 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
We present textbf4DGen, a novel framework for grounded 4D content creation.<n>Our pipeline facilitates controllable 4D generation, enabling users to specify the motion via monocular video or adopt image-to-video generations.<n>Compared to existing video-to-4D baselines, our approach yields superior results in faithfully reconstructing input signals.
arXiv Detail & Related papers (2023-12-28T18:53:39Z) - Real-time Photorealistic Dynamic Scene Representation and Rendering with
4D Gaussian Splatting [8.078460597825142]
Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time is challenging due to scene complexity and temporal dynamics.
We propose to approximate the underlying-temporal rendering volume of a dynamic scene by optimizing a collection of 4D primitives, with explicit geometry and appearance modeling.
Our model is conceptually simple, consisting of a 4D Gaussian parameterized by anisotropic ellipses that can rotate arbitrarily in space and time, as well as view-dependent and time-evolved appearance represented by the coefficient of 4D spherindrical harmonics.
arXiv Detail & Related papers (2023-10-16T17:57:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.