Related papers: Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

URL: http://arxiv.org/abs/2503.20785v1
Date: Wed, 26 Mar 2025 17:59:44 GMT
Title: Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency
Authors: Tianqi Liu, Zihao Huang, Zhaoxi Chen, Guangcong Wang, Shoukang Hu, Liao Shen, Huiqiang Sun, Zhiguo Cao, Wei Li, Ziwei Liu,
Abstract summary: Free4D is a tuning-free framework for 4D scene generation from a single image.<n>Our key insight is to distill pre-trained foundation models for consistent 4D scene representation.<n>The resulting 4D representation enables real-time, controllable rendering.
Score: 49.875459658889355
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present Free4D, a novel tuning-free framework for 4D scene generation from a single image. Existing methods either focus on object-level generation, making scene-level generation infeasible, or rely on large-scale multi-view video datasets for expensive training, with limited generalization ability due to the scarcity of 4D scene data. In contrast, our key insight is to distill pre-trained foundation models for consistent 4D scene representation, which offers promising advantages such as efficiency and generalizability. 1) To achieve this, we first animate the input image using image-to-video diffusion models followed by 4D geometric structure initialization. 2) To turn this coarse structure into spatial-temporal consistent multiview videos, we design an adaptive guidance mechanism with a point-guided denoising strategy for spatial consistency and a novel latent replacement strategy for temporal coherence. 3) To lift these generated observations into consistent 4D representation, we propose a modulation-based refinement to mitigate inconsistencies while fully leveraging the generated information. The resulting 4D representation enables real-time, controllable rendering, marking a significant advancement in single-image-based 4D scene generation.

Related papers

Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models [79.06910348413861]
We introduce Diff4Splat, a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image.<n>Given a single input image, a camera trajectory, and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian field that encodes appearance, geometry, and motion.
arXiv Detail & Related papers (2025-11-01T11:16:25Z)
ShapeGen4D: Towards High Quality 4D Shape Generation from Videos [85.45517487721257]
We introduce a native video-to-4D shape generation framework that synthesizes a single dynamic 3D representation end-to-end from the video.<n>Our method accurately captures non-rigid motion, volume changes, and even topological transitions without per-frame optimization.
arXiv Detail & Related papers (2025-10-07T17:58:11Z)
4DNeX: Feed-Forward 4D Generative Modeling Made Easy [51.79072580042173]
We present 4DNeX, the first feed-forward framework for generating 4D (i.e., dynamic 3D) scene representations from a single image.<n>In contrast to existing methods that rely on computationally intensive optimization or require multi-frame video inputs, 4DNeX enables efficient, end-to-end image-to-4D generation.
arXiv Detail & Related papers (2025-08-18T17:59:55Z)
Dream4D: Lifting Camera-Controlled I2V towards Spatiotemporally Consistent 4D Generation [3.1852855132066673]
Current approaches often struggle to maintain view consistency while handling complex scene dynamics.<n>This framework is the first to leverage both rich temporal priors video diffusion models and geometric awareness of the reconstruction models.<n>It significantly facilitates 4D generation and shows higher quality (e.g., mPSNR, mSSIM) over existing methods.
arXiv Detail & Related papers (2025-08-11T08:55:47Z)
4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation [66.20991603309054]
We propose the first framework capable of computing a 4D-temporal grid of video frames and 3D Gaussian particles for each time step using a feed-forward architecture.<n>In the first part, we analyze current 4D video diffusion architectures that perform spatial and temporal attention either sequentially or in parallel within a two-stream design.<n>In the second part, we extend existing 3D reconstruction algorithms by introducing a Gaussian head, a camera token replacement algorithm, and additional dynamic layers and training.
arXiv Detail & Related papers (2025-06-18T23:44:59Z)
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene [122.42861221739123]
This paper investigates a novel framework for 4D-PSG generation that leverages rich 2D visual scene annotations to enhance 4D scene learning. We propose a 2D-to-4D visual scene transfer learning framework, where a spatial-temporal scene strategy effectively transfers dimension-invariant features from abundant 2D SG annotations to 4D scenes.
arXiv Detail & Related papers (2025-03-19T09:16:08Z)
4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives [116.2042238179433]
In this paper, we frame dynamic scenes as unconstrained 4D volume learning problems. We represent a target dynamic scene using a collection of 4D Gaussian primitives with explicit geometry and appearance features. This approach can capture relevant information in space and time by fitting the underlying photorealistic-temporal volume. Notably, our 4DGS model is the first solution that supports real-time rendering of high-resolution, novel views for complex dynamic scenes.
arXiv Detail & Related papers (2024-12-30T05:30:26Z)
Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models [116.31344506738816]
We present a novel framework, textbfDiffusion4D, for efficient and scalable 4D content generation. We develop a 4D-aware video diffusion model capable of synthesizing orbital views of dynamic 3D assets. Our method surpasses prior state-of-the-art techniques in terms of generation efficiency and 4D geometry consistency.
arXiv Detail & Related papers (2024-05-26T17:47:34Z)
Comp4D: LLM-Guided Compositional 4D Scene Generation [65.5810466788355]
We present Comp4D, a novel framework for Compositional 4D Generation. Unlike conventional methods that generate a singular 4D representation of the entire scene, Comp4D innovatively constructs each 4D object within the scene separately. Our method employs a compositional score distillation technique guided by the pre-defined trajectories.
arXiv Detail & Related papers (2024-03-25T17:55:52Z)
STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians [36.83603109001298]
STAG4D is a novel framework that combines pre-trained diffusion models with dynamic 3D Gaussian splatting for high-fidelity 4D generation. We show that our method outperforms prior 4D generation works in rendering quality, spatial-temporal consistency, and generation robustness.
arXiv Detail & Related papers (2024-03-22T04:16:33Z)
4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
We present textbf4DGen, a novel framework for grounded 4D content creation.<n>Our pipeline facilitates controllable 4D generation, enabling users to specify the motion via monocular video or adopt image-to-video generations.<n>Compared to existing video-to-4D baselines, our approach yields superior results in faithfully reconstructing input signals.
arXiv Detail & Related papers (2023-12-28T18:53:39Z)
Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting [8.078460597825142]
Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time is challenging due to scene complexity and temporal dynamics. We propose to approximate the underlying-temporal rendering volume of a dynamic scene by optimizing a collection of 4D primitives, with explicit geometry and appearance modeling. Our model is conceptually simple, consisting of a 4D Gaussian parameterized by anisotropic ellipses that can rotate arbitrarily in space and time, as well as view-dependent and time-evolved appearance represented by the coefficient of 4D spherindrical harmonics.
arXiv Detail & Related papers (2023-10-16T17:57:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.