PSF-4D: A Progressive Sampling Framework for View Consistent 4D Editing
- URL: http://arxiv.org/abs/2503.11044v3
- Date: Tue, 01 Apr 2025 02:01:21 GMT
- Title: PSF-4D: A Progressive Sampling Framework for View Consistent 4D Editing
- Authors: Hasan Iqbal, Nazmul Karim, Umar Khalid, Azib Farooq, Zichun Zhong, Chen Chen, Jing Hua
- Abstract summary: We introduce a progressive sampling framework for 4D editing (PSF-4D). For temporal coherence, we design a correlated Gaussian noise structure that links frames over time. For spatial consistency across views, we implement a cross-view noise model. Our approach enables high-quality 4D editing without relying on external models.
- Score: 10.331089974537873
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction-guided generative models, especially those using text-to-image (T2I) and text-to-video (T2V) diffusion frameworks, have advanced the field of content editing in recent years. To extend these capabilities to 4D scenes, we introduce a progressive sampling framework for 4D editing (PSF-4D) that ensures temporal and multi-view consistency by intuitively controlling the noise initialization during forward diffusion. For temporal coherence, we design a correlated Gaussian noise structure that links frames over time, allowing each frame to depend meaningfully on prior frames. Additionally, to ensure spatial consistency across views, we implement a cross-view noise model, which uses shared and independent noise components to balance commonalities and distinct details among different views. To further enhance spatial coherence, PSF-4D incorporates view-consistent iterative refinement, embedding view-aware information into the denoising process to ensure aligned edits across frames and views. Our approach enables high-quality 4D editing without relying on external models, addressing key challenges in previous methods. Through extensive evaluation on multiple benchmarks and multiple editing aspects (e.g., style transfer, multi-attribute editing, object removal, and local editing), we show the effectiveness of our approach. Experimental results demonstrate that it outperforms state-of-the-art 4D editing methods across diverse benchmarks.
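The abstract describes two noise-initialization ideas: a temporally correlated Gaussian noise chain across frames and a cross-view model mixing shared and view-specific components. The sketch below is a minimal illustration of that general idea, not the paper's implementation; the function name `init_4d_noise` and the parameters `rho` (temporal correlation) and `alpha` (shared/independent mixing weight) are hypothetical names introduced here for clarity.

```python
import numpy as np

def init_4d_noise(num_frames, num_views, shape, rho=0.9, alpha=0.5, seed=0):
    """Sketch of correlated noise initialization for multi-view video latents.

    rho   -- assumed correlation between consecutive frames (AR(1)-style chain)
    alpha -- assumed weight of the shared cross-view component vs. the
             view-specific independent component
    """
    rng = np.random.default_rng(seed)
    noise = np.empty((num_views, num_frames, *shape))

    # Temporal chain: each frame's shared noise is a correlated update of the
    # previous frame's, so edits can evolve smoothly over time.
    shared = rng.standard_normal(shape)
    for t in range(num_frames):
        if t > 0:
            shared = rho * shared + np.sqrt(1.0 - rho**2) * rng.standard_normal(shape)
        # Cross-view model: every view mixes the shared component (commonalities)
        # with its own independent component (view-specific detail).
        for v in range(num_views):
            independent = rng.standard_normal(shape)
            noise[v, t] = np.sqrt(alpha) * shared + np.sqrt(1.0 - alpha) * independent
    return noise  # unit-variance Gaussian noise, correlated across frames and views

# Example: 8 frames, 4 views, a 4-channel 64x64 latent per frame
eps = init_4d_noise(num_frames=8, num_views=4, shape=(4, 64, 64))
```

Both mixing rules use square-root weights so the result stays unit-variance Gaussian, which is what a diffusion sampler expects at the start of denoising; the paper's actual construction and parameterization may differ.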
Related papers
- Visual Prompting for One-shot Controllable Video Editing without Inversion [24.49929851970489]
One-shot controllable video editing is an important yet challenging task.
Prior methods employ DDIM inversion to transform source frames into latent noise.
We propose a content consistency sampling (CCS) to ensure consistency between the generated edited frames and the source frames.
arXiv Detail & Related papers (2025-04-19T16:00:47Z) - In-2-4D: Inbetweening from Two Single-View Images to 4D Generation [54.62824686338408]
We propose a new problem, In-2-4D, for generative 4D (i.e., 3D + motion) inbetweening from a minimalistic input setting.
Given two images representing the start and end states of an object in motion, our goal is to generate and reconstruct the motion in 4D.
arXiv Detail & Related papers (2025-04-11T09:01:09Z) - Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model [52.0192865857058]
We propose the first training-free 4D video generation method that leverages the off-the-shelf video diffusion models to generate multi-view videos from a single input video.
Our method is training-free and fully utilizes an off-the-shelf video diffusion model, offering a practical and effective solution for multi-view video generation.
arXiv Detail & Related papers (2025-03-28T17:14:48Z) - Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency [49.875459658889355]
Free4D is a tuning-free framework for 4D scene generation from a single image.
Our key insight is to distill pre-trained foundation models for consistent 4D scene representation.
The resulting 4D representation enables real-time, controllable rendering.
arXiv Detail & Related papers (2025-03-26T17:59:44Z) - CT4D: Consistent Text-to-4D Generation with Animatable Meshes [53.897244823604346]
We present a novel framework, coined CT4D, which directly operates on animatable meshes for generating consistent 4D content from arbitrary user-supplied prompts.
Our framework incorporates a unique Generate-Refine-Animate (GRA) algorithm to enhance the creation of text-aligned meshes.
Our experimental results, both qualitative and quantitative, demonstrate that our CT4D framework surpasses existing text-to-4D techniques in maintaining interframe consistency and preserving global geometry.
arXiv Detail & Related papers (2024-08-15T14:41:34Z) - SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing [58.22339174221563]
We propose SyncNoise, a novel geometry-guided multi-view consistent noise editing approach for high-fidelity 3D scene editing.
SyncNoise synchronously edits multiple views with 2D diffusion models while enforcing multi-view noise predictions to be geometrically consistent.
Our method achieves high-quality 3D editing results respecting the textual instructions, especially in scenes with complex textures.
arXiv Detail & Related papers (2024-06-25T09:17:35Z) - Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion [30.331519274430594]
Instruct 4D-to-4D generates high-quality instruction-guided dynamic scene editing results.
We treat a 4D scene as a pseudo-3D scene, decoupled into two sub-problems: achieving temporal consistency in video editing and applying these edits to the pseudo-3D scene.
We extensively evaluate our approach in various scenes and editing instructions, and demonstrate that it achieves spatially and temporally consistent editing results.
arXiv Detail & Related papers (2024-06-13T17:59:30Z) - EG4D: Explicit Generation of 4D Object without Score Distillation [105.63506584772331]
EG4D is a novel framework that generates high-quality and consistent 4D assets without score distillation.
Our framework outperforms the baselines in generation quality by a considerable margin.
arXiv Detail & Related papers (2024-05-28T12:47:22Z) - Comp4D: LLM-Guided Compositional 4D Scene Generation [65.5810466788355]
We present Comp4D, a novel framework for Compositional 4D Generation.
Unlike conventional methods that generate a singular 4D representation of the entire scene, Comp4D innovatively constructs each 4D object within the scene separately.
Our method employs a compositional score distillation technique guided by the pre-defined trajectories.
arXiv Detail & Related papers (2024-03-25T17:55:52Z) - Consolidating Attention Features for Multi-view Image Editing [126.19731971010475]
We focus on spatial control-based geometric manipulations and introduce a method to consolidate the editing process across various views.
We introduce QNeRF, a neural radiance field trained on the internal query features of the edited images.
We refine the process through a progressive, iterative method that better consolidates queries across the diffusion timesteps.
arXiv Detail & Related papers (2024-02-22T18:50:18Z) - 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
We present 4DGen, a novel framework for grounded 4D content creation. Our pipeline facilitates controllable 4D generation, enabling users to specify the motion via monocular video or adopt image-to-video generations. Compared to existing video-to-4D baselines, our approach yields superior results in faithfully reconstructing input signals.
arXiv Detail & Related papers (2023-12-28T18:53:39Z) - 4D Association Graph for Realtime Multi-person Motion Capture Using Multiple Video Cameras [46.664422061537564]
This paper contributes a novel realtime multi-person motion capture algorithm using multiview video inputs.
We unify per-view parsing, cross-view matching, and temporal tracking into a single optimization framework.
Our method is robust to noisy detection and achieves high-quality online pose reconstruction.
arXiv Detail & Related papers (2020-02-28T09:57:05Z)