HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
- URL: http://arxiv.org/abs/2504.21650v1
- Date: Wed, 30 Apr 2025 13:55:28 GMT
- Title: HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
- Authors: Haiyang Zhou, Wangbo Yu, Jiawen Guan, Xinhua Cheng, Yonghong Tian, Li Yuan
- Abstract summary: HoloTime is a framework that integrates video diffusion models to generate panoramic videos from a single prompt or reference image. The 360World dataset is the first comprehensive collection of panoramic videos suitable for downstream 4D scene reconstruction tasks. Panoramic Animator is a two-stage image-to-video diffusion model that converts panoramic images into high-quality panoramic videos. Panoramic Space-Time Reconstruction uses a space-time depth estimation method to transform the generated panoramic videos into 4D point clouds.
- Score: 29.579493980120173
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid advancement of diffusion models holds the promise of revolutionizing the application of VR and AR technologies, which typically require scene-level 4D assets for user experience. Nonetheless, existing diffusion models predominantly concentrate on modeling static 3D scenes or object-level dynamics, constraining their capacity to provide truly immersive experiences. To address this issue, we propose HoloTime, a framework that integrates video diffusion models to generate panoramic videos from a single prompt or reference image, along with a 360-degree 4D scene reconstruction method that seamlessly transforms the generated panoramic video into 4D assets, enabling a fully immersive 4D experience for users. Specifically, to tame video diffusion models for generating high-fidelity panoramic videos, we introduce the 360World dataset, the first comprehensive collection of panoramic videos suitable for downstream 4D scene reconstruction tasks. With this curated dataset, we propose Panoramic Animator, a two-stage image-to-video diffusion model that can convert panoramic images into high-quality panoramic videos. Following this, we present Panoramic Space-Time Reconstruction, which leverages a space-time depth estimation method to transform the generated panoramic videos into 4D point clouds, enabling the optimization of a holistic 4D Gaussian Splatting representation to reconstruct spatially and temporally consistent 4D scenes. To validate the efficacy of our method, we conducted a comparative analysis with existing approaches, revealing its superiority in both panoramic video generation and 4D scene reconstruction. This demonstrates our method's capability to create more engaging and realistic immersive environments, thereby enhancing user experiences in VR and AR applications.
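The Panoramic Space-Time Reconstruction stage described in the abstract hinges on one well-defined geometric operation: lifting each equirectangular video frame, together with its estimated depth map, into a 3D point cloud via spherical unprojection. The sketch below (Python/NumPy) illustrates that operation under stated assumptions; the function name is hypothetical and depth is assumed to be per-pixel radial distance from the camera center, details the abstract does not specify.

import numpy as np

def equirect_frame_to_points(rgb, depth):
    # Hypothetical sketch: unproject one H x W equirectangular frame
    # (rgb: H x W x 3 uint8, depth: H x W radial distances) into an
    # (H*W) x 6 array of XYZRGB points.
    h, w = depth.shape
    # Pixel centers -> spherical angles: longitude spans [-pi, pi),
    # latitude spans [pi/2, -pi/2] from the top row to the bottom row.
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)
    # Spherical -> Cartesian coordinates, scaled by per-pixel depth.
    x = depth * np.cos(lat) * np.sin(lon)
    y = depth * np.sin(lat)
    z = depth * np.cos(lat) * np.cos(lon)
    xyz = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3).astype(np.float32) / 255.0
    return np.concatenate([xyz, colors], axis=1)

Stacking the outputs of all frames, each tagged with its timestamp, yields the kind of spatio-temporal (4D) point cloud from which, as the abstract describes, a holistic 4D Gaussian Splatting representation can then be optimized.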
Related papers
- Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization [31.956858341885436]
Video4DGen is a novel framework that excels in generating 4D representations from single or multiple generated videos. Video4DGen offers a powerful tool for applications in virtual reality, animation, and beyond.
arXiv Detail & Related papers (2025-04-05T12:13:05Z)
- Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model [52.0192865857058]
We propose the first training-free 4D video generation method that leverages off-the-shelf video diffusion models to generate multi-view videos from a single input video. Our method is training-free and fully utilizes an off-the-shelf video diffusion model, offering a practical and effective solution for multi-view video generation.
arXiv Detail & Related papers (2025-03-28T17:14:48Z)
- Can Video Diffusion Model Reconstruct 4D Geometry? [66.5454886982702]
Sora3R is a novel framework that taps into the rich spatiotemporal priors of large dynamic video diffusion models to infer 4D pointmaps from casual videos.
Experiments demonstrate that Sora3R reliably recovers both camera poses and detailed scene geometry, achieving performance on par with state-of-the-art methods for dynamic 4D reconstruction.
arXiv Detail & Related papers (2025-03-27T01:44:46Z)
- Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency [49.875459658889355]
Free4D is a tuning-free framework for 4D scene generation from a single image. Our key insight is to distill pre-trained foundation models for consistent 4D scene representation. The resulting 4D representation enables real-time, controllable rendering.
arXiv Detail & Related papers (2025-03-26T17:59:44Z)
- Wonderland: Navigating 3D Scenes from a Single Image [43.99037613068823]
We introduce a large-scale reconstruction model that leverages latents from a video diffusion model to predict 3D Gaussian Splattings of scenes in a feed-forward manner. We train the 3D reconstruction model to operate on the video latent space with a progressive learning strategy, enabling the efficient generation of high-quality, wide-scope, and generic 3D scenes.
arXiv Detail & Related papers (2024-12-16T18:58:17Z)
- CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models [98.03734318657848]
We present CAT4D, a method for creating 4D (dynamic 3D) scenes from monocular video.
We leverage a multi-view video diffusion model trained on a diverse combination of datasets to enable novel view synthesis.
We demonstrate competitive performance on novel view synthesis and dynamic scene reconstruction benchmarks.
arXiv Detail & Related papers (2024-11-27T18:57:16Z)
- DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion [22.11178016375823]
DimensionX is a framework designed to generate 3D and 4D scenes from just a single image with video diffusion.
Our approach begins with the insight that both the spatial structure of a 3D scene and the temporal evolution of a 4D scene can be effectively represented through sequences of video frames.
arXiv Detail & Related papers (2024-11-07T18:07:31Z)
- 4K4DGen: Panoramic 4D Generation at 4K Resolution [67.98105958108503]
We tackle the challenging task of elevating a single panorama to an immersive 4D experience.
For the first time, we demonstrate the capability to generate omnidirectional dynamic scenes with 360° views at 4K resolution.
We achieve high-quality Panorama-to-4D generation at a resolution of 4K for the first time.
arXiv Detail & Related papers (2024-06-19T13:11:02Z)
- Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models [116.31344506738816]
We present a novel framework, Diffusion4D, for efficient and scalable 4D content generation.
We develop a 4D-aware video diffusion model capable of synthesizing orbital views of dynamic 3D assets.
Our method surpasses prior state-of-the-art techniques in terms of generation efficiency and 4D geometry consistency.
arXiv Detail & Related papers (2024-05-26T17:47:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.