TiP4GEN: Text to Immersive Panorama 4D Scene Generation
- URL: http://arxiv.org/abs/2508.12415v2
- Date: Thu, 21 Aug 2025 17:28:57 GMT
- Title: TiP4GEN: Text to Immersive Panorama 4D Scene Generation
- Authors: Ke Xing, Hanwen Liang, Dejia Xu, Yuyang Yin, Konstantinos N. Plataniotis, Yao Zhao, Yunchao Wei
- Abstract summary: TiP4GEN is a text-to-dynamic panorama scene generation framework. It enables fine-grained content control and synthesizes motion-rich, geometry-consistent panoramic 4D scenes. TiP4GEN integrates panorama video generation and dynamic scene reconstruction to create 360-degree immersive virtual environments.
- Score: 82.8444414014506
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: With the rapid advancement and widespread adoption of VR/AR technologies, there is a growing demand for the creation of high-quality, immersive dynamic scenes. However, existing generation works predominantly concentrate on the creation of static scenes or narrow perspective-view dynamic scenes, falling short of delivering a truly 360-degree immersive experience from any viewpoint. In this paper, we introduce \textbf{TiP4GEN}, an advanced text-to-dynamic panorama scene generation framework that enables fine-grained content control and synthesizes motion-rich, geometry-consistent panoramic 4D scenes. TiP4GEN integrates panorama video generation and dynamic scene reconstruction to create 360-degree immersive virtual environments. For video generation, we introduce a \textbf{Dual-branch Generation Model} consisting of a panorama branch and a perspective branch, responsible for global and local view generation, respectively. A bidirectional cross-attention mechanism facilitates comprehensive information exchange between the branches. For scene reconstruction, we propose a \textbf{Geometry-aligned Reconstruction Model} based on 3D Gaussian Splatting. By aligning spatial-temporal point clouds using metric depth maps and initializing scene cameras with estimated poses, our method ensures geometric consistency and temporal coherence for the reconstructed scenes. Extensive experiments demonstrate the effectiveness of our proposed designs and the superiority of TiP4GEN in generating visually compelling and motion-coherent dynamic panoramic scenes. Our project page is at https://ke-xing.github.io/TiP4GEN/.
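The abstract describes a bidirectional cross-attention mechanism that exchanges information between the panorama and perspective branches, but gives no implementation detail. Below is a minimal single-head sketch in NumPy of what such an exchange could look like; all function names, shapes, and the residual-update form are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Single-head cross-attention: tokens in `queries` attend to `keys_values`."""
    q = queries @ w_q          # (Nq, d)
    k = keys_values @ w_k      # (Nk, d)
    v = keys_values @ w_v      # (Nk, d)
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (Nq, Nk)
    return weights @ v         # (Nq, d)

def bidirectional_exchange(pano_tokens, persp_tokens, params_pano, params_persp):
    """Each branch queries the other branch's tokens; results are added residually."""
    pano_out = pano_tokens + cross_attention(pano_tokens, persp_tokens, *params_pano)
    persp_out = persp_tokens + cross_attention(persp_tokens, pano_tokens, *params_persp)
    return pano_out, persp_out

rng = np.random.default_rng(0)
d = 16
pano = rng.standard_normal((32, d))   # 32 panorama-branch tokens (toy sizes)
persp = rng.standard_normal((8, d))   # 8 perspective-branch tokens
make_params = lambda: tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
pano_new, persp_new = bidirectional_exchange(pano, persp, make_params(), make_params())
print(pano_new.shape, persp_new.shape)  # (32, 16) (8, 16)
```

The key property this sketch illustrates is symmetry: the global (panorama) branch reads local detail from the perspective tokens, while the local branch reads global context from the panorama tokens, and each branch keeps its own token count and dimensionality.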
Related papers
- OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes [57.790894531046796]
Panorama-based 2D lifting has emerged as a promising technique to produce immersive, realistic, and diverse 3D environments. In this work, we advance this technique to generate graphics-ready 3D scenes suitable for physically based rendering (PBR), relighting, and simulation. Our key insight is to repurpose 2D generative models for panoramic perception of geometry, textures, and PBR materials. Based on a lightweight and efficient cross-modal adapter structure, OmniX reuses 2D generative priors for a broad range of panoramic vision tasks.
arXiv Detail & Related papers (2025-10-30T17:59:51Z)
- 4D Driving Scene Generation With Stereo Forcing [62.47705572424127]
Current generative models struggle to synthesize dynamic 4D driving scenes that simultaneously support temporal extrapolation and spatial novel view synthesis (NVS) without per-scene optimization. We present PhiGenesis, a unified framework for 4D scene generation that extends video generation techniques with geometric and temporal consistency.
arXiv Detail & Related papers (2025-09-24T15:37:17Z)
- HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation [29.579493980120173]
HoloTime is a framework that integrates video diffusion models to generate panoramic videos from a single prompt or reference image. The 360World dataset is the first comprehensive collection of panoramic videos suitable for downstream 4D scene reconstruction tasks. Panoramic Animator is a two-stage image-to-video diffusion model that can convert panoramic images into high-quality panoramic videos. Panoramic Space-Time Reconstruction uses a space-time depth estimation method to transform the generated panoramic videos into 4D point clouds.
arXiv Detail & Related papers (2025-04-30T13:55:28Z)
- Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration [18.23983135970619]
We propose Scene4U, a novel layered 3D scene reconstruction framework from a single panoramic image. Specifically, Scene4U integrates an open-vocabulary segmentation model with a large language model to decompose a real panorama into multiple layers. We then employ a diffusion-based layered repair module to restore occluded regions using visual cues and depth information, generating a hierarchical representation of the scene. Scene4U outperforms state-of-the-art methods, improving by 24.24% in LPIPS and 24.40% in BRISQUE, while also achieving the fastest training speed.
arXiv Detail & Related papers (2025-04-01T03:17:24Z)
- 4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives [115.67081491747943]
Dynamic 3D scene representation and novel view synthesis are crucial for enabling AR/VR and metaverse applications. We reformulate the reconstruction of a time-varying 3D scene as approximating its underlying 4D volume. We derive several compact variants that effectively reduce the memory footprint to address its storage bottleneck.
arXiv Detail & Related papers (2024-12-30T05:30:26Z)
- SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting [53.32467009064287]
We propose a text-driven 3D-consistent scene generation model: SceneDreamer360.
Our proposed method leverages a text-driven panoramic image generation model as a prior for 3D scene generation.
Our experiments demonstrate that SceneDreamer360 with its panoramic image generation and 3DGS can produce higher quality, spatially consistent, and visually appealing 3D scenes from any text prompt.
arXiv Detail & Related papers (2024-08-25T02:56:26Z)
- HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions [31.342899807980654]
3D scene generation is in high demand across various domains, including virtual reality, gaming, and the film industry.
We introduce HoloDreamer, a framework that first generates high-definition panorama as a holistic initialization of the full 3D scene.
We then leverage 3D Gaussian Splatting (3D-GS) to quickly reconstruct the 3D scene, thereby facilitating the creation of view-consistent and fully enclosed 3D scenes.
arXiv Detail & Related papers (2024-07-21T14:52:51Z)
- 4K4DGen: Panoramic 4D Generation at 4K Resolution [67.98105958108503]
We tackle the challenging task of elevating a single panorama to an immersive 4D experience.
For the first time, we demonstrate the capability to generate omnidirectional dynamic scenes with 360$^\circ$ views at 4K resolution.
We achieve high-quality Panorama-to-4D generation at a resolution of 4K for the first time.
arXiv Detail & Related papers (2024-06-19T13:11:02Z)
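Several of the papers above (TiP4GEN, HoloTime, 4K4DGen) lift panoramic depth maps into spatial point clouds for 4D reconstruction. A minimal sketch of the underlying geometric step, unprojecting an equirectangular metric depth map onto spherical rays, is shown below; the lon/lat pixel mapping and y-up convention are illustrative assumptions, not any of these papers' exact code.

```python
import numpy as np

def equirect_depth_to_points(depth):
    """Unproject an equirectangular (panoramic) metric depth map into a
    3D point cloud. Pixel column u maps to longitude in [-pi, pi) and
    pixel row v maps to latitude in [-pi/2, pi/2] (y-up convention)."""
    h, w = depth.shape
    u = (np.arange(w) + 0.5) / w           # normalized column centers in [0, 1)
    v = (np.arange(h) + 0.5) / h           # normalized row centers in [0, 1)
    lon = (u - 0.5) * 2.0 * np.pi          # (w,)
    lat = (0.5 - v) * np.pi                # (h,)
    lon, lat = np.meshgrid(lon, lat)       # (h, w)
    # Unit ray direction for each pixel on the viewing sphere.
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)  # (h, w, 3)
    return dirs * depth[..., None]         # scale unit rays by metric depth

depth = np.full((256, 512), 2.0)           # toy scene: everything 2 m away
pts = equirect_depth_to_points(depth).reshape(-1, 3)
print(pts.shape)                            # (131072, 3)
# Constant depth means every point lies on a sphere of radius 2.
print(np.allclose(np.linalg.norm(pts, axis=1), 2.0))  # True
```

Applying this per frame to a predicted space-time depth sequence yields the "spatial-temporal point clouds" these pipelines then align and fit with Gaussian primitives.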
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.