4K4DGen: Panoramic 4D Generation at 4K Resolution
- URL: http://arxiv.org/abs/2406.13527v3
- Date: Thu, 03 Oct 2024 06:26:49 GMT
- Title: 4K4DGen: Panoramic 4D Generation at 4K Resolution
- Authors: Renjie Li, Panwang Pan, Bangbang Yang, Dejia Xu, Shijie Zhou, Xuanyang Zhang, Zeming Li, Achuta Kadambi, Zhangyang Wang, Zhengzhong Tu, Zhiwen Fan
- Abstract summary: We tackle the challenging task of elevating a single panorama to an immersive 4D experience.
For the first time, we demonstrate the capability to generate omnidirectional dynamic scenes with 360$^{\circ}$ views at 4K resolution.
We achieve high-quality Panorama-to-4D generation at a resolution of 4K for the first time.
- Score: 67.98105958108503
- License:
- Abstract: The blooming of virtual reality and augmented reality (VR/AR) technologies has driven an increasing demand for the creation of high-quality, immersive, and dynamic environments. However, existing generative techniques either focus solely on dynamic objects or perform outpainting from a single perspective image, failing to meet the requirements of VR/AR applications that need free-viewpoint, 360$^{\circ}$ virtual views where users can move in all directions. In this work, we tackle the challenging task of elevating a single panorama to an immersive 4D experience. For the first time, we demonstrate the capability to generate omnidirectional dynamic scenes with 360$^{\circ}$ views at 4K (4096 $\times$ 2048) resolution, thereby providing an immersive user experience. Our method introduces a pipeline that facilitates natural scene animations and optimizes a set of dynamic Gaussians using efficient splatting techniques for real-time exploration. To overcome the lack of scene-scale annotated 4D data and models, especially in panoramic formats, we propose a novel \textbf{Panoramic Denoiser} that adapts generic 2D diffusion priors to animate consistently in 360$^{\circ}$ images, transforming them into panoramic videos with dynamic scenes at targeted regions. Subsequently, we propose \textbf{Dynamic Panoramic Lifting} to elevate the panoramic video into a 4D immersive environment while preserving spatial and temporal consistency. By transferring prior knowledge from 2D models in the perspective domain to the panoramic domain and the 4D lifting with spatial appearance and geometry regularization, we achieve high-quality Panorama-to-4D generation at a resolution of 4K for the first time.
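As a rough, hedged sketch of the core idea behind the Panoramic Denoiser (not the authors' implementation): a generic 2D diffusion prior operates on perspective crops sampled from the equirectangular panorama, and the denoised crops are splatted and blended back into panorama space so the 360$^{\circ}$ content stays mutually consistent. The helper names and the `denoise_2d` callable below are hypothetical placeholders for the 2D diffusion denoising step.

```python
import numpy as np

def perspective_grid(h, w, fov_deg, yaw, pitch=0.0):
    """Per-pixel (lon, lat) lookup for a pinhole view tangent to the sphere."""
    f = 0.5 * w / np.tan(np.radians(fov_deg) / 2.0)
    ys, xs = np.meshgrid(np.arange(h) - h / 2.0, np.arange(w) - w / 2.0, indexing="ij")
    dirs = np.stack([xs, ys, np.full_like(xs, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    cp, sp, cy, sy = np.cos(pitch), np.sin(pitch), np.cos(yaw), np.sin(yaw)
    rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])   # pitch about x
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # yaw about y
    dirs = dirs @ rx.T @ ry.T
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])             # [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))        # [-pi/2, pi/2]
    return lon, lat

def lonlat_to_pixels(lon, lat, H, W):
    """Map spherical coordinates to equirectangular pixel indices."""
    u = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
    v = np.clip(((lat / np.pi + 0.5) * H).astype(int), 0, H - 1)
    return u, v

def denoise_panorama(pano, denoise_2d, yaws, fov=90.0, size=512):
    """Denoise overlapping perspective crops and average them back into the panorama."""
    H, W, _ = pano.shape
    acc = np.zeros_like(pano, dtype=np.float64)
    cnt = np.zeros((H, W, 1))
    for yaw in yaws:
        lon, lat = perspective_grid(size, size, fov, yaw)
        u, v = lonlat_to_pixels(lon, lat, H, W)
        view = denoise_2d(pano[v, u])            # 2D prior applied in view space
        np.add.at(acc, (v, u), view)             # splat the denoised crop back
        np.add.at(cnt, (v, u), 1.0)
    # nearest-neighbour lookups leave small gaps; a real pipeline would use
    # bilinear warping and soft blending weights
    return np.where(cnt > 0, acc / np.maximum(cnt, 1), pano)
```

In the paper, the denoising is further coordinated across overlapping views and across time so that the animated panorama stays temporally coherent; the sketch above only shows the spatial projection and blending loop.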
Related papers
- VividDream: Generating 3D Scene with Ambient Dynamics [13.189732244489225]
We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt.
VividDream can provide human viewers with compelling 4D experiences generated based on diverse real images and text prompts.
arXiv Detail & Related papers (2024-05-30T17:59:24Z)
- GFlow: Recovering 4D World from Monocular Video [58.63051670458107]
We introduce GFlow, a framework that lifts a video (3D) to a 4D explicit representation, entailing a flow of Gaussian splatting through space and time.
GFlow first clusters the scene into still and moving parts, then applies a sequential optimization process.
GFlow transcends the boundaries of mere 4D reconstruction.
arXiv Detail & Related papers (2024-05-28T17:59:22Z)
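As a toy illustration of the still/moving clustering step mentioned in the GFlow summary above (a hedged sketch under assumed inputs, not the GFlow code), one simple realization is to threshold the per-point motion of tracked trajectories before optimizing the two groups separately:

```python
import numpy as np

def split_still_moving(tracks, motion_thresh=1.5):
    """tracks: (T, N, 2) pixel trajectories of N scene points over T frames.
    Returns boolean masks (still, moving) over the N points."""
    disp = np.linalg.norm(np.diff(tracks, axis=0), axis=-1)  # (T-1, N) per-step motion
    mean_motion = disp.mean(axis=0)                          # average speed per point
    moving = mean_motion > motion_thresh                     # threshold in pixels/frame
    return ~moving, moving
```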
- 4D Panoptic Scene Graph Generation [102.22082008976228]
We introduce 4D Panoptic Scene Graph (PSG-4D), a new representation that bridges the raw visual data perceived in a dynamic 4D world and high-level visual understanding.
Specifically, PSG-4D abstracts rich 4D sensory data into nodes, which represent entities with precise location and status information, and edges, which capture the temporal relations.
We propose PSG4DFormer, a Transformer-based model that can predict panoptic segmentation masks, track masks along the time axis, and generate the corresponding scene graphs.
arXiv Detail & Related papers (2024-05-16T17:56:55Z)
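To make the node/edge structure described in the PSG-4D summary concrete, here is a hypothetical data-structure sketch (an assumption for illustration, not the PSG-4D reference implementation): nodes carry a tracked entity with per-frame location and status, and edges carry a relation over a span of frames.

```python
from dataclasses import dataclass, field

@dataclass
class Node4D:
    track_id: int
    category: str                                  # e.g. "person", "car"
    per_frame: dict = field(default_factory=dict)  # frame index -> (mask/box, status)

@dataclass
class Edge4D:
    subject_id: int
    object_id: int
    relation: str                                  # e.g. "walking towards"
    span: tuple = (0, 0)                           # (start_frame, end_frame)

@dataclass
class SceneGraph4D:
    nodes: list = field(default_factory=list)      # list[Node4D]
    edges: list = field(default_factory=list)      # list[Edge4D]
```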
- DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting [56.101576795566324]
We present a text-to-3D 360$^{\circ}$ scene generation pipeline.
Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement.
Our method offers a globally consistent 3D scene within a 360$^{\circ}$ perspective.
arXiv Detail & Related papers (2024-04-10T10:46:59Z)
- 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
This work introduces 4DGen, a novel framework for grounded 4D content creation.
We identify static 3D assets and monocular video sequences as key components in constructing the 4D content.
Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos).
arXiv Detail & Related papers (2023-12-28T18:53:39Z)
- NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields [99.57774680640581]
We present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering.
We propose to decompose the 4D space according to temporal characteristics. Points in the 4D space are associated with probabilities of belonging to three categories: static, deforming, and new areas.
arXiv Detail & Related papers (2022-10-28T07:11:05Z)
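As a rough sketch of the decomposition idea in the NeRFPlayer summary above (one possible realization stated as an assumption, not NeRFPlayer's code): a small head predicts per-point probabilities over the three categories, and the outputs of three branches are blended by those probabilities.

```python
import torch

def blend_decomposed(feat, prob_head, static_branch, deform_branch, new_branch):
    """feat: (N, D) features of sampled 4D points; each branch maps (N, D) -> (N, C)."""
    probs = torch.softmax(prob_head(feat), dim=-1)     # (N, 3): static / deforming / new
    out = (probs[:, 0:1] * static_branch(feat)
           + probs[:, 1:2] * deform_branch(feat)
           + probs[:, 2:3] * new_branch(feat))
    return out, probs                                  # blended output and category probs
```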
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.