4K4DGen: Panoramic 4D Generation at 4K Resolution
- URL: http://arxiv.org/abs/2406.13527v3
- Date: Thu, 03 Oct 2024 06:26:49 GMT
- Title: 4K4DGen: Panoramic 4D Generation at 4K Resolution
- Authors: Renjie Li, Panwang Pan, Bangbang Yang, Dejia Xu, Shijie Zhou, Xuanyang Zhang, Zeming Li, Achuta Kadambi, Zhangyang Wang, Zhengzhong Tu, Zhiwen Fan
- Abstract summary: We tackle the challenging task of elevating a single panorama to an immersive 4D experience.
For the first time, we demonstrate the capability to generate omnidirectional dynamic scenes with 360$^{\circ}$ views at 4K resolution.
We achieve high-quality Panorama-to-4D generation at a resolution of 4K for the first time.
- Score: 67.98105958108503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The blooming of virtual reality and augmented reality (VR/AR) technologies has driven an increasing demand for the creation of high-quality, immersive, and dynamic environments. However, existing generative techniques either focus solely on dynamic objects or perform outpainting from a single perspective image, failing to meet the requirements of VR/AR applications that need free-viewpoint, 360$^{\circ}$ virtual views where users can move in all directions. In this work, we tackle the challenging task of elevating a single panorama to an immersive 4D experience. For the first time, we demonstrate the capability to generate omnidirectional dynamic scenes with 360$^{\circ}$ views at 4K (4096 $\times$ 2048) resolution, thereby providing an immersive user experience. Our method introduces a pipeline that facilitates natural scene animations and optimizes a set of dynamic Gaussians using efficient splatting techniques for real-time exploration. To overcome the lack of scene-scale annotated 4D data and models, especially in panoramic formats, we propose a novel \textbf{Panoramic Denoiser} that adapts generic 2D diffusion priors to animate consistently in 360$^{\circ}$ images, transforming them into panoramic videos with dynamic scenes at targeted regions. Subsequently, we propose \textbf{Dynamic Panoramic Lifting} to elevate the panoramic video into a 4D immersive environment while preserving spatial and temporal consistency. By transferring prior knowledge from 2D models in the perspective domain to the panoramic domain and the 4D lifting with spatial appearance and geometry regularization, we achieve high-quality Panorama-to-4D generation at a resolution of 4K for the first time.
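To make the perspective-to-panorama transfer concrete, the sketch below shows the standard equirectangular-to-perspective projection that lets a perspective-domain model see (and write back to) regions of a 360$^{\circ}$ image. This is an illustrative NumPy sketch of the underlying geometry, not the authors' code; the function name and parameters are hypothetical.

```python
import numpy as np

def equirect_to_perspective(pano, yaw, pitch, fov_deg=90.0, size=512):
    """Sample a pinhole-camera view from an equirectangular panorama.

    Standard projection, sketched to illustrate how perspective-domain
    diffusion priors can be applied to panoramas (hypothetical helper,
    not the paper's implementation).
    pano: (H, W, 3) equirectangular image; yaw/pitch in radians.
    """
    H, W, _ = pano.shape
    f = (size / 2) / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels

    # Pixel grid -> camera-space ray directions (pinhole model).
    u, v = np.meshgrid(np.arange(size), np.arange(size))
    x = (u - size / 2) / f
    y = (v - size / 2) / f
    dirs = np.stack([x, y, np.ones_like(x)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate the rays by pitch (around x), then yaw (around y).
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    dirs = dirs @ (Ry @ Rx).T

    # Ray direction -> spherical coordinates -> equirectangular pixel
    # (nearest-neighbor sampling for brevity).
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])    # [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1, 1))   # [-pi/2, pi/2]
    px = ((lon / np.pi + 1) / 2 * (W - 1)).astype(int)
    py = ((lat / (np.pi / 2) + 1) / 2 * (H - 1)).astype(int)
    return pano[py, px]
```

Crops obtained this way can be processed by an off-the-shelf 2D diffusion model and blended back into the panorama across overlapping views, which is broadly the kind of domain transfer the Panoramic Denoiser performs.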
Related papers
- HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation [29.579493980120173]
HoloTime is a framework that integrates video diffusion models to generate panoramic videos from a single prompt or reference image.
The 360World dataset is the first comprehensive collection of panoramic videos suitable for downstream 4D scene reconstruction tasks.
Panoramic Animator is a two-stage image-to-video diffusion model that can convert panoramic images into high-quality panoramic videos.
Panoramic Space-Time Reconstruction uses a space-time depth estimation method to transform the generated panoramic videos into 4D point clouds.
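The space-time reconstruction step can be pictured as per-frame spherical back-projection of a panoramic depth video. Below is a minimal NumPy sketch under that reading (hypothetical helper, not HoloTime's implementation):

```python
import numpy as np

def pano_depth_to_points(depth, t):
    """Back-project one equirectangular depth frame into a timestamped
    point cloud. depth: (H, W) per-pixel metric depth; t: frame time.
    Returns an (H*W, 4) array of [x, y, z, t] points.
    """
    H, W = depth.shape
    # Each pixel corresponds to a longitude/latitude on the unit sphere.
    lon = (np.arange(W) / W * 2 - 1) * np.pi        # [-pi, pi)
    lat = (np.arange(H) / H - 0.5) * np.pi          # [-pi/2, pi/2)
    lon, lat = np.meshgrid(lon, lat)

    # Unit ray directions, scaled by depth, give 3D positions.
    x = np.cos(lat) * np.sin(lon) * depth
    y = np.sin(lat) * depth
    z = np.cos(lat) * np.cos(lon) * depth
    pts = np.stack([x, y, z, np.full_like(x, t)], axis=-1)
    return pts.reshape(-1, 4)

# Stacking per-frame clouds over all timestamps yields a 4D point cloud.
```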
arXiv Detail & Related papers (2025-04-30T13:55:28Z)
- Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization [31.956858341885436]
Video4DGen is a novel framework that excels in generating 4D representations from single or multiple generated videos.
Video4DGen offers a powerful tool for applications in virtual reality, animation, and beyond.
arXiv Detail & Related papers (2025-04-05T12:13:05Z)
- Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency [49.875459658889355]
Free4D is a tuning-free framework for 4D scene generation from a single image.
Our key insight is to distill pre-trained foundation models for consistent 4D scene representation.
The resulting 4D representation enables real-time, controllable rendering.
arXiv Detail & Related papers (2025-03-26T17:59:44Z)
- Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields [56.184278668305076]
We introduce Feature4X, a universal framework that extends the functionality of 2D vision foundation models into the 4D realm.
The framework is the first to distill and lift the features of video foundation models into an explicit 4D feature field using Gaussian Splatting.
Our experiments showcase novel view segment anything, geometric and appearance scene editing, and free-form VQA across all time steps.
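One way to picture an explicit 4D feature field is to attach a feature vector to each Gaussian and alpha-composite those vectors along camera rays, supervising the result against 2D foundation-model features. The sketch below illustrates that compositing step under those assumptions; it is not Feature4X's actual code:

```python
import numpy as np

def composite_features(feats, alphas):
    """Front-to-back alpha compositing of per-Gaussian feature vectors
    along one camera ray (illustrative sketch).

    feats: (N, D) features of the N Gaussians the ray intersects,
    sorted front to back; alphas: (N,) per-Gaussian opacities.
    """
    out = np.zeros(feats.shape[1])
    transmittance = 1.0
    for f, a in zip(feats, alphas):
        out += transmittance * a * f
        transmittance *= (1.0 - a)
    return out

# Distillation then minimizes || composite_features(...) - feat_2d ||^2
# per pixel, where feat_2d comes from a 2D/video foundation model.
```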
arXiv Detail & Related papers (2025-03-26T17:56:16Z)
- 4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives [116.2042238179433]
In this paper, we frame dynamic scenes as unconstrained 4D volume learning problems.
We represent a target dynamic scene using a collection of 4D Gaussian primitives with explicit geometry and appearance features.
This approach can capture relevant information in space and time by fitting the underlying photorealistic spatial-temporal volume.
Notably, our 4DGS model is the first solution that supports real-time rendering of high-resolution, novel views for complex dynamic scenes.
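A natural reading of a native 4D primitive is a Gaussian over (x, y, z, t) that is conditioned on time to obtain the 3D Gaussian rendered at each frame. The sketch below applies the standard Gaussian conditioning rule under that assumption; the paper's actual parameterization may differ:

```python
import numpy as np

def slice_4d_gaussian(mu, cov, t):
    """Condition a 4D (x, y, z, t) Gaussian primitive on time t,
    yielding the 3D Gaussian to splat at that frame.

    mu: (4,) mean; cov: (4, 4) covariance; t: query time.
    """
    mu_x, mu_t = mu[:3], mu[3]
    S_xx = cov[:3, :3]
    S_xt = cov[:3, 3]
    S_tt = cov[3, 3]
    mean_3d = mu_x + S_xt / S_tt * (t - mu_t)       # conditional mean
    cov_3d = S_xx - np.outer(S_xt, S_xt) / S_tt     # conditional covariance
    # The (unnormalized) marginal density over t modulates opacity,
    # so a primitive fades in and out of existence over time.
    w = np.exp(-0.5 * (t - mu_t) ** 2 / S_tt)
    return mean_3d, cov_3d, w
```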
arXiv Detail & Related papers (2024-12-30T05:30:26Z)
- PaintScene4D: Consistent 4D Scene Generation from Text Prompts [29.075849524496707]
PaintScene4D is a novel text-to-4D scene generation framework.
It harnesses video generative models trained on diverse real-world datasets.
It produces realistic 4D scenes that can be viewed from arbitrary trajectories.
arXiv Detail & Related papers (2024-12-05T18:59:57Z)
- LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation [105.52153675890408]
3D immersive scene generation is a challenging yet critical task in computer vision and graphics.
LayerPano3D is a novel framework for full-view, explorable panoramic 3D scene generation from a single text prompt.
arXiv Detail & Related papers (2024-08-23T17:50:23Z)
- VividDream: Generating 3D Scene with Ambient Dynamics [13.189732244489225]
We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt.
VividDream can provide human viewers with compelling 4D experiences generated based on diverse real images and text prompts.
arXiv Detail & Related papers (2024-05-30T17:59:24Z)
- GFlow: Recovering 4D World from Monocular Video [58.63051670458107]
We introduce GFlow, a framework that lifts a video (3D) to a 4D explicit representation, entailing a flow of Gaussian splatting through space and time.
GFlow first clusters the scene into still and moving parts, then applies a sequential optimization process.
GFlow transcends mere 4D reconstruction, also enabling downstream uses such as point tracking across frames and unsupervised segmentation of moving objects.
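The still/moving split can be pictured as comparing observed optical flow against the flow explained by camera motion alone; the hypothetical threshold test below illustrates the idea and is not GFlow's actual criterion:

```python
import numpy as np

def split_still_moving(flow, cam_flow, thresh=1.0):
    """Label pixels as still vs. moving by the residual between observed
    flow and camera-induced flow (illustrative heuristic).

    flow, cam_flow: (H, W, 2) flow fields in pixels.
    Returns a boolean (H, W) mask, True where the scene itself moves.
    """
    residual = np.linalg.norm(flow - cam_flow, axis=-1)
    return residual > thresh
```

Still regions can then anchor the camera and static geometry, while moving regions are optimized sequentially frame by frame.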
arXiv Detail & Related papers (2024-05-28T17:59:22Z)
- 4D Panoptic Scene Graph Generation [102.22082008976228]
We introduce 4D Panoptic Scene Graph (PSG-4D), a new representation that bridges the raw visual data perceived in a dynamic 4D world and high-level visual understanding.
Specifically, PSG-4D abstracts rich 4D sensory data into nodes, which represent entities with precise location and status information, and edges, which capture the temporal relations.
We propose PSG4DFormer, a Transformer-based model that can predict panoptic segmentation masks, track masks along the time axis, and generate the corresponding scene graphs.
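The node/edge abstraction can be made concrete with a small data structure; the sketch below is an illustrative Python layout, not the PSG-4D reference implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """An entity in a 4D panoptic scene graph (illustrative structure)."""
    track_id: int
    category: str                               # e.g. "person", "dog"
    masks: dict = field(default_factory=dict)   # frame index -> segmentation mask

@dataclass
class Edge:
    """A temporally grounded relation between two entities."""
    subject: int            # track_id of the subject node
    object: int             # track_id of the object node
    predicate: str          # e.g. "chasing"
    frames: tuple = (0, 0)  # (start, end) frames over which the relation holds

# A PSG-4D-style graph is then just a set of such nodes and edges per scene.
```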
arXiv Detail & Related papers (2024-05-16T17:56:55Z)
- DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting [56.101576795566324]
We present a text-to-3D 360$^{\circ}$ scene generation pipeline.
Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement.
Our method offers a globally consistent 3D scene within a 360$^{\circ}$ perspective.
arXiv Detail & Related papers (2024-04-10T10:46:59Z)
- 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
This work introduces 4DGen, a novel framework for grounded 4D content creation.
We identify static 3D assets and monocular video sequences as key components in constructing the 4D content.
Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos).
arXiv Detail & Related papers (2023-12-28T18:53:39Z)
- NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields [99.57774680640581]
We present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering.
We propose to decompose the 4D space according to temporal characteristics. Points in the 4D space are associated with probabilities belonging to three categories: static, deforming, and new areas.
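The decomposition can be pictured as a per-sample soft assignment over the three categories; the sketch below is illustrative (NeRFPlayer predicts such probabilities with a learned field, not a standalone function):

```python
import numpy as np

def decompose_point(logits):
    """Soft-assign a 4D sample to {static, deforming, new} areas via a
    softmax over per-point logits (illustrative sketch).
    """
    e = np.exp(logits - logits.max())   # numerically stable softmax
    p = e / e.sum()
    return dict(zip(("static", "deforming", "new"), p))

# Each branch then gets its own representation: the static branch ignores
# time, the deforming branch warps points into a canonical frame, and the
# new-areas branch models content that appears mid-sequence.
```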
arXiv Detail & Related papers (2022-10-28T07:11:05Z)