DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
- URL: http://arxiv.org/abs/2412.11100v1
- Date: Sun, 15 Dec 2024 07:42:26 GMT
- Title: DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
- Authors: Jinxiu Liu, Shaoheng Lin, Yinxiao Li, Ming-Hsuan Yang
- Abstract summary: DynamicScaler enables spatially scalable and panoramic dynamic scene synthesis. We employ a Global Motion Guidance mechanism to ensure both local detail fidelity and global motion continuity. Our method achieves superior content and motion quality in panoramic scene-level video generation.
- Score: 46.91656616577897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing demand for immersive AR/VR applications and spatial intelligence has heightened the need to generate high-quality scene-level and 360° panoramic video. However, most video diffusion models are constrained by limited resolution and aspect ratio, which restricts their applicability to scene-level dynamic content synthesis. In this work, we propose DynamicScaler, which addresses these challenges by enabling spatially scalable and panoramic dynamic scene synthesis that preserves coherence across panoramic scenes of arbitrary size. Specifically, we introduce an Offset Shifting Denoiser that enables efficient, synchronous, and coherent denoising of panoramic dynamic scenes with a fixed-resolution diffusion model through a seamlessly rotating window, ensuring smooth boundary transitions and consistency across the entire panoramic space while accommodating varying resolutions and aspect ratios. Additionally, we employ a Global Motion Guidance mechanism to ensure both local detail fidelity and global motion continuity. Extensive experiments demonstrate that our method achieves superior content and motion quality in panoramic scene-level video generation, offering a training-free, efficient, and scalable solution for immersive dynamic scene creation with constant VRAM consumption regardless of the output video resolution. Our project page is available at https://dynamic-scaler.pages.dev/.
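The Offset Shifting Denoiser is described above only at a high level. The following is a minimal sketch of that shifting-window idea, assuming a hypothetical `denoise_window` callable that stands in for one denoising step of a fixed-resolution video diffusion model; the authors' actual scheduler and the Global Motion Guidance mechanism are not reproduced. The point of the sketch is that the model only ever sees fixed-size crops (so VRAM stays constant regardless of panorama width) and that modular column indexing lets windows wrap across the 360° seam.

```python
import torch

def offset_shifting_denoise_step(latent, denoise_window, window_w, offset):
    """One denoising pass over a panoramic video latent of shape (C, T, H, W).

    `denoise_window` is a hypothetical stand-in for one step of a
    fixed-resolution video diffusion denoiser: it only ever receives crops of
    width `window_w`, so memory use does not grow with the panorama width.
    Shifting `offset` between steps moves the window seams, and modular
    indexing wraps windows around the 360-degree boundary.
    """
    C, T, H, W = latent.shape
    out = torch.zeros_like(latent)
    count = torch.zeros(1, 1, 1, W, device=latent.device, dtype=latent.dtype)

    start = offset % window_w
    for s in range(start, start + W, window_w):
        cols = torch.arange(s, s + window_w, device=latent.device) % W  # wrap past the right edge
        crop = latent[..., cols]                    # fixed-size (C, T, H, window_w) crop
        out[..., cols] += denoise_window(crop)      # denoise this window at native resolution
        count[..., cols] += 1.0                     # track overlap for averaging

    return out / count                              # blend overlapping window estimates
```

A full sampler would call a step like this once per diffusion timestep, changing `offset` every step so that window boundaries never stay in the same place across the denoising trajectory.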
Related papers
- M2DAO-Talker: Harmonizing Multi-granular Motion Decoupling and Alternating Optimization for Talking-head Generation [65.08520614570288]
We reformulate talking-head generation into a unified framework comprising video preprocessing, motion representation, and rendering reconstruction. M2DAO-Talker achieves state-of-the-art performance, with a 2.43 dB PSNR improvement in generation quality and a 0.64 gain in user-evaluated video realness.
arXiv Detail & Related papers (2025-07-11T04:48:12Z) - LocalDyGS: Multi-view Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling [33.71658540929536]
LocalDyGS is a novel method for modeling dynamic videos from multi-view inputs and rendering them from arbitrary viewpoints. Our method is competitive with state-of-the-art (SOTA) methods on various fine-scale datasets.
arXiv Detail & Related papers (2025-07-03T06:50:33Z) - PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms [41.92179513409301]
Existing panoramic video generation models struggle to leverage pre-trained generative priors from conventional text-to-video models for high-quality panoramic videos. In this paper, we introduce PanoWan to effectively lift pre-trained text-to-video models to the panoramic domain, equipped with minimal modules. To provide sufficient panoramic videos for learning these lifted representations, we contribute PanoVid, a high-quality panoramic video dataset with captions and diverse scenarios.
arXiv Detail & Related papers (2025-05-28T06:24:21Z) - CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models [89.63787060844409]
CameraCtrl II is a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model.
We take an approach that progressively expands the generation of dynamic scenes.
arXiv Detail & Related papers (2025-03-13T17:42:01Z) - T-SVG: Text-Driven Stereoscopic Video Generation [87.62286959918566]
This paper introduces the Text-driven Stereoscopic Video Generation (T-SVG) system.
It streamlines video generation by using text prompts to create reference videos.
These videos are transformed into 3D point cloud sequences, which are rendered from two perspectives with subtle parallax differences.
arXiv Detail & Related papers (2024-12-12T14:48:46Z) - DiffMVR: Diffusion-based Automated Multi-Guidance Video Restoration [10.637125300701795]
We propose a diffusion-based video-level inpainting model, DiffMVR. Our approach introduces a dynamic dual-guided image prompting system, leveraging adaptive reference frames to guide the inpainting process. This enables the model to capture both fine-grained details and smooth transitions between video frames, offering precise control over inpainting direction and significantly improving restoration accuracy in challenging, dynamic environments.
arXiv Detail & Related papers (2024-11-27T20:43:35Z) - Modeling Ambient Scene Dynamics for Free-view Synthesis [31.233859111566613]
We introduce a novel method for dynamic free-view synthesis of ambient scenes from a monocular capture.
Our method builds upon the recent advancements in 3D Gaussian Splatting (3DGS) that can faithfully reconstruct complex static scenes.
arXiv Detail & Related papers (2024-06-13T17:59:11Z) - Lumiere: A Space-Time Diffusion Model for Video Generation [75.54967294846686]
We introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once.
This is in contrast to existing video models, which synthesize distant keyframes followed by temporal super-resolution.
By deploying both spatial and (importantly) temporal down- and up-sampling, our model learns to directly generate a full-frame-rate, low-resolution video. (A toy illustration of such space-time down/up-sampling appears after this list.)
arXiv Detail & Related papers (2024-01-23T18:05:25Z) - DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z) - Editable Free-viewpoint Video Using a Layered Neural Representation [35.44420164057911]
We propose the first approach for editable free-viewpoint video generation for large-scale dynamic scenes using only 16 sparse cameras.
The core of our approach is a new layered neural representation, where each dynamic entity including the environment itself is formulated into a space-time coherent neural layered radiance representation called ST-NeRF.
Experiments demonstrate the effectiveness of our approach to achieve high-quality, photo-realistic, and editable free-viewpoint video generation for dynamic scenes.
arXiv Detail & Related papers (2021-04-30T06:50:45Z) - Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video [76.19076002661157]
Non-Rigid Neural Radiance Fields (NR-NeRF) is a reconstruction and novel view synthesis approach for general non-rigid dynamic scenes.
We show that even a single consumer-grade camera is sufficient to synthesize sophisticated renderings of a dynamic scene from novel virtual camera views.
arXiv Detail & Related papers (2020-12-22T18:46:12Z)
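As a rough illustration of the spatial and temporal down- and up-sampling mentioned in the Lumiere entry above (and only of that general idea, not the paper's Space-Time U-Net), a toy space-time down/up-sampling pair can be written with 3D convolutions:

```python
import torch
import torch.nn as nn

class SpaceTimeDown(nn.Module):
    """Toy block that halves T, H and W at once with a strided 3D convolution."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, stride=2, padding=1)

    def forward(self, x):  # x: (B, C, T, H, W)
        return self.conv(x)

class SpaceTimeUp(nn.Module):
    """Toy block that doubles T, H and W with trilinear upsampling plus a 3D conv."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        x = nn.functional.interpolate(x, scale_factor=2, mode="trilinear", align_corners=False)
        return self.conv(x)

# A full-frame-rate clip is compressed in both space and time and expanded back,
# which is the property the Lumiere summary highlights (no separate keyframe
# generation plus temporal super-resolution stage).
video = torch.randn(1, 8, 16, 64, 64)   # (B, C, T, H, W)
down = SpaceTimeDown(8)(video)          # -> (1, 8, 8, 32, 32)
up = SpaceTimeUp(8)(down)               # -> (1, 8, 16, 64, 64)
```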