Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos
- URL: http://arxiv.org/abs/2504.07940v2
- Date: Thu, 17 Apr 2025 14:35:15 GMT
- Title: Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos
- Authors: Rundong Luo, Matthew Wallingford, Ali Farhadi, Noah Snavely, Wei-Chiu Ma,
- Abstract summary: 360deg videos offer a more complete perspective of our surroundings.<n>Existing video models excel at producing standard videos, but their ability to generate full panoramic videos remains elusive.<n>We develop a high-quality data filtering pipeline to curate pairwise training data and improve the quality of 360deg video generation.<n> Experimental results demonstrate that our model can generate realistic and coherent 360deg videos from in-the-wild perspective video.
- Score: 64.10180665546237
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 360{\deg} videos have emerged as a promising medium to represent our dynamic visual world. Compared to the "tunnel vision" of standard cameras, their borderless field of view offers a more complete perspective of our surroundings. While existing video models excel at producing standard videos, their ability to generate full panoramic videos remains elusive. In this paper, we investigate the task of video-to-360{\deg} generation: given a perspective video as input, our goal is to generate a full panoramic video that is consistent with the original video. Unlike conventional video generation tasks, the output's field of view is significantly larger, and the model is required to have a deep understanding of both the spatial layout of the scene and the dynamics of objects to maintain spatio-temporal consistency. To address these challenges, we first leverage the abundant 360{\deg} videos available online and develop a high-quality data filtering pipeline to curate pairwise training data. We then carefully design a series of geometry- and motion-aware operations to facilitate the learning process and improve the quality of 360{\deg} video generation. Experimental results demonstrate that our model can generate realistic and coherent 360{\deg} videos from in-the-wild perspective video. In addition, we showcase its potential applications, including video stabilization, camera viewpoint control, and interactive visual question answering.
Related papers
- ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models [52.87334248847314]
We propose a novel framework utilizing pretrained perspective video models for generating panoramic videos.<n>Specifically, we design a novel panorama representation named ViewPoint map, which possesses global spatial continuity and fine-grained visual details simultaneously.<n>Our method can synthesize highly dynamic and spatially consistent panoramic videos, achieving state-of-the-art performance and surpassing previous methods.
arXiv Detail & Related papers (2025-06-30T04:33:34Z) - WorldExplorer: Towards Generating Fully Navigable 3D Scenes [49.21733308718443]
WorldExplorer builds fully navigable 3D scenes with consistent visual quality across a wide range of viewpoints.<n>We generate multiple videos along short, pre-defined trajectories, that explore the scene in depth.<n>Our novel scene memory conditions each video on the most relevant prior views, while a collision-detection mechanism prevents degenerate results.
arXiv Detail & Related papers (2025-06-02T15:41:31Z) - PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms [41.92179513409301]
Existing panoramic video generation models struggle to leverage pre-trained generative priors from conventional text-to-video models for high-quality panoramic videos.<n>In this paper, we introduce PanoWan to effectively lift pre-trained text-to-video models to the panoramic domain, equipped with minimal modules.<n>To provide sufficient panoramic videos for learning these lifted representations, we contribute PanoVid, a high-quality panoramic video dataset with captions and diverse scenarios.
arXiv Detail & Related papers (2025-05-28T06:24:21Z) - VideoPanda: Video Panoramic Diffusion with Multi-view Attention [57.87428280844657]
High resolution panoramic video content is paramount for immersive experiences in Virtual Reality, but is non-trivial to collect as it requires specialized equipment and intricate camera setups.
VideoPanda generates more realistic and coherent 360$circ$ panoramas across all input conditions compared to existing methods.
arXiv Detail & Related papers (2025-04-15T16:58:15Z) - WorldPrompter: Traversable Text-to-Scene Generation [18.405299478122693]
We introduce WorldPrompter, a novel generative pipeline for synthesizing traversable 3D scenes from text prompts.<n>WorldPrompter incorporates a conditional 360deg panoramic video generator, capable of producing a 128-frame video that simulates a person walking through and capturing a virtual environment.<n>The resulting video is then reconstructed as Gaussian splats by a fast feedforward 3D reconstructor, enabling a true walkable experience within the 3D scene.
arXiv Detail & Related papers (2025-04-02T18:04:32Z) - T-SVG: Text-Driven Stereoscopic Video Generation [87.62286959918566]
This paper introduces the Text-driven Stereoscopic Video Generation (T-SVG) system.<n>It streamlines video generation by using text prompts to create reference videos.<n>These videos are transformed into 3D point cloud sequences, which are rendered from two perspectives with subtle parallax differences.
arXiv Detail & Related papers (2024-12-12T14:48:46Z) - From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos [71.22810401256234]
Three-dimensional (3D) understanding of objects and scenes play a key role in humans' ability to interact with the world.<n>Large scale synthetic and object-centric 3D datasets have shown to be effective in training models that have 3D understanding of objects.<n>We introduce 360-1M, a 360 video dataset, and a process for efficiently finding corresponding frames from diverse viewpoints at scale.
arXiv Detail & Related papers (2024-12-10T18:59:44Z) - Imagine360: Immersive 360 Video Generation from Perspective Anchor [79.97844408255897]
Imagine360 is a perspective-to-$360circ$ video generation framework.<n>It learns fine-grained spherical visual and motion patterns from limited $360circ$ video data.<n>It achieves superior graphics quality and motion coherence among state-of-the-art $360circ$ video generation methods.
arXiv Detail & Related papers (2024-12-04T18:50:08Z) - Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention [62.2447324481159]
Cavia is a novel framework for camera-controllable, multi-view video generation.
Our framework extends the spatial and temporal attention modules, improving both viewpoint and temporal consistency.
Cavia is the first of its kind that allows the user to specify distinct camera motion while obtaining object motion.
arXiv Detail & Related papers (2024-10-14T17:46:32Z) - SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix [60.48666051245761]
We propose a pose-free and training-free approach for generating 3D stereoscopic videos.
Our method warps a generated monocular video into camera views on stereoscopic baseline using estimated video depth.
We develop a disocclusion boundary re-injection scheme that further improves the quality of video inpainting.
arXiv Detail & Related papers (2024-06-29T08:33:55Z) - 360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model [23.708946172342067]
We propose a pipeline named 360-Degree Video Diffusion model (360DVD) for generating 360-degree panoramic videos.
We introduce a lightweight 360-Adapter accompanied by 360 Enhancement Techniques to transform pre-trained T2V models for panorama video generation.
We also propose a new panorama dataset named WEB360 consisting of panoramic video-text pairs for training 360DVD.
arXiv Detail & Related papers (2024-01-12T13:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.