ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models
- URL: http://arxiv.org/abs/2506.23513v1
- Date: Mon, 30 Jun 2025 04:33:34 GMT
- Title: ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models
- Authors: Zixun Fang, Kai Zhu, Zhiheng Liu, Yu Liu, Wei Zhai, Yang Cao, Zheng-Jun Zha
- Abstract summary: We propose a novel framework utilizing pretrained perspective video models for generating panoramic videos. Specifically, we design a novel panorama representation named ViewPoint map, which possesses global spatial continuity and fine-grained visual details simultaneously. Our method can synthesize highly dynamic and spatially consistent panoramic videos, achieving state-of-the-art performance and surpassing previous methods.
- Score: 52.87334248847314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Panoramic video generation aims to synthesize 360-degree immersive videos, holding significant importance in the fields of VR, world models, and spatial intelligence. Existing works fail to synthesize high-quality panoramic videos due to the inherent modality gap between panoramic data and perspective data, which constitutes the majority of the training data for modern diffusion models. In this paper, we propose a novel framework utilizing pretrained perspective video models for generating panoramic videos. Specifically, we design a novel panorama representation named ViewPoint map, which possesses global spatial continuity and fine-grained visual details simultaneously. With our proposed Pano-Perspective attention mechanism, the model benefits from pretrained perspective priors and captures the panoramic spatial correlations of the ViewPoint map effectively. Extensive experiments demonstrate that our method can synthesize highly dynamic and spatially consistent panoramic videos, achieving state-of-the-art performance and surpassing previous methods.
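The modality gap the abstract refers to comes down to projection: pretrained diffusion models are trained on pinhole (perspective) images, while panoramas are usually stored as equirectangular maps. For reference, the standard gnomonic resampling that bridges the two representations is sketched below; this is generic projection math, not the paper's ViewPoint map, and the FOV/orientation parameters are illustrative.

```python
# A minimal sketch (generic projection math, not the paper's ViewPoint map)
# of sampling a pinhole view from an equirectangular panorama with the
# gnomonic projection. FOV/orientation parameters are illustrative.
import numpy as np

def equirect_to_perspective(pano, fov_deg=90.0, yaw_deg=0.0, pitch_deg=0.0,
                            out_hw=(512, 512)):
    """pano: (H, W, 3) equirectangular image -> perspective crop."""
    H, W = out_hw
    f = 0.5 * W / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels

    # Pixel grid -> unit rays in the camera frame (z forward, y down).
    xs = np.arange(W) - (W - 1) / 2
    ys = np.arange(H) - (H - 1) / 2
    x, y = np.meshgrid(xs, ys)
    rays = np.stack([x / f, y / f, np.ones_like(x)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Apply pitch (around x), then yaw (around y): R = Ry @ Rx.
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])
    rays = rays @ (Ry @ Rx).T

    # Rays -> longitude/latitude -> equirectangular pixel coordinates.
    lon = np.arctan2(rays[..., 0], rays[..., 2])   # [-pi, pi]
    lat = np.arcsin(np.clip(rays[..., 1], -1, 1))  # [-pi/2, pi/2]
    ph, pw = pano.shape[:2]
    u = ((lon / (2 * np.pi) + 0.5) * (pw - 1)).astype(int)
    v = ((lat / np.pi + 0.5) * (ph - 1)).astype(int)
    return pano[v, u]  # nearest-neighbour; use bilinear sampling in practice
```

Running such crops through a pretrained perspective model and fusing the results is, presumably, the kind of reuse that the paper's Pano-Perspective attention makes learnable end to end.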
Related papers
- PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms [41.92179513409301]
Existing panoramic video generation models struggle to leverage pre-trained generative priors from conventional text-to-video models for high-quality panoramic videos. In this paper, we introduce PanoWan to effectively lift pre-trained text-to-video models to the panoramic domain, equipped with minimal modules. To provide sufficient panoramic videos for learning these lifted representations, we contribute PanoVid, a high-quality panoramic video dataset with captions and diverse scenarios.
arXiv Detail & Related papers (2025-05-28T06:24:21Z)
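PanoWan's actual latitude/longitude-aware modules are not spelled out in the snippet above; the sketch below shows two standard equirectangular adaptations in the same spirit, assuming PyTorch feature maps: circular padding along longitude (the panorama wraps horizontally) and cos(latitude) row weights that discount the stretched polar rows.

```python
# Illustrative only: PanoWan's actual latitude/longitude modules are not
# described above. These are two standard equirectangular adaptations in
# the same spirit, assuming PyTorch feature maps.
import torch
import torch.nn.functional as F

def circular_pad_lon(x, pad=1):
    """x: (B, C, H, W) equirectangular features; wrap the longitude axis."""
    x = F.pad(x, (pad, pad, 0, 0), mode="circular")      # width wraps around
    return F.pad(x, (0, 0, pad, pad), mode="replicate")  # clamp at the poles

def latitude_weights(h, device=None):
    """Per-row cos(latitude) weights: 1 at the equator, ~0 at the poles,
    compensating for the horizontal stretch of equirectangular pixels."""
    lat = (torch.arange(h, device=device) + 0.5) / h * torch.pi - torch.pi / 2
    return torch.cos(lat)  # (H,); reshape to (1, 1, H, 1) to weight a loss map
```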
- HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation [29.579493980120173]
HoloTime is a framework that integrates video diffusion models to generate panoramic videos from a single prompt or reference image. The 360World dataset is the first comprehensive collection of panoramic videos suitable for downstream 4D scene reconstruction tasks. Panoramic Animator is a two-stage image-to-video diffusion model that converts panoramic images into high-quality panoramic videos. Panoramic Space-Time Reconstruction uses a space-time depth estimation method to transform the generated panoramic videos into 4D point clouds.
arXiv Detail & Related papers (2025-04-30T13:55:28Z)
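The Panoramic Space-Time Reconstruction step above hinges on unprojecting per-frame panoramic depth into 3D. A minimal sketch of that geometric core follows, assuming an equirectangular depth map; HoloTime's actual pipeline is more involved.

```python
# A minimal sketch of the geometric core of panorama-to-4D reconstruction:
# unprojecting one frame's equirectangular depth map into a 3D point cloud.
# Names are illustrative, not HoloTime's code.
import numpy as np

def pano_depth_to_points(depth):
    """depth: (H, W) metric depth per equirectangular pixel -> (H*W, 3)."""
    H, W = depth.shape
    lon = (np.arange(W) + 0.5) / W * 2 * np.pi - np.pi  # [-pi, pi)
    lat = (np.arange(H) + 0.5) / H * np.pi - np.pi / 2  # [-pi/2, pi/2)
    lon, lat = np.meshgrid(lon, lat)
    # Spherical angles -> unit ray directions, then scale by depth.
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return (depth[..., None] * np.stack([x, y, z], axis=-1)).reshape(-1, 3)
```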
- VideoPanda: Video Panoramic Diffusion with Multi-view Attention [57.87428280844657]
High-resolution panoramic video content is paramount for immersive experiences in Virtual Reality, but is non-trivial to collect as it requires specialized equipment and intricate camera setups. VideoPanda generates more realistic and coherent 360° panoramas across all input conditions compared to existing methods.
arXiv Detail & Related papers (2025-04-15T16:58:15Z)
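VideoPanda's multi-view attention is not specified here beyond its name; the block below is a generic multi-view self-attention layer in that spirit, where tokens from all views of one frame attend jointly, written as an illustrative PyTorch module rather than the published architecture.

```python
# A generic multi-view self-attention layer: tokens from all views of one
# frame attend jointly so each view can see the others. Illustrative, not
# VideoPanda's published architecture.
import torch
import torch.nn as nn

class MultiViewAttention(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        """x: (B, V, N, D) -- batch, views, tokens per view, channels."""
        B, V, N, D = x.shape
        flat = self.norm(x.reshape(B, V * N, D))  # merge views into one sequence
        out, _ = self.attn(flat, flat, flat)      # every view attends to all views
        return x + out.reshape(B, V, N, D)        # residual connection
```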
- Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos [64.10180665546237]
360° videos offer a more complete perspective of our surroundings. Existing video models excel at producing standard videos, but their ability to generate full panoramic videos remains elusive. We develop a high-quality data filtering pipeline to curate pairwise training data and improve the quality of 360° video generation. Experimental results demonstrate that our model can generate realistic and coherent 360° videos from in-the-wild perspective videos.
arXiv Detail & Related papers (2025-04-10T17:51:38Z)
- DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion [60.45000652592418]
We propose a novel text-driven panoramic generation framework, DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation.
We show that DiffPano can generate consistent, diverse panoramic images given unseen text descriptions and camera poses.
arXiv Detail & Related papers (2024-10-31T17:57:02Z)
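The "spherical epipolar-aware" mechanism rests on a standard geometric fact: candidate matches for a ray seen from one panoramic camera trace a curve on the second panorama, obtained by sweeping the ray's points over depth. A hedged sketch of that constraint follows; the two-camera setup and names are assumptions, not DiffPano's code.

```python
# Sketch of the spherical epipolar constraint such attention builds on:
# points along a camera-1 ray, re-projected into camera 2, trace the
# epipolar curve on the second panorama.
import numpy as np

def spherical_epipolar_curve(d1, R, t, n_samples=64):
    """d1: unit ray in camera-1 coords; R, t: camera-1 -> camera-2 pose.
    Returns unit directions in camera-2 tracing the epipolar curve."""
    depths = np.geomspace(0.1, 100.0, n_samples)     # candidate depths
    pts_cam2 = (R @ (depths[:, None] * d1).T).T + t  # 3D points in cam-2 frame
    return pts_cam2 / np.linalg.norm(pts_cam2, axis=-1, keepdims=True)

def dirs_to_equirect_uv(dirs, h, w):
    """Unit directions -> equirectangular pixel coordinates (u, v)."""
    lon = np.arctan2(dirs[:, 0], dirs[:, 2])
    lat = np.arcsin(np.clip(dirs[:, 1], -1, 1))
    return (lon / (2 * np.pi) + 0.5) * (w - 1), (lat / np.pi + 0.5) * (h - 1)
```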
- VidPanos: Generative Panoramic Videos from Casual Panning Videos [73.77443496436749]
Panoramic image stitching provides a unified, wide-angle view of a scene that extends beyond the camera's field of view.
We present a method for synthesizing a panoramic video from a casually-captured panning video.
Our system can create video panoramas for a range of in-the-wild scenes including people, vehicles, and flowing water.
arXiv Detail & Related papers (2024-10-17T17:53:24Z)
- ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis [63.169364481672915]
We propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images.
Our method takes advantage of the powerful generation capabilities of video diffusion models and the coarse 3D clues offered by point-based representations to generate high-quality video frames.
arXiv Detail & Related papers (2024-09-03T16:53:19Z)
- PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation [39.269864548255576]
We present a panoramic video dataset, PanoVOS.
The dataset provides 150 high-resolution videos with diverse motions.
We present a Panoramic Space Consistency Transformer (PSCFormer) which can effectively utilize the semantic boundary information of the previous frame for pixel-level matching with the current frame.
arXiv Detail & Related papers (2023-09-21T17:59:02Z)
- Unsupervised Learning of Depth and Ego-Motion from Cylindrical Panoramic Video with Applications for Virtual Reality [2.294014185517203]
We introduce a convolutional neural network model for unsupervised learning of depth and ego-motion from cylindrical panoramic video.
Panoramic depth estimation is an important technology for applications such as virtual reality, 3D modeling, and autonomous robotic navigation.
arXiv Detail & Related papers (2020-10-14T16:41:33Z)
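For context on the unsupervised depth/ego-motion objective mentioned above, the sketch below shows the standard SfMLearner-style photometric loss: predicted depth and pose warp the next frame into the current one, and the reconstruction error supervises both networks. The planar pinhole variant is shown for brevity; the paper's cylindrical projection is omitted, and all names/shapes are illustrative.

```python
# Condensed sketch of the standard SfMLearner-style photometric objective,
# pinhole variant for brevity (the paper adapts the projection to
# cylindrical panoramas). Shapes and names are illustrative.
import torch
import torch.nn.functional as F

def photometric_loss(img_cur, img_next, depth, pose, K):
    """img_*: (B,3,H,W); depth: (B,1,H,W); pose: (B,4,4) cur->next; K: (B,3,3)."""
    B, _, H, W = img_cur.shape
    dev = img_cur.device
    # Backproject every pixel of the current frame to 3D using its depth.
    ys, xs = torch.meshgrid(torch.arange(H, device=dev),
                            torch.arange(W, device=dev), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float()  # (3,H,W)
    rays = torch.einsum("bij,jhw->bihw", torch.inverse(K), pix)
    pts = rays * depth                                           # (B,3,H,W)
    # Move the points into the next frame and project back to pixels.
    pts_h = torch.cat([pts, torch.ones_like(depth)], 1).reshape(B, 4, -1)
    proj = torch.einsum("bij,bjn->bin", K, (pose @ pts_h)[:, :3])
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    # Normalise to [-1, 1] and warp the next frame into the current view.
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                        uv[:, 1] / (H - 1) * 2 - 1], -1).reshape(B, H, W, 2)
    warped = F.grid_sample(img_next, grid, align_corners=True)
    return (warped - img_cur).abs().mean()  # L1 photometric error
```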