CamPVG: Camera-Controlled Panoramic Video Generation with Epipolar-Aware Diffusion
- URL: http://arxiv.org/abs/2509.19979v1
- Date: Wed, 24 Sep 2025 10:34:24 GMT
- Title: CamPVG: Camera-Controlled Panoramic Video Generation with Epipolar-Aware Diffusion
- Authors: Chenhao Ji, Chaohui Yu, Junyao Gao, Fan Wang, Cairong Zhao
- Abstract summary: CamPVG is the first diffusion-based framework for panoramic video generation guided by precise camera poses. We achieve camera position encoding for panoramic images and cross-view feature aggregation based on spherical projection. Our method generates high-quality panoramic videos consistent with camera trajectories, far surpassing existing methods in panoramic video generation.
- Score: 31.032317079295762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, camera-controlled video generation has seen rapid development, offering more precise control over video generation. However, existing methods predominantly focus on camera control in perspective projection video generation, while geometrically consistent panoramic video generation remains challenging. This limitation is primarily due to the inherent complexities in panoramic pose representation and spherical projection. To address this issue, we propose CamPVG, the first diffusion-based framework for panoramic video generation guided by precise camera poses. We achieve camera position encoding for panoramic images and cross-view feature aggregation based on spherical projection. Specifically, we propose a panoramic Plücker embedding that encodes camera extrinsic parameters through spherical coordinate transformation. This pose encoder effectively captures panoramic geometry, overcoming the limitations of traditional methods when applied to equirectangular projections. Additionally, we introduce a spherical epipolar module that enforces geometric constraints through adaptive attention masking along epipolar lines. This module enables fine-grained cross-view feature aggregation, substantially enhancing the quality and consistency of generated panoramic videos. Extensive experiments demonstrate that our method generates high-quality panoramic videos consistent with camera trajectories, far surpassing existing methods in panoramic video generation.
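The abstract does not spell out how a panoramic Plücker embedding is computed. As a rough illustration only, a per-pixel Plücker ray embedding for an equirectangular panorama might look like the sketch below. This is not the paper's implementation: the camera-to-world extrinsics convention, the y-up spherical parameterization, and all function and variable names are assumptions made for the example.

```python
import numpy as np

def panoramic_plucker_embedding(height, width, R, t):
    """Sketch: per-pixel Plucker ray embedding (direction, moment) for an
    equirectangular panorama. R (3x3) and t (3,) are assumed camera-to-world."""
    # Longitude/latitude at each pixel center of the equirectangular grid.
    v, u = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi      # [-pi, pi)
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi     # (pi/2, -pi/2)

    # Unit ray directions on the sphere, in the camera frame.
    d_cam = np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )

    # Rotate rays into the world frame; the camera center t is the ray origin.
    d_world = d_cam @ R.T
    moment = np.cross(np.broadcast_to(t, d_world.shape), d_world)

    # 6-D Plucker coordinates per pixel: (direction, moment = origin x direction).
    return np.concatenate([d_world, moment], axis=-1)  # (height, width, 6)
```

Unlike the perspective-camera Plücker maps used in prior camera-control work, the ray directions here come from the equirectangular longitude/latitude grid rather than from unprojecting through an intrinsics matrix, which is the gist of the spherical-coordinate transformation the abstract describes.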
Related papers
- Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation [49.12018869332346]
InfCam is a camera-controlled video-to-video generation framework with high pose fidelity. The framework integrates two key components: (1) infinite homography warping, which encodes 3D camera rotations directly within the 2D latent space of a video diffusion model.
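Infinite-homography warping rests on a standard multi-view-geometry fact: two views that share intrinsics K and differ by a pure rotation R are related by the homography induced by the plane at infinity, H∞ = K R K⁻¹, independent of scene depth. A minimal sketch (the function names are hypothetical, and InfCam applies this in a diffusion model's latent space rather than on raw pixels):

```python
import numpy as np

def infinite_homography(K, R):
    """Homography induced by the plane at infinity: for two views with shared
    intrinsics K related by a pure rotation R (target <- source), homogeneous
    pixels map as x' ~ H_inf @ x."""
    return K @ R @ np.linalg.inv(K)

def warp_pixel(H, x, y):
    """Apply a 3x3 homography to a pixel and dehomogenize."""
    p = H @ np.array([x, y, 1.0])
    return p[:2] / p[2]
```

Because no depth appears anywhere in the mapping, camera rotation alone determines the warp, which is what makes it usable as a robust conditioning signal without estimating scene geometry.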
arXiv Detail & Related papers (2025-12-18T20:03:05Z) - GimbalDiffusion: Gravity-Aware Camera Control for Video Generation [30.697985626973665]
We introduce a framework that enables camera control grounded in physical-world coordinates, using gravity as a global reference. We leverage panoramic 360-degree videos to construct a wide variety of camera trajectories, well beyond the predominantly straight, forward-facing trajectories seen in conventional video data. We establish a benchmark for camera-aware video generation by rebalancing SpatialVID-HQ for comprehensive evaluation under wide camera pitch variation.
arXiv Detail & Related papers (2025-12-09T20:54:35Z) - PanFlow: Decoupled Motion Control for Panoramic Video Generation [52.47902086091194]
PanFlow is a novel approach that exploits the spherical nature of panoramas to decouple the highly dynamic camera rotation from the input optical flow condition. To support effective training, we curate a large-scale, motion-rich panoramic video dataset with frame-level pose and flow annotations.
arXiv Detail & Related papers (2025-11-30T11:03:31Z) - PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion [87.13016347332943]
PanoWorld-X is a novel framework for high-fidelity and controllable panoramic video generation with diverse camera trajectories. Our experiments demonstrate superior performance in various aspects, including motion range, control precision, and visual quality.
arXiv Detail & Related papers (2025-09-29T16:22:00Z) - ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models [52.87334248847314]
We propose a novel framework utilizing pretrained perspective video models for generating panoramic videos. Specifically, we design a novel panorama representation named ViewPoint map, which possesses global spatial continuity and fine-grained visual details simultaneously. Our method can synthesize highly dynamic and spatially consistent panoramic videos, achieving state-of-the-art performance and surpassing previous methods.
arXiv Detail & Related papers (2025-06-30T04:33:34Z) - PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms [41.92179513409301]
Existing panoramic video generation models struggle to leverage pre-trained generative priors from conventional text-to-video models for high-quality panoramic videos. In this paper, we introduce PanoWan to effectively lift pre-trained text-to-video models to the panoramic domain, equipped with minimal modules. To provide sufficient panoramic videos for learning these lifted representations, we contribute PanoVid, a high-quality panoramic video dataset with captions and diverse scenarios.
arXiv Detail & Related papers (2025-05-28T06:24:21Z) - CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models [89.63787060844409]
CameraCtrl II is a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model. We take an approach that progressively expands the generation of dynamic scenes.
arXiv Detail & Related papers (2025-03-13T17:42:01Z) - DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion [60.45000652592418]
We propose a novel text-driven panoramic generation framework, DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation.
We show that DiffPano can generate consistent, diverse panoramic images with given unseen text descriptions and camera poses.
arXiv Detail & Related papers (2024-10-31T17:57:02Z) - VidPanos: Generative Panoramic Videos from Casual Panning Videos [73.77443496436749]
Panoramic image stitching provides a unified, wide-angle view of a scene that extends beyond the camera's field of view.
We present a method for synthesizing a panoramic video from a casually-captured panning video.
Our system can create video panoramas for a range of in-the-wild scenes including people, vehicles, and flowing water.
arXiv Detail & Related papers (2024-10-17T17:53:24Z) - Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention [62.2447324481159]
Cavia is a novel framework for camera-controllable, multi-view video generation.
Our framework extends the spatial and temporal attention modules, improving both viewpoint and temporal consistency.
Cavia is the first of its kind that allows the user to specify distinct camera motion while obtaining object motion.
arXiv Detail & Related papers (2024-10-14T17:46:32Z) - PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation [39.269864548255576]
We present a panoramic video dataset, PanoVOS.
The dataset provides 150 videos with high video resolutions and diverse motions.
We present a Panoramic Space Consistency Transformer (PSCFormer) which can effectively utilize the semantic boundary information of the previous frame for pixel-level matching with the current frame.
arXiv Detail & Related papers (2023-09-21T17:59:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.