Look Beyond: Two-Stage Scene View Generation via Panorama and Video Diffusion
- URL: http://arxiv.org/abs/2509.00843v1
- Date: Sun, 31 Aug 2025 13:27:15 GMT
- Title: Look Beyond: Two-Stage Scene View Generation via Panorama and Video Diffusion
- Authors: Xueyang Kang, Zhengkang Xiang, Zezheng Zhang, Kourosh Khoshelham
- Abstract summary: Novel view synthesis (NVS) from a single image is highly ill-posed due to large unobserved regions. We propose a model that addresses this by decomposing single-view NVS into a 360-degree scene extrapolation followed by novel view interpolation. Our approach outperforms existing methods in generating coherent views along user-defined trajectories.
- Score: 2.5479056464266994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Novel view synthesis (NVS) from a single image is highly ill-posed due to large unobserved regions, especially for views that deviate significantly from the input. While existing methods focus on consistency between the source and generated views, they often fail to maintain coherence and correct view alignment across long-range or looped trajectories. We propose a model that addresses this by decomposing single-view NVS into a 360-degree scene extrapolation followed by novel view interpolation. This design ensures long-term view and scene consistency by conditioning on keyframes extracted and warped from a generated panoramic representation. In the first stage, a panorama diffusion model learns the scene prior from the input perspective image. Perspective keyframes are then sampled and warped from the panorama and used as anchor frames in a pre-trained video diffusion model, which generates novel views through a proposed spatial noise diffusion process. Compared to prior work, our method produces globally consistent novel views -- even in loop closure scenarios -- while enabling flexible camera control. Experiments on diverse scene datasets demonstrate that our approach outperforms existing methods in generating coherent views along user-defined trajectories. Our implementation is available at https://github.com/YiGuYT/LookBeyond.
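The warping step in the second stage, sampling perspective keyframes from the generated panorama, is a standard equirectangular-to-perspective projection. The NumPy sketch below shows one common form of this warp; the function name, axis conventions, and nearest-neighbor sampling are illustrative assumptions rather than the authors' implementation (see the linked repository for that).

```python
import numpy as np

def pano_to_perspective(pano, yaw, pitch, fov_deg=90.0, out_hw=(512, 512)):
    """Sample a pinhole perspective view from an equirectangular panorama.

    pano  : (H, W, 3) equirectangular image covering 360 x 180 degrees.
    yaw   : heading in radians (rotation about the panorama's vertical axis).
    pitch : elevation in radians (rotation about the camera's x-axis).
    """
    H, W = pano.shape[:2]
    h, w = out_hw
    f = 0.5 * w / np.tan(0.5 * np.radians(fov_deg))  # pinhole focal length (pixels)

    # Ray direction per output pixel in the camera frame (x right, y down, z forward).
    u, v = np.meshgrid(np.arange(w) - 0.5 * (w - 1), np.arange(h) - 0.5 * (h - 1))
    rays = np.stack([u, v, np.full_like(u, f)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Rotate rays into the panorama frame: pitch about x, then yaw about y.
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    rays = rays @ (Ry @ Rx).T

    # Spherical coordinates of each ray, then nearest-neighbor panorama lookup.
    lon = np.arctan2(rays[..., 0], rays[..., 2])        # [-pi, pi], 0 = forward
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))   # [-pi/2, pi/2], positive = down
    px = ((lon / (2.0 * np.pi) + 0.5) * W).astype(int) % W
    py = np.clip(((lat / np.pi + 0.5) * H).astype(int), 0, H - 1)
    return pano[py, px]
```

Keyframes along a trajectory would then be pano_to_perspective(pano, yaw_i, pitch_i) for each target pose; handling camera translation additionally requires depth-based reprojection, which this rotation-only sketch omits.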
Related papers
- ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models [52.87334248847314]
We propose a novel framework utilizing pretrained perspective video models for generating panoramic videos. Specifically, we design a novel panorama representation named ViewPoint map, which possesses global spatial continuity and fine-grained visual details simultaneously. Our method can synthesize highly dynamic and spatially consistent panoramic videos, achieving state-of-the-art performance and surpassing previous methods.
arXiv Detail & Related papers (2025-06-30T04:33:34Z)
- DreamJourney: Perpetual View Generation with Video Diffusion Models [91.88716097573206]
Perpetual view generation aims to synthesize a long-term video corresponding to an arbitrary camera trajectory solely from a single input image. Recent methods commonly utilize a pre-trained text-to-image diffusion model to synthesize new content for previously unseen regions as the camera moves. We present DreamJourney, a two-stage framework that leverages the world simulation capacity of video diffusion models for perpetual scene view generation.
arXiv Detail & Related papers (2025-06-21T12:51:34Z)
- Stable Virtual Camera: Generative View Synthesis with Diffusion Models [51.71244310522393]
We present Stable Virtual Camera (Seva), a generalist diffusion model that creates novel views of a scene. Our approach overcomes the limitations of prior methods through simple model design, an optimized training recipe, and a flexible sampling strategy. Our method can generate high-quality videos lasting up to half a minute with seamless loop closure.
arXiv Detail & Related papers (2025-03-18T17:57:22Z)
- ViewFusion: Learning Composable Diffusion Models for Novel View Synthesis [47.0052408875896]
ViewFusion is an end-to-end generative approach to novel view synthesis with unparalleled flexibility. Our method is tested on the relatively small Neural 3D Mesh Renderer dataset.
arXiv Detail & Related papers (2024-02-05T11:22:14Z)
- UpFusion: Novel View Diffusion from Unposed Sparse View Observations [66.36092764694502]
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images.
We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
arXiv Detail & Related papers (2023-12-11T18:59:55Z)
- Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models [24.301334966272297]
We propose a novel generative model capable of producing a sequence of photorealistic images consistent with a specified camera trajectory.
To measure consistency over a sequence of generated views, we introduce a new metric, the thresholded symmetric epipolar distance (TSED); a sketch of its core computation appears after this entry.
arXiv Detail & Related papers (2023-04-21T02:01:02Z)
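For context, the symmetric epipolar distance behind TSED measures how far matched points lie from each other's epipolar lines under the relative camera geometry. The NumPy sketch below shows that core computation; the function names and the default t_error / min_matches values are illustrative assumptions, and the paper's full evaluation protocol (keypoint matching and per-sequence aggregation) is more involved.

```python
import numpy as np

def symmetric_epipolar_distance(x1, x2, F):
    """Symmetric epipolar distance for matched homogeneous points.

    x1, x2 : (N, 3) homogeneous pixel coordinates in frames 1 and 2.
    F      : (3, 3) fundamental matrix (x2^T F x1 = 0 for perfect matches).
    """
    l2 = x1 @ F.T                                # epipolar lines in image 2
    l1 = x2 @ F                                  # epipolar lines in image 1
    algebraic = np.abs(np.sum(x2 * l2, axis=1))  # |x2^T F x1| per match
    d2 = algebraic / np.linalg.norm(l2[:, :2], axis=1)  # point-to-line dist, image 2
    d1 = algebraic / np.linalg.norm(l1[:, :2], axis=1)  # point-to-line dist, image 1
    return d1 + d2

def is_consistent_pair(x1, x2, F, t_error=2.0, min_matches=10):
    # A frame pair counts as consistent when enough matches exist and the
    # median symmetric epipolar distance stays below the threshold; TSED is
    # then the fraction of consistent adjacent pairs over the sequence.
    if len(x1) < min_matches:
        return False
    return bool(np.median(symmetric_epipolar_distance(x1, x2, F)) < t_error)
```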
- DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models [91.94566873400277]
DiffDreamer is an unsupervised framework capable of synthesizing novel views depicting a long camera trajectory.
We show that image-conditioned diffusion models can effectively perform long-range scene extrapolation while preserving consistency significantly better than prior GAN-based methods.
arXiv Detail & Related papers (2022-11-22T10:06:29Z)
- IBRNet: Learning Multi-View Image-Based Rendering [67.15887251196894]
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views.
By drawing on source views at render time, our method hearkens back to classic work on image-based rendering.
arXiv Detail & Related papers (2021-02-25T18:56:21Z)
- Free View Synthesis [100.86844680362196]
We present a method for novel view synthesis from input images that are freely distributed around a scene.
Our method does not rely on a regular arrangement of input views, can synthesize images for free camera movement through the scene, and works for general scenes with unconstrained geometric layouts.
arXiv Detail & Related papers (2020-08-12T18:16:08Z)