PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion
- URL: http://arxiv.org/abs/2509.24997v1
- Date: Mon, 29 Sep 2025 16:22:00 GMT
- Title: PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion
- Authors: Yuyang Yin, HaoXiang Guo, Fangfu Liu, Mengyu Wang, Hanwen Liang, Eric Li, Yikai Wang, Xiaojie Jin, Yao Zhao, Yunchao Wei
- Abstract summary: PanoWorld-X is a novel framework for high-fidelity and controllable panoramic video generation with diverse camera trajectories. Our experiments demonstrate superior performance in various aspects, including motion range, control precision, and visual quality.
- Score: 87.13016347332943
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating a complete and explorable 360-degree visual world enables a wide range of downstream applications. While prior works have advanced the field, they remain constrained by either narrow field-of-view limitations, which hinder the synthesis of continuous and holistic scenes, or insufficient camera controllability that restricts free exploration by users or autonomous agents. To address this, we propose PanoWorld-X, a novel framework for high-fidelity and controllable panoramic video generation with diverse camera trajectories. Specifically, we first construct a large-scale dataset of panoramic video-exploration route pairs by simulating camera trajectories in virtual 3D environments via Unreal Engine. As the spherical geometry of panoramic data misaligns with the inductive priors from conventional video diffusion, we then introduce a Sphere-Aware Diffusion Transformer architecture that reprojects equirectangular features onto the spherical surface to model geometric adjacency in latent space, significantly enhancing visual fidelity and spatiotemporal continuity. Extensive experiments demonstrate that our PanoWorld-X achieves superior performance in various aspects, including motion range, control precision, and visual quality, underscoring its potential for real-world applications.
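The abstract's core idea is that equirectangular pixel grids misrepresent spherical adjacency: the left and right image borders, far apart in pixel space, are immediate neighbors on the sphere. The sketch below illustrates this reprojection in its simplest form, mapping each equirectangular pixel to a point on the unit sphere. It is a minimal geometric illustration only; the function name and grid conventions are ours, not from the paper, and the paper's actual architecture operates on diffusion-transformer latent features rather than raw pixels.

```python
import numpy as np

def equirect_to_sphere(h, w):
    """Map each pixel center of an h x w equirectangular grid
    to a 3D point on the unit sphere (illustrative convention)."""
    # Latitude spans (-pi/2, pi/2) top to bottom; longitude spans (-pi, pi).
    lat = (0.5 - (np.arange(h) + 0.5) / h) * np.pi
    lon = ((np.arange(w) + 0.5) / w - 0.5) * 2 * np.pi
    lon_g, lat_g = np.meshgrid(lon, lat)
    # Standard spherical-to-Cartesian conversion on the unit sphere.
    x = np.cos(lat_g) * np.cos(lon_g)
    y = np.cos(lat_g) * np.sin(lon_g)
    z = np.sin(lat_g)
    return np.stack([x, y, z], axis=-1)  # shape (h, w, 3)

pts = equirect_to_sphere(64, 128)
# On the sphere, the left and right image borders are true neighbors:
# the chord across the wrap-around seam equals the chord between
# adjacent interior columns at the same latitude.
wrap_gap = np.linalg.norm(pts[32, 0] - pts[32, -1])
interior_gap = np.linalg.norm(pts[32, 0] - pts[32, 1])
print(np.isclose(wrap_gap, interior_gap))  # True
```

Modeling attention or convolution over these spherical coordinates, rather than over the flat pixel grid, is what lets a sphere-aware architecture respect this wrap-around and pole geometry.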
Related papers
- OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes [57.790894531046796]
Panorama-based 2D lifting has emerged as a promising technique to produce immersive, realistic, and diverse 3D environments. In this work, we advance this technique to generate graphics-ready 3D scenes suitable for physically based rendering (PBR), relighting, and simulation. Our key insight is to repurpose 2D generative models for panoramic perception of geometry, textures, and PBR materials. Based on a lightweight and efficient cross-modal adapter structure, OmniX reuses 2D generative priors for a broad range of panoramic vision tasks.
arXiv Detail & Related papers (2025-10-30T17:59:51Z) - EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory [40.346684158976494]
EvoWorld bridges panoramic video generation with evolving 3D memory to enable spatially consistent long-horizon exploration. Unlike prior state-of-the-art methods that synthesize videos only, our key insight lies in exploiting this evolving 3D reconstruction as explicit spatial guidance. To evaluate long-range exploration capabilities, we introduce the first comprehensive benchmark spanning synthetic outdoor environments, Habitat indoor scenes, and challenging real-world scenarios.
arXiv Detail & Related papers (2025-10-01T17:59:38Z) - Matrix-3D: Omnidirectional Explorable 3D World Generation [20.568791715708134]
We propose Matrix-3D, a framework that utilizes a panoramic representation for wide-coverage omnidirectional 3D world generation. We first train a trajectory-guided panoramic video diffusion model that employs scene mesh renders as conditioning. To lift the panoramic scene video to a 3D world, we propose two separate methods: (1) a feed-forward large panorama reconstruction model for rapid 3D scene reconstruction and (2) an optimization-based pipeline for accurate and detailed 3D scene reconstruction.
arXiv Detail & Related papers (2025-08-11T15:29:57Z) - ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models [52.87334248847314]
We propose a novel framework utilizing pretrained perspective video models for generating panoramic videos. Specifically, we design a novel panorama representation named ViewPoint map, which possesses global spatial continuity and fine-grained visual details simultaneously. Our method can synthesize highly dynamic and spatially consistent panoramic videos, achieving state-of-the-art performance and surpassing previous methods.
arXiv Detail & Related papers (2025-06-30T04:33:34Z) - WorldExplorer: Towards Generating Fully Navigable 3D Scenes [48.16064304951891]
WorldExplorer builds fully navigable 3D scenes with consistent visual quality across a wide range of viewpoints. We generate multiple videos along short, pre-defined trajectories that explore the scene in depth. Our novel scene memory conditions each video on the most relevant prior views, while a collision-detection mechanism prevents degenerate results.
arXiv Detail & Related papers (2025-06-02T15:41:31Z) - PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms [41.92179513409301]
Existing panoramic video generation models struggle to leverage pre-trained generative priors from conventional text-to-video models for high-quality panoramic videos. In this paper, we introduce PanoWan to effectively lift pre-trained text-to-video models to the panoramic domain, equipped with minimal modules. To provide sufficient panoramic videos for learning these lifted representations, we contribute PanoVid, a high-quality panoramic video dataset with captions and diverse scenarios.
arXiv Detail & Related papers (2025-05-28T06:24:21Z) - DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion [60.45000652592418]
We propose a novel text-driven panoramic generation framework, DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation.
We show that DiffPano can generate consistent, diverse panoramic images with given unseen text descriptions and camera poses.
arXiv Detail & Related papers (2024-10-31T17:57:02Z) - PanoGRF: Generalizable Spherical Radiance Fields for Wide-baseline Panoramas [54.4948540627471]
We propose PanoGRF, Generalizable Spherical Radiance Fields for Wide-baseline Panoramas.
Unlike generalizable radiance fields trained on perspective images, PanoGRF avoids the information loss from panorama-to-perspective conversion.
Results on multiple panoramic datasets demonstrate that PanoGRF significantly outperforms state-of-the-art generalizable view synthesis methods.
arXiv Detail & Related papers (2023-06-02T13:35:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.