DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion
- URL: http://arxiv.org/abs/2410.24203v1
- Date: Thu, 31 Oct 2024 17:57:02 GMT
- Title: DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion
- Authors: Weicai Ye, Chenhao Ji, Zheng Chen, Junyao Gao, Xiaoshui Huang, Song-Hai Zhang, Wanli Ouyang, Tong He, Cairong Zhao, Guofeng Zhang
- Abstract summary: We propose a novel text-driven panoramic generation framework, DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation.
We show that DiffPano can generate consistent, diverse panoramic images given unseen text descriptions and camera poses.
- Score: 60.45000652592418
- Abstract: Diffusion-based methods have achieved remarkable success in 2D image and 3D object generation; however, the generation of 3D scenes and even $360^{\circ}$ images remains constrained due to the limited number of scene datasets, the complexity of 3D scenes themselves, and the difficulty of generating consistent multi-view images. To address these issues, we first establish a large-scale panoramic video-text dataset containing millions of consecutive panoramic keyframes with corresponding panoramic depths, camera poses, and text descriptions. Then, we propose a novel text-driven panoramic generation framework, termed DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation. Specifically, benefiting from the powerful generative capabilities of Stable Diffusion, we fine-tune a single-view text-to-panorama diffusion model with LoRA on the established panoramic video-text dataset. We further design a spherical epipolar-aware multi-view diffusion model to ensure the multi-view consistency of the generated panoramic images. Extensive experiments demonstrate that DiffPano can generate scalable, consistent, and diverse panoramic images given unseen text descriptions and camera poses.
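The spherical epipolar-aware design can be made concrete with a little geometry. Below is a minimal sketch (not the authors' code; the equirectangular axis conventions and the hypothesised depth range are assumptions) of how a pixel in one panorama constrains its correspondence in a second panorama: sweeping an unknown depth along the pixel's ray and reprojecting into the second view traces the spherical epipolar curve that multi-view attention can be restricted to.

```python
import numpy as np

def pixel_to_ray(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit ray on the sphere."""
    lon = (u / width) * 2.0 * np.pi - np.pi        # longitude in [-pi, pi)
    lat = np.pi / 2.0 - (v / height) * np.pi       # latitude in [-pi/2, pi/2]
    return np.array([np.cos(lat) * np.sin(lon),
                     -np.sin(lat),
                     np.cos(lat) * np.cos(lon)])

def ray_to_pixel(d, width, height):
    """Inverse mapping: unit ray -> equirectangular pixel coordinates."""
    lon = np.arctan2(d[0], d[2])
    lat = np.arcsin(np.clip(-d[1], -1.0, 1.0))
    return ((lon + np.pi) / (2.0 * np.pi) * width,
            (np.pi / 2.0 - lat) / np.pi * height)

def spherical_epipolar_curve(u, v, R, t, width, height, n_samples=64):
    """Sample, in panorama 2, the epipolar curve of pixel (u, v) from panorama 1.

    R, t: relative pose taking camera-1 coordinates into camera-2 coordinates.
    The unknown 3D point lies at some depth s along the pixel's ray; sweeping
    s and reprojecting traces the curve a consistent match must lie on.
    """
    d1 = pixel_to_ray(u, v, width, height)
    depths = np.geomspace(0.1, 100.0, n_samples)   # hypothesised depth range
    pts = [R @ (s * d1) + t for s in depths]       # candidate points, cam-2 frame
    return np.array([ray_to_pixel(p / np.linalg.norm(p), width, height)
                     for p in pts])
```

In an epipolar-aware attention layer, a query pixel in one view would then attend only to features sampled along this curve in the other views, which is one way such a geometric constraint can enforce multi-view consistency.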
Related papers
- VidPanos: Generative Panoramic Videos from Casual Panning Videos [73.77443496436749]
Panoramic image stitching provides a unified, wide-angle view of a scene that extends beyond the camera's field of view.
We present a method for synthesizing a panoramic video from a casually-captured panning video.
Our system can create video panoramas for a range of in-the-wild scenes including people, vehicles, and flowing water.
arXiv Detail & Related papers (2024-10-17T17:53:24Z)
- SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting [53.32467009064287]
We propose a text-driven 3D-consistent scene generation model: SceneDreamer360.
Our proposed method leverages a text-driven panoramic image generation model as a prior for 3D scene generation.
Our experiments demonstrate that SceneDreamer360, with its panoramic image generation and 3DGS, can produce higher-quality, spatially consistent, and visually appealing 3D scenes from any text prompt.
arXiv Detail & Related papers (2024-08-25T02:56:26Z)
- LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation [105.52153675890408]
3D immersive scene generation is a challenging yet critical task in computer vision and graphics.
LayerPano3D is a novel framework for full-view, explorable panoramic 3D scene generation from a single text prompt.
arXiv Detail & Related papers (2024-08-23T17:50:23Z)
- Pano2Room: Novel View Synthesis from a Single Indoor Panorama [20.262621556667852]
Pano2Room is designed to automatically reconstruct high-quality 3D indoor scenes from a single panoramic image.
The key idea is to initially construct a preliminary mesh from the input panorama, and iteratively refine this mesh using a panoramic RGBD inpainter.
The refined mesh is converted into a 3D Gaussian Splatting field and trained with the collected pseudo novel views.
arXiv Detail & Related papers (2024-08-21T08:19:12Z)
- HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions [31.342899807980654]
3D scene generation is in high demand across various domains, including virtual reality, gaming, and the film industry.
We introduce HoloDreamer, a framework that first generates high-definition panorama as a holistic initialization of the full 3D scene.
We then leverage 3D Gaussian Splatting (3D-GS) to quickly reconstruct the 3D scene, thereby facilitating the creation of view-consistent and fully enclosed 3D scenes.
arXiv Detail & Related papers (2024-07-21T14:52:51Z)
- Taming Stable Diffusion for Text to 360° Panorama Image Generation [74.69314801406763]
We introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt.
We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process.
arXiv Detail & Related papers (2024-04-11T17:46:14Z)
- PanoGRF: Generalizable Spherical Radiance Fields for Wide-baseline Panoramas [54.4948540627471]
We propose PanoGRF, Generalizable Spherical Radiance Fields for Wide-baseline Panoramas.
Unlike generalizable radiance fields trained on perspective images, PanoGRF avoids the information loss from panorama-to-perspective conversion.
Results on multiple panoramic datasets demonstrate that PanoGRF significantly outperforms state-of-the-art generalizable view synthesis methods.
arXiv Detail & Related papers (2023-06-02T13:35:07Z)
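The PanoGRF summary above refers to the information loss of panorama-to-perspective conversion. For reference, the sketch below (not from any of the listed papers; the pinhole model, axis conventions, and nearest-neighbour sampling are all assumptions) shows what such a conversion does: each perspective crop resamples only the portion of the sphere inside its field of view, so a pipeline built on such crops discards the rest of the panorama.

```python
import numpy as np

def perspective_from_pano(pano, fov_deg=90.0, yaw_deg=0.0, pitch_deg=0.0, size=256):
    """Resample a pinhole-camera view from an equirectangular panorama."""
    H, W = pano.shape[:2]
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2.0)   # focal length in pixels
    ys, xs = np.mgrid[0:size, 0:size]
    # Rays through the image plane; the camera looks down +z.
    dirs = np.stack([(xs - size / 2) / f,
                     (ys - size / 2) / f,
                     np.ones((size, size))], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Orient the virtual camera: yaw about y, then pitch about x.
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])
    d = dirs @ (Ry @ Rx).T
    # Ray directions -> equirectangular coordinates -> source pixels
    # (nearest-neighbour lookup for brevity).
    lon = np.arctan2(d[..., 0], d[..., 2])
    lat = np.arcsin(np.clip(-d[..., 1], -1.0, 1.0))
    u = ((lon + np.pi) / (2.0 * np.pi) * W).astype(int) % W
    v = ((np.pi / 2.0 - lat) / np.pi * H).astype(int).clip(0, H - 1)
    return pano[v, u]
```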