Taming Stable Diffusion for Text to 360° Panorama Image Generation
- URL: http://arxiv.org/abs/2404.07949v1
- Date: Thu, 11 Apr 2024 17:46:14 GMT
- Title: Taming Stable Diffusion for Text to 360° Panorama Image Generation
- Authors: Cheng Zhang, Qianyi Wu, Camilo Cruz Gambardella, Xiaoshui Huang, Dinh Phung, Wanli Ouyang, Jianfei Cai
- Abstract summary: We introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt.
We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process.
- Score: 74.69314801406763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative models, e.g., Stable Diffusion, have enabled the creation of photorealistic images from text prompts. Yet, the generation of 360-degree panorama images from text remains a challenge, particularly due to the dearth of paired text-panorama data and the domain gap between panorama and perspective images. In this paper, we introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt. We leverage the stable diffusion model as one branch to provide prior knowledge in natural image generation and register it to another panorama branch for holistic image generation. We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process. Our experiments validate that PanFusion surpasses existing methods and, thanks to its dual-branch structure, can integrate additional constraints like room layout for customized panorama outputs. Code is available at https://chengzhag.github.io/publication/panfusion.
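The "domain gap between panorama and perspective images" mentioned above comes from projection geometry: a perspective crop samples an equirectangular panorama through a nonlinear warp. As a paper-independent illustration (a minimal NumPy sketch, not PanFusion's actual projection-aware attention; function names are hypothetical), the mapping from perspective-camera pixels to equirectangular coordinates looks like:

```python
import numpy as np

def perspective_rays(fov_deg, width, height, yaw=0.0, pitch=0.0):
    """Unit view rays for a pinhole camera looking along +z,
    rotated by yaw (about y) then pitch (about x)."""
    f = 0.5 * width / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels
    xs = np.arange(width) - (width - 1) / 2
    ys = np.arange(height) - (height - 1) / 2
    x, y = np.meshgrid(xs, ys)                          # shape (H, W)
    rays = np.stack([x, y, np.full_like(x, f)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    return rays @ (Ry @ Rx).T                           # rotate every ray

def equirect_coords(rays, pano_w, pano_h):
    """Map unit rays to pixel coordinates in an equirectangular panorama."""
    lon = np.arctan2(rays[..., 0], rays[..., 2])        # [-pi, pi]
    lat = np.arcsin(np.clip(rays[..., 1], -1, 1))       # [-pi/2, pi/2]
    u = (lon / (2 * np.pi) + 0.5) * (pano_w - 1)
    v = (lat / np.pi + 0.5) * (pano_h - 1)
    return u, v
```

Sampling a panorama at `(u, v)` yields the perspective view; the warp is strongly nonuniform near the poles, which is one reason a perspective-trained model like Stable Diffusion cannot be applied to equirectangular images directly.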
Related papers
- DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion [60.45000652592418]
We propose a novel text-driven panoramic generation framework, DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation.
We show that DiffPano can generate consistent, diverse panoramic images with given unseen text descriptions and camera poses.
arXiv Detail & Related papers (2024-10-31T17:57:02Z)
- SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting [53.32467009064287]
We propose a text-driven 3D-consistent scene generation model: SceneDreamer360.
Our proposed method leverages a text-driven panoramic image generation model as a prior for 3D scene generation.
Our experiments demonstrate that SceneDreamer360 with its panoramic image generation and 3DGS can produce higher quality, spatially consistent, and visually appealing 3D scenes from any text prompt.
arXiv Detail & Related papers (2024-08-25T02:56:26Z)
- PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance [37.45462643757252]
PanoFree is a novel method for tuning-free multi-view image generation.
It addresses the key issues of inconsistency and artifacts from error accumulation without the need for fine-tuning.
It demonstrates significant error reduction, improves global consistency, and boosts image quality without extra fine-tuning.
arXiv Detail & Related papers (2024-08-04T22:23:10Z)
- OPa-Ma: Text Guided Mamba for 360-degree Image Out-painting [9.870063736691556]
We tackle the recently popular task of generating 360-degree images from conventional narrow field-of-view (NFoV) images.
This task aims to predict reasonable and consistent surroundings from the NFoV input.
We propose a novel text-guided out-painting framework equipped with a State-Space Model called Mamba.
arXiv Detail & Related papers (2024-07-15T17:23:00Z)
- Customizing 360-Degree Panoramas through Text-to-Image Diffusion Models [38.70079108858637]
We propose an approach that focuses on the customization of 360-degree panoramas using a T2I diffusion model.
To achieve this, we curate a paired image-text dataset specifically designed for the task and employ it to fine-tune a pre-trained T2I diffusion model with LoRA.
We propose a method called StitchDiffusion to ensure continuity between the leftmost and rightmost sides of the synthesized images.
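The continuity goal that StitchDiffusion targets can be stated in plain code. As a generic illustration (a crude stand-in for the paper's stitching procedure, not the authors' method; function names are hypothetical), circular padding makes operations treat the 0°/360° boundary as seamless, and a linear cross-fade forces the leftmost and rightmost columns to agree:

```python
import numpy as np

def circular_pad(pano, pad):
    """Pad an (H, W, C) panorama along width by wrapping, so that
    sliding-window operations see the 0/360-degree seam as continuous."""
    return np.concatenate([pano[:, -pad:], pano, pano[:, :pad]], axis=1)

def blend_seam(pano, width=16):
    """Cross-fade a strip on each side of the wrap seam toward the
    average of the two boundary columns, so column 0 equals column W-1."""
    out = pano.astype(np.float64)
    edge = 0.5 * (out[:, :1] + out[:, -1:])   # target column at the seam
    for i in range(width):
        w = 1.0 - i / width                   # 1 at the seam, fades to 0 inward
        out[:, i] = (1 - w) * out[:, i] + w * edge[:, 0]
        out[:, -1 - i] = (1 - w) * out[:, -1 - i] + w * edge[:, 0]
    return out
```

After `blend_seam`, the two vertical edges of the image are identical, so the panorama can be wrapped onto a cylinder without a visible discontinuity; diffusion-based methods achieve the same property during denoising rather than as a post-process.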
arXiv Detail & Related papers (2023-10-28T22:57:24Z)
- 360-Degree Panorama Generation from Few Unregistered NFoV Images [16.05306624008911]
360° panoramas are extensively utilized as environment light sources in computer graphics.
However, capturing a 360°×180° panorama poses challenges due to specialized and costly equipment.
We propose a novel pipeline called PanoDiff, which efficiently generates complete 360° panoramas.
arXiv Detail & Related papers (2023-08-28T16:21:51Z)
- LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation [121.45667242282721]
We propose a coarse-to-fine paradigm to achieve layout planning and image generation.
Our proposed method outperforms the state-of-the-art models in terms of photorealistic layout and image generation.
arXiv Detail & Related papers (2023-08-09T17:45:04Z)
- ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models [77.03361270726944]
Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models.
We propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information.
We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout.
arXiv Detail & Related papers (2023-05-25T16:32:01Z)
- Cross-View Panorama Image Synthesis [68.35351563852335]
PanoGAN is a novel adversarial feedback GAN framework.
PanoGAN enables high-quality panorama image generation with more convincing details than state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-22T15:59:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.