What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?
- URL: http://arxiv.org/abs/2505.22129v1
- Date: Wed, 28 May 2025 08:54:04 GMT
- Title: What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?
- Authors: Jinhong Ni, Chang-Bin Zhang, Qiang Zhang, Jing Zhang
- Abstract summary: Previous work has demonstrated the feasibility of using conventional low-rank adaptation techniques to generate panoramic images. We introduce a simple framework called UniPano, with the objective of establishing an elegant baseline for future research.
- Score: 16.01049610453117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent prosperity of text-to-image diffusion models, e.g., Stable Diffusion, has stimulated research into adapting them to 360-degree panorama generation. Prior work has demonstrated the feasibility of using conventional low-rank adaptation techniques on pre-trained diffusion models to generate panoramic images. However, the substantial domain gap between perspective and panoramic images raises questions about the underlying mechanisms enabling this empirical success. We hypothesize, and subsequently examine, that the trainable modules exhibit distinct behaviors when fine-tuned on panoramic data, and that such adaptation conceals an intrinsic mechanism for leveraging the prior knowledge within the pre-trained diffusion models. Our analysis reveals the following: 1) the query and key matrices in the attention modules are responsible for common information shared between the panoramic and perspective domains, and are thus less relevant to panorama generation; and 2) the value and output weight matrices specialize in adapting pre-trained knowledge to the panoramic domain, playing a more critical role during fine-tuning for panorama generation. We empirically verify these insights by introducing a simple framework called UniPano, with the objective of establishing an elegant baseline for future research. UniPano not only outperforms existing methods but also significantly reduces memory usage and training time compared to prior dual-branch approaches, making it scalable for end-to-end panorama generation at higher resolutions. The code will be released.
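The practical upshot of insight 2) is that low-rank adapters can be restricted to the value and output projections. Below is a minimal, hypothetical sketch of that idea using the diffusers and peft libraries; it is not the authors' released UniPano code, and the checkpoint name and rank are illustrative choices.

```python
# Hypothetical sketch: LoRA on value/output projections only, per the paper's
# analysis that the query/key matrices encode domain-shared information.
import torch
from diffusers import UNet2DConditionModel
from peft import LoraConfig

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
unet.requires_grad_(False)  # freeze the pre-trained backbone

lora_config = LoraConfig(
    r=16,                                 # rank: an illustrative choice
    lora_alpha=16,
    init_lora_weights="gaussian",
    target_modules=["to_v", "to_out.0"],  # value and output projections only
)
unet.add_adapter(lora_config)             # diffusers' PEFT integration

trainable = sum(p.numel() for p in unet.parameters() if p.requires_grad)
print(f"trainable LoRA params: {trainable}")
```

Dropping to_q and to_k from target_modules roughly halves the adapter parameter count relative to the common practice of adapting all four attention projections.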
Related papers
- ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models [52.87334248847314]
We propose a novel framework utilizing pretrained perspective video models for generating panoramic videos. Specifically, we design a novel panorama representation named ViewPoint map, which possesses global spatial continuity and fine-grained visual details simultaneously. Our method can synthesize highly dynamic and spatially consistent panoramic videos, achieving state-of-the-art performance and surpassing previous methods.
arXiv Detail & Related papers (2025-06-30T04:33:34Z)
- Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas [33.334956022229846]
We propose the Merge-Attend-Diffuse operator, which can be plugged into different types of pretrained diffusion models used in a joint diffusion setting.
Specifically, we merge the diffusion paths, reprogramming self- and cross-attention to operate on the aggregated latent space.
Our method maintains fidelity to the input prompt and the visual quality of the generated images while increasing their semantic coherence.
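As a hedged illustration of the path-merging idea (not the paper's actual operator, which additionally reprograms self- and cross-attention over the aggregated latent space), the sketch below averages the denoising predictions of overlapping latent windows in the spirit of joint diffusion; unet_step is a hypothetical callable performing one denoising step on a window.

```python
# Simplified sketch of merging diffusion paths over a wide panorama latent.
# Assumes (w - window) is a multiple of stride so every column is covered.
import torch

def merged_denoise_step(unet_step, panorama_latent, window=64, stride=32):
    _, _, _, w = panorama_latent.shape
    out = torch.zeros_like(panorama_latent)
    count = torch.zeros_like(panorama_latent)
    for x0 in range(0, w - window + 1, stride):   # slide horizontally
        view = panorama_latent[..., x0:x0 + window]
        out[..., x0:x0 + window] += unet_step(view)
        count[..., x0:x0 + window] += 1
    return out / count.clamp(min=1)               # average overlapping paths
```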
arXiv Detail & Related papers (2024-08-28T09:22:32Z)
- 360SFUDA++: Towards Source-free UDA for Panoramic Segmentation by Learning Reliable Category Prototypes [15.367186190755003]
We address the challenging source-free unsupervised domain adaptation (SFUDA) for pinhole-to-panoramic semantic segmentation.
360SFUDA++ effectively extracts knowledge from the source pinhole model using only unlabeled panoramic images.
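As a hedged sketch of what learning category prototypes from unlabeled panoramas can look like (not the paper's exact recipe), the snippet below confidence-filters the source model's pseudo-labels and averages features per class; all names and the threshold are illustrative.

```python
# Hedged sketch: per-class prototypes from unlabeled panoramas via
# confidence-filtered pseudo-labels from the frozen source (pinhole) model.
import torch

@torch.no_grad()
def category_prototypes(features, logits, num_classes, conf_thresh=0.9):
    # features: (B, C, H, W) panorama features; logits: (B, K, H, W),
    # assumed to share the features' spatial resolution.
    probs = logits.softmax(dim=1)
    conf, pseudo = probs.max(dim=1)           # per-pixel confidence / label
    feats = features.permute(0, 2, 3, 1).reshape(-1, features.shape[1])
    pseudo = pseudo.reshape(-1)
    keep = conf.reshape(-1) >= conf_thresh    # keep reliable pixels only
    protos = torch.zeros(num_classes, features.shape[1],
                         device=features.device, dtype=features.dtype)
    for k in range(num_classes):
        mask = keep & (pseudo == k)
        if mask.any():
            protos[k] = feats[mask].mean(dim=0)
    return protos                             # (K, C) class prototypes
```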
arXiv Detail & Related papers (2024-04-25T10:52:08Z)
- Taming Stable Diffusion for Text to 360° Panorama Image Generation [74.69314801406763]
We introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt.
We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process.
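PanFusion's exact attention design is described in the paper; the sketch below only illustrates the geometric correspondence that projection awareness relies on, mapping each pixel of a perspective view to its equirectangular panorama coordinates. The function and its horizontal-field-of-view assumption are illustrative.

```python
# Geometric sketch (not PanFusion's code): map every pixel of a perspective
# view with horizontal field of view fov_deg and camera yaw/pitch (radians)
# to its (u, v) location on an equirectangular panorama of size pano_h x pano_w.
import math
import torch

def perspective_to_equirect(h, w, fov_deg, yaw, pitch, pano_h, pano_w):
    f = 1.0 / math.tan(math.radians(fov_deg) / 2)  # focal length in NDC units
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    dirs = torch.stack([xs, ys * h / w, torch.full_like(xs, f)], dim=-1)
    dirs = dirs / dirs.norm(dim=-1, keepdim=True)  # unit view rays
    x, y, z = dirs.unbind(-1)
    # rotate rays: pitch around the x-axis, then yaw around the y-axis
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    y, z = y * cp - z * sp, y * sp + z * cp
    x, z = x * cy + z * sy, -x * sy + z * cy
    lon = torch.atan2(x, z)                        # longitude in [-pi, pi]
    lat = torch.asin(y.clamp(-1, 1))               # latitude in [-pi/2, pi/2]
    u = (lon / math.pi * 0.5 + 0.5) * pano_w
    v = (lat / math.pi + 0.5) * pano_h
    return u, v                                    # (h, w) panorama coordinates
```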
arXiv Detail & Related papers (2024-04-11T17:46:14Z)
- Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers [120.49126407479717]
This paper explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR).
We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos.
arXiv Detail & Related papers (2024-03-12T00:02:03Z)
- On the Multi-modal Vulnerability of Diffusion Models [56.08923332178462]
We propose MMP-Attack to manipulate the generation results of diffusion models by appending a specific suffix to the original prompt. Our goal is to induce diffusion models to generate a specific object while simultaneously eliminating the original object.
arXiv Detail & Related papers (2024-02-02T12:39:49Z)
- Panoramic Panoptic Segmentation: Insights Into Surrounding Parsing for Mobile Agents via Unsupervised Contrastive Learning [93.6645991946674]
We introduce panoramic panoptic segmentation as the most holistic form of scene understanding.
A complete understanding of the surroundings provides a mobile agent with maximal information.
We propose a framework which allows model training on standard pinhole images and transfers the learned features to a different domain.
arXiv Detail & Related papers (2022-06-21T20:07:15Z)
- Unsupervised Cycle-consistent Generative Adversarial Networks for Pan-sharpening [41.68141846006704]
We propose an unsupervised generative adversarial framework that learns from full-scale images without ground truths, alleviating the lack of full-resolution supervision in pan-sharpening.
We extract the modality-specific features from the PAN and MS images with a two-stream generator, perform fusion in the feature domain, and then reconstruct the pan-sharpened images.
Results demonstrate that the proposed method greatly improves pan-sharpening performance on full-scale images.
arXiv Detail & Related papers (2021-09-20T09:43:24Z)
- Panoramic Panoptic Segmentation: Towards Complete Surrounding Understanding via Unsupervised Contrastive Learning [97.37544023666833]
We introduce panoramic panoptic segmentation as the most holistic form of scene understanding.
A complete understanding of the surroundings provides the agent with maximal information.
We propose a framework which allows model training on standard pinhole images and transfers the learned features to a different domain.
arXiv Detail & Related papers (2021-03-01T09:37:27Z)