Customizing 360-Degree Panoramas through Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2310.18840v2
- Date: Tue, 7 Nov 2023 23:08:06 GMT
- Title: Customizing 360-Degree Panoramas through Text-to-Image Diffusion Models
- Authors: Hai Wang, Xiaoyu Xiang, Yuchen Fan, Jing-Hao Xue
- Abstract summary: We propose an approach that focuses on the customization of 360-degree panoramas using a T2I diffusion model.
To achieve this, we curate a paired image-text dataset specifically designed for the task and employ it to fine-tune a pre-trained T2I diffusion model with LoRA.
We propose a method called StitchDiffusion to ensure continuity between the leftmost and rightmost sides of the synthesized images.
- Score: 38.70079108858637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized text-to-image (T2I) synthesis based on diffusion models has
attracted significant attention in recent research. However, existing methods
primarily concentrate on customizing subjects or styles, neglecting the
exploration of global geometry. In this study, we propose an approach that
focuses on the customization of 360-degree panoramas, which inherently possess
global geometric properties, using a T2I diffusion model. To achieve this, we
curate a paired image-text dataset specifically designed for the task and
subsequently employ it to fine-tune a pre-trained T2I diffusion model with
LoRA. Nevertheless, the fine-tuned model alone does not ensure the continuity
between the leftmost and rightmost sides of the synthesized images, a crucial
characteristic of 360-degree panoramas. To address this issue, we propose a
method called StitchDiffusion. Specifically, we perform pre-denoising
operations twice at each time step of the denoising process on the stitch block
consisting of the leftmost and rightmost image regions. Furthermore, a global
cropping is adopted to synthesize seamless 360-degree panoramas. Experimental
results demonstrate the effectiveness of our customized model combined with the
proposed StitchDiffusion in generating high-quality 360-degree panoramic
images. Moreover, our customized model exhibits exceptional generalization
ability in producing scenes unseen in the fine-tuning dataset. Code is
available at https://github.com/littlewhitesea/StitchDiffusion.
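As a quick illustration of the fine-tuning step described in the abstract, below is a minimal sketch of LoRA applied to a single linear projection in pure PyTorch. The wrapper class, rank, and placeholder loss are illustrative assumptions, not the authors' actual training code or hyperparameters.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update: y = Wx + (alpha/r) * B(A(x))."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # freeze the pre-trained projection
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.normal_(self.lora_A.weight, std=0.01)
        nn.init.zeros_(self.lora_B.weight)   # zero-init so training starts from the base model
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_B(self.lora_A(x))

# Toy usage on a stand-in projection layer; only the LoRA factors receive gradients.
proj = LoRALinear(nn.Linear(320, 320), rank=4)
opt = torch.optim.AdamW([p for p in proj.parameters() if p.requires_grad], lr=1e-4)
x = torch.randn(2, 77, 320)
loss = proj(x).pow(2).mean()                 # placeholder loss, not the diffusion objective
loss.backward()
opt.step()
```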
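The abstract's stitch-block idea can be sketched as follows. This is a hedged, simplified interpretation: `denoise_step`, the stitch width, and the cropping rule are assumptions for illustration, not the released StitchDiffusion implementation (see the repository linked above for the actual code).

```python
import torch

def stitch_denoise(latent: torch.Tensor, denoise_step, timesteps, stitch_w: int = 16):
    """Denoise an extended panorama latent while keeping its two ends consistent.

    latent: (B, C, H, W) noisy latent; denoise_step(x, t) is a stand-in for one
    reverse-diffusion step of the customized (LoRA fine-tuned) model.
    """
    for t in timesteps:
        # Pre-denoise the stitch block (rightmost + leftmost columns) twice per step,
        # echoing the "pre-denoising operations twice at each time step" in the abstract.
        for _ in range(2):
            block = torch.cat([latent[..., -stitch_w:], latent[..., :stitch_w]], dim=-1)
            block = denoise_step(block, t)
            latent[..., -stitch_w:] = block[..., :stitch_w]
            latent[..., :stitch_w] = block[..., stitch_w:]
        # Regular denoising pass over the full latent.
        latent = denoise_step(latent, t)
    return latent

def global_crop(latent: torch.Tensor, target_w: int) -> torch.Tensor:
    """Illustrative global crop: keep a centered window of the target panorama width."""
    start = (latent.shape[-1] - target_w) // 2
    return latent[..., start:start + target_w]
```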
Related papers
- CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation [59.257513664564996]
We introduce a novel method for generating 360° panoramas from text prompts or images.
We employ multi-view diffusion models to jointly synthesize the six faces of a cubemap.
Our model allows for fine-grained text control, generates high resolution panorama images and generalizes well beyond its training set.
arXiv Detail & Related papers (2025-01-28T18:59:49Z)
- StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces [11.517082612850443]
We propose a method for generating images in arbitrary spaces using a pretrained image diffusion model.
The zero-shot method combines the strengths of both image conditioning and 3D mesh-based methods.
arXiv Detail & Related papers (2025-01-26T08:22:44Z)
- Towards High-Fidelity 3D Portrait Generation with Rich Details by Cross-View Prior-Aware Diffusion [63.81544586407943]
Single-image 3D portrait generation methods typically employ 2D diffusion models to provide multi-view knowledge, which is then distilled into 3D representations.
We propose a Hybrid Priors Diffusion model, which explicitly and implicitly incorporates multi-view priors as conditions to enhance the status consistency of the generated multi-view portraits.
Experiments demonstrate that our method can produce 3D portraits with accurate geometry and rich details from a single image.
arXiv Detail & Related papers (2024-11-15T17:19:18Z)
- Taming Stable Diffusion for Text to 360° Panorama Image Generation [74.69314801406763]
We introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt.
We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process.
arXiv Detail & Related papers (2024-04-11T17:46:14Z)
- Optimized View and Geometry Distillation from Multi-view Diffuser [20.47237377203664]
We introduce an Unbiased Score Distillation (USD) that utilizes unconditioned noises from a 2D diffusion model.
We develop a two-step specialization process of a 2D diffusion model, which is adept at conducting object-specific denoising.
Finally, we recover faithful geometry and texture directly from the refined multi-view images.
arXiv Detail & Related papers (2023-12-11T08:22:24Z)
- Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models [13.019535928387702]
This paper presents Progressive Conditional Diffusion Models (PCDMs) that incrementally bridge the gap between person images under the target and source poses through three stages.
Both qualitative and quantitative results demonstrate the consistency and photorealism of our proposed PCDMs under challenging scenarios.
arXiv Detail & Related papers (2023-10-10T05:13:17Z)
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z)
- OPDN: Omnidirectional Position-aware Deformable Network for Omnidirectional Image Super-Resolution [18.138867445188293]
We propose a two-stage framework for 360° omnidirectional image super-resolution.
Our proposed method achieves superior performance and wins the NTIRE 2023 challenge of 360° omnidirectional image super-resolution.
arXiv Detail & Related papers (2023-04-26T11:47:40Z)
- Enhancement of Novel View Synthesis Using Omnidirectional Image Completion [61.78187618370681]
We present a method for synthesizing novel views from a single 360-degree RGB-D image based on the neural radiance field (NeRF).
Experiments demonstrated that the proposed method can synthesize plausible novel views while preserving the features of the scene for both artificial and real-world data.
arXiv Detail & Related papers (2022-03-18T13:49:25Z)