Customizing 360-Degree Panoramas through Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2310.18840v2
- Date: Tue, 7 Nov 2023 23:08:06 GMT
- Title: Customizing 360-Degree Panoramas through Text-to-Image Diffusion Models
- Authors: Hai Wang, Xiaoyu Xiang, Yuchen Fan, Jing-Hao Xue
- Abstract summary: We propose an approach that focuses on the customization of 360-degree panoramas using a T2I diffusion model.
To achieve this, we curate a paired image-text dataset specifically designed for the task and employ it to fine-tune a pre-trained T2I diffusion model with LoRA.
We propose a method called StitchDiffusion to ensure continuity between the leftmost and rightmost sides of the synthesized images.
- Score: 38.70079108858637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized text-to-image (T2I) synthesis based on diffusion models has
attracted significant attention in recent research. However, existing methods
primarily concentrate on customizing subjects or styles, neglecting the
exploration of global geometry. In this study, we propose an approach that
focuses on the customization of 360-degree panoramas, which inherently possess
global geometric properties, using a T2I diffusion model. To achieve this, we
curate a paired image-text dataset specifically designed for the task and
subsequently employ it to fine-tune a pre-trained T2I diffusion model with
LoRA. Nevertheless, the fine-tuned model alone does not ensure the continuity
between the leftmost and rightmost sides of the synthesized images, a crucial
characteristic of 360-degree panoramas. To address this issue, we propose a
method called StitchDiffusion. Specifically, we perform pre-denoising
operations twice at each time step of the denoising process on the stitch block
consisting of the leftmost and rightmost image regions. Furthermore, a global
cropping is adopted to synthesize seamless 360-degree panoramas. Experimental
results demonstrate the effectiveness of our customized model combined with the
proposed StitchDiffusion in generating high-quality 360-degree panoramic
images. Moreover, our customized model exhibits exceptional generalization
ability in producing scenes unseen in the fine-tuning dataset. Code is
available at https://github.com/littlewhitesea/StitchDiffusion.
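As a rough illustration of the continuity idea described in the abstract, the sketch below denoises a panoramic latent with a sliding window along the width and, at each step, additionally denoises a "stitch block" formed by wrapping the rightmost and leftmost latent columns together before averaging the overlapping predictions. This is a minimal sketch of the concept, not the released StitchDiffusion code: the window, stride, and stitch widths and the `denoise_step` callable (which would wrap the LoRA fine-tuned model's noise prediction and scheduler update) are illustrative assumptions, and only a single stitch pass is shown, whereas the paper performs the stitch-block pre-denoising twice per time step.

```python
# Minimal sketch (not the authors' code) of enforcing left/right continuity
# during panorama denoising. `denoise_step` stands in for one step of a
# diffusion sampler (noise prediction + scheduler update) on a latent window;
# window, stride, and stitch widths are illustrative placeholders.
import torch

def panorama_step(latent, denoise_step, t, window=64, stride=32, stitch=32):
    """One denoising step over a panoramic latent of shape (B, C, H, W)."""
    B, C, H, W = latent.shape
    out = torch.zeros_like(latent)
    count = torch.zeros_like(latent)

    # Sliding windows across the panorama width, fused by averaging.
    for x0 in range(0, W - window + 1, stride):
        out[..., x0:x0 + window] += denoise_step(latent[..., x0:x0 + window], t)
        count[..., x0:x0 + window] += 1

    # Stitch block: the rightmost and leftmost columns are wrapped together and
    # denoised jointly so the two ends of the panorama stay consistent.
    block = torch.cat([latent[..., -stitch:], latent[..., :stitch]], dim=-1)
    block = denoise_step(block, t)
    out[..., -stitch:] += block[..., :stitch]   # right end of the panorama
    out[..., :stitch] += block[..., stitch:]    # left end of the panorama
    count[..., -stitch:] += 1
    count[..., :stitch] += 1

    # Average overlapping predictions into the next latent.
    return out / count.clamp(min=1)
```

A toy call such as `panorama_step(torch.randn(1, 4, 64, 256), lambda x, t: x, t=0)` exercises the shapes; in the full method this per-step routine would be driven by the sampler at every timestep, and the decoded output would then be globally cropped, as the abstract describes, to obtain the seamless 360-degree panorama.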
Related papers
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- Taming Stable Diffusion for Text to 360° Panorama Image Generation [74.69314801406763]
We introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt.
We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process.
arXiv Detail & Related papers (2024-04-11T17:46:14Z)
- Direct Consistency Optimization for Compositional Text-to-Image Personalization [73.94505688626651]
Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, are able to generate visuals with a high degree of consistency.
We propose to fine-tune the T2I model by maximizing consistency to reference images, while penalizing the deviation from the pretrained model.
arXiv Detail & Related papers (2024-02-19T09:52:41Z)
- Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior [33.45375100074168]
We present a novel two-stage approach that fully utilizes the information provided by the reference image to establish a customized knowledge prior for image-to-3D generation.
Experiments showcase the superiority of our method, Customize-It-3D, outperforming previous works by a substantial margin.
arXiv Detail & Related papers (2023-12-15T19:07:51Z)
- Optimized View and Geometry Distillation from Multi-view Diffuser [20.47237377203664]
We introduce an Unbiased Score Distillation (USD) that utilizes unconditioned noises from a 2D diffusion model.
We develop a two-step specialization process of a 2D diffusion model, which is adept at conducting object-specific denoising.
Finally, we recover faithful geometry and texture directly from the refined multi-view images.
arXiv Detail & Related papers (2023-12-11T08:22:24Z)
- Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models [13.795706255966259]
This paper presents Progressive Conditional Diffusion Models (PCDMs) that incrementally bridge the gap between person images under the target and source poses through three stages.
Both qualitative and quantitative results demonstrate the consistency and photorealism of our proposed PCDMs under challenging scenarios.
arXiv Detail & Related papers (2023-10-10T05:13:17Z)
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z)
- OPDN: Omnidirectional Position-aware Deformable Network for Omnidirectional Image Super-Resolution [18.138867445188293]
We propose a two-stage framework for 360° omnidirectional image super-resolution.
Our proposed method achieves superior performance and wins the NTIRE 2023 challenge of 360° omnidirectional image super-resolution.
arXiv Detail & Related papers (2023-04-26T11:47:40Z)
- Enhancement of Novel View Synthesis Using Omnidirectional Image Completion [61.78187618370681]
We present a method for synthesizing novel views from a single 360-degree RGB-D image based on the neural radiance field (NeRF).
Experiments demonstrated that the proposed method can synthesize plausible novel views while preserving the features of the scene for both artificial and real-world data.
arXiv Detail & Related papers (2022-03-18T13:49:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.