Customizing 360-Degree Panoramas through Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2310.18840v2
- Date: Tue, 7 Nov 2023 23:08:06 GMT
- Title: Customizing 360-Degree Panoramas through Text-to-Image Diffusion Models
- Authors: Hai Wang, Xiaoyu Xiang, Yuchen Fan, Jing-Hao Xue
- Abstract summary: We propose an approach that focuses on the customization of 360-degree panoramas using a T2I diffusion model.
To achieve this, we curate a paired image-text dataset specifically designed for the task and employ it to fine-tune a pre-trained T2I diffusion model with LoRA.
We propose a method called StitchDiffusion to ensure continuity between the leftmost and rightmost sides of the synthesized images.
- Score: 38.70079108858637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized text-to-image (T2I) synthesis based on diffusion models has
attracted significant attention in recent research. However, existing methods
primarily concentrate on customizing subjects or styles, neglecting the
exploration of global geometry. In this study, we propose an approach that
focuses on the customization of 360-degree panoramas, which inherently possess
global geometric properties, using a T2I diffusion model. To achieve this, we
curate a paired image-text dataset specifically designed for the task and
subsequently employ it to fine-tune a pre-trained T2I diffusion model with
LoRA. Nevertheless, the fine-tuned model alone does not ensure the continuity
between the leftmost and rightmost sides of the synthesized images, a crucial
characteristic of 360-degree panoramas. To address this issue, we propose a
method called StitchDiffusion. Specifically, we perform pre-denoising
operations twice at each time step of the denoising process on the stitch block
consisting of the leftmost and rightmost image regions. Furthermore, a global
cropping is adopted to synthesize seamless 360-degree panoramas. Experimental
results demonstrate the effectiveness of our customized model combined with the
proposed StitchDiffusion in generating high-quality 360-degree panoramic
images. Moreover, our customized model exhibits exceptional generalization
ability in producing scenes unseen in the fine-tuning dataset. Code is
available at https://github.com/littlewhitesea/StitchDiffusion.
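As a quick illustration of the fine-tuning step described in the abstract, below is a minimal sketch of LoRA applied to a single linear projection in pure PyTorch. The wrapper class, rank, and placeholder loss are illustrative assumptions, not the authors' actual training code or hyperparameters.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update: y = Wx + (alpha/r) * B(A(x))."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # freeze the pre-trained projection
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.normal_(self.lora_A.weight, std=0.01)
        nn.init.zeros_(self.lora_B.weight)   # zero-init so training starts from the base model
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_B(self.lora_A(x))

# Toy usage on a stand-in projection layer; only the LoRA factors receive gradients.
proj = LoRALinear(nn.Linear(320, 320), rank=4)
opt = torch.optim.AdamW([p for p in proj.parameters() if p.requires_grad], lr=1e-4)
x = torch.randn(2, 77, 320)
loss = proj(x).pow(2).mean()                 # placeholder loss, not the diffusion objective
loss.backward()
opt.step()
```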
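The abstract's stitch-block idea can be sketched as follows. This is a hedged, simplified interpretation: `denoise_step`, the stitch width, and the cropping rule are assumptions for illustration, not the released StitchDiffusion implementation (see the repository linked above for the actual code).

```python
import torch

def stitch_denoise(latent: torch.Tensor, denoise_step, timesteps, stitch_w: int = 16):
    """Denoise an extended panorama latent while keeping its two ends consistent.

    latent: (B, C, H, W) noisy latent; denoise_step(x, t) is a stand-in for one
    reverse-diffusion step of the customized (LoRA fine-tuned) model.
    """
    for t in timesteps:
        # Pre-denoise the stitch block (rightmost + leftmost columns) twice per step,
        # echoing the "pre-denoising operations twice at each time step" in the abstract.
        for _ in range(2):
            block = torch.cat([latent[..., -stitch_w:], latent[..., :stitch_w]], dim=-1)
            block = denoise_step(block, t)
            latent[..., -stitch_w:] = block[..., :stitch_w]
            latent[..., :stitch_w] = block[..., stitch_w:]
        # Regular denoising pass over the full latent.
        latent = denoise_step(latent, t)
    return latent

def global_crop(latent: torch.Tensor, target_w: int) -> torch.Tensor:
    """Illustrative global crop: keep a centered window of the target panorama width."""
    start = (latent.shape[-1] - target_w) // 2
    return latent[..., start:start + target_w]
```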
Related papers
- CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation [59.257513664564996]
We introduce a novel method for generating 360° panoramas from text prompts or images.
We employ multi-view diffusion models to jointly synthesize the six faces of a cubemap.
Our model allows for fine-grained text control, generates high resolution panorama images and generalizes well beyond its training set.
arXiv Detail & Related papers (2025-01-28T18:59:49Z)
- StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces [11.517082612850443]
We propose a method for generating images in arbitrary spaces using a pretrained image diffusion model.
The zero-shot method combines the strengths of both image conditioning and 3D mesh-based methods.
arXiv Detail & Related papers (2025-01-26T08:22:44Z)
- Towards High-Fidelity 3D Portrait Generation with Rich Details by Cross-View Prior-Aware Diffusion [63.81544586407943]
Single-image 3D portrait generation methods typically employ 2D diffusion models to provide multi-view knowledge, which is then distilled into 3D representations.
We propose a Hybrid Priors Diffusion model, which explicitly and implicitly incorporates multi-view priors as conditions to enhance the status consistency of the generated multi-view portraits.
Experiments demonstrate that our method can produce 3D portraits with accurate geometry and rich details from a single image.
arXiv Detail & Related papers (2024-11-15T17:19:18Z)
- Taming Stable Diffusion for Text to 360° Panorama Image Generation [74.69314801406763]
We introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt.
We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process.
arXiv Detail & Related papers (2024-04-11T17:46:14Z)
- Optimized View and Geometry Distillation from Multi-view Diffuser [20.47237377203664]
We introduce an Unbiased Score Distillation (USD) that utilizes unconditioned noises from a 2D diffusion model.
We develop a two-step specialization process of a 2D diffusion model, which is adept at conducting object-specific denoising.
Finally, we recover faithful geometry and texture directly from the refined multi-view images.
arXiv Detail & Related papers (2023-12-11T08:22:24Z)
- Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models [13.019535928387702]
This paper presents Progressive Conditional Diffusion Models (PCDMs) that incrementally bridge the gap between person images under the target and source poses through three stages.
Both qualitative and quantitative results demonstrate the consistency and photorealism of our proposed PCDMs under challenging scenarios.
arXiv Detail & Related papers (2023-10-10T05:13:17Z)
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z)
- OPDN: Omnidirectional Position-aware Deformable Network for Omnidirectional Image Super-Resolution [18.138867445188293]
We propose a two-stage framework for 360° omnidirectional image super-resolution.
Our proposed method achieves superior performance and wins the NTIRE 2023 challenge of 360° omnidirectional image super-resolution.
arXiv Detail & Related papers (2023-04-26T11:47:40Z)
- Enhancement of Novel View Synthesis Using Omnidirectional Image Completion [61.78187618370681]
We present a method for synthesizing novel views from a single 360-degree RGB-D image based on the neural radiance field (NeRF).
Experiments demonstrated that the proposed method can synthesize plausible novel views while preserving the features of the scene for both artificial and real-world data.
arXiv Detail & Related papers (2022-03-18T13:49:25Z)