Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas
- URL: http://arxiv.org/abs/2408.15660v1
- Date: Wed, 28 Aug 2024 09:22:32 GMT
- Title: Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas
- Authors: Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara,
- Abstract summary: We propose the Merge-Attend-Diffuse operator, which can be plugged into different types of pretrained diffusion models used in a joint diffusion setting.
Specifically, we merge the diffusion paths, reprogramming self- and cross-attention to operate on the aggregated latent space.
Our method maintains compatibility with the input prompt and visual quality of the generated images while increasing their semantic coherence.
- Score: 33.334956022229846
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have become the State-of-the-Art for text-to-image generation, and increasing research effort has been dedicated to adapting the inference process of pretrained diffusion models to achieve zero-shot capabilities. An example is the generation of panorama images, which has been tackled in recent works by combining independent diffusion paths over overlapping latent features, which is referred to as joint diffusion, obtaining perceptually aligned panoramas. However, these methods often yield semantically incoherent outputs and trade-off diversity for uniformity. To overcome this limitation, we propose the Merge-Attend-Diffuse operator, which can be plugged into different types of pretrained diffusion models used in a joint diffusion setting to improve the perceptual and semantical coherence of the generated panorama images. Specifically, we merge the diffusion paths, reprogramming self- and cross-attention to operate on the aggregated latent space. Extensive quantitative and qualitative experimental analysis, together with a user study, demonstrate that our method maintains compatibility with the input prompt and visual quality of the generated images while increasing their semantic coherence. We release the code at https://github.com/aimagelab/MAD.
Related papers
- Diffusion Prism: Enhancing Diversity and Morphology Consistency in Mask-to-Image Diffusion [4.0301593672451]
Diffusion Prism is a training-free framework that transforms binary masks into realistic and diverse samples.
We explore that a small amount of artificial noise will significantly assist the image-denoising process.
arXiv Detail & Related papers (2025-01-01T20:04:25Z) - FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution [48.88184541515326]
We propose a simple and effective method, named FaithDiff, to fully harness the power of latent diffusion models (LDMs) for faithful image SR.
In contrast to existing diffusion-based SR methods that freeze the diffusion model pre-trained on high-quality images, we propose to unleash the diffusion prior to identify useful information and recover faithful structures.
arXiv Detail & Related papers (2024-11-27T23:58:03Z) - Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method [60.88467353578118]
We show that a fixed-point-inspired iterative approach to invert real-world images does not achieve convergence, instead oscillating between distinct clusters.
We introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing.
arXiv Detail & Related papers (2024-11-17T17:45:37Z) - MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration [7.087475633143941]
MM-Diff is a tuning-free image personalization framework capable of generating high-fidelity images of both single and multiple subjects in seconds.
MM-Diff employs a vision encoder to transform the input image into CLS and patch embeddings.
CLS embeddings are used on the one hand to augment the text embeddings, and on the other hand together with patch embeddings to derive a small number of detail-rich subject embeddings.
arXiv Detail & Related papers (2024-03-22T09:32:31Z) - ViewFusion: Towards Multi-View Consistency via Interpolated Denoising [48.02829400913904]
We introduce ViewFusion, a training-free algorithm that can be seamlessly integrated into existing pre-trained diffusion models.
Our approach adopts an auto-regressive method that implicitly leverages previously generated views as context for the next view generation.
Our framework successfully extends single-view conditioned models to work in multiple-view conditional settings without any additional fine-tuning.
arXiv Detail & Related papers (2024-02-29T04:21:38Z) - On the Multi-modal Vulnerability of Diffusion Models [56.08923332178462]
We propose MMP-Attack to manipulate the generation results of diffusion models by appending a specific suffix to the original prompt.
Our goal is to induce diffusion models to generate a specific object while simultaneously eliminating the original object.
arXiv Detail & Related papers (2024-02-02T12:39:49Z) - DiffMorpher: Unleashing the Capability of Diffusion Models for Image
Morphing [28.593023489682654]
We present DiffMorpher, the first approach enabling smooth and natural image morphing using diffusion models.
Our key idea is to capture the semantics of the two images by fitting two LoRAs to them respectively, and interpolate between both the LoRA parameters and the latent noises to ensure a smooth semantic transition.
In addition, we propose an attention and injection technique and a new sampling schedule to further enhance the smoothness between consecutive images.
arXiv Detail & Related papers (2023-12-12T16:28:08Z) - Real-World Image Variation by Aligning Diffusion Inversion Chain [53.772004619296794]
A domain gap exists between generated images and real-world images, which poses a challenge in generating high-quality variations of real-world images.
We propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL)
Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.
arXiv Detail & Related papers (2023-05-30T04:09:47Z) - SinDiffusion: Learning a Diffusion Model from a Single Natural Image [159.4285444680301]
We present SinDiffusion, leveraging denoising diffusion models to capture internal distribution of patches from a single natural image.
It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales.
Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics.
arXiv Detail & Related papers (2022-11-22T18:00:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.