Real-World Image Variation by Aligning Diffusion Inversion Chain
- URL: http://arxiv.org/abs/2305.18729v3
- Date: Tue, 7 Nov 2023 03:34:26 GMT
- Title: Real-World Image Variation by Aligning Diffusion Inversion Chain
- Authors: Yuechen Zhang, Jinbo Xing, Eric Lo, Jiaya Jia
- Abstract summary: A domain gap exists between generated images and real-world images, which poses a challenge in generating high-quality variations of real-world images.
We propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL)
Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.
- Score: 53.772004619296794
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent diffusion model advancements have enabled high-fidelity images to be
generated using text prompts. However, a domain gap exists between generated
images and real-world images, which poses a challenge in generating
high-quality variations of real-world images. Our investigation uncovers that
this domain gap originates from a latents' distribution gap in different
diffusion processes. To address this issue, we propose a novel inference
pipeline called Real-world Image Variation by ALignment (RIVAL) that utilizes
diffusion models to generate image variations from a single image exemplar. Our
pipeline enhances the generation quality of image variations by aligning the
image generation process to the source image's inversion chain. Specifically,
we demonstrate that step-wise latent distribution alignment is essential for
generating high-quality variations. To attain this, we design a cross-image
self-attention injection for feature interaction and a step-wise distribution
normalization to align the latent features. Incorporating these alignment
processes into a diffusion model allows RIVAL to generate high-quality image
variations without further parameter optimization. Our experimental results
demonstrate that our proposed approach outperforms existing methods concerning
semantic similarity and perceptual quality. This generalized inference pipeline
can be easily applied to other diffusion-based generation tasks, such as
image-conditioned text-to-image generation and stylization.
Related papers
- Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas [33.334956022229846]
We propose the Merge-Attend-Diffuse operator, which can be plugged into different types of pretrained diffusion models used in a joint diffusion setting.
Specifically, we merge the diffusion paths, reprogramming self- and cross-attention to operate on the aggregated latent space.
Our method maintains compatibility with the input prompt and visual quality of the generated images while increasing their semantic coherence.
arXiv Detail & Related papers (2024-08-28T09:22:32Z) - Image Inpainting via Tractable Steering of Diffusion Models [54.13818673257381]
This paper proposes to exploit the ability of Tractable Probabilistic Models (TPMs) to exactly and efficiently compute the constrained posterior.
Specifically, this paper adopts a class of expressive TPMs termed Probabilistic Circuits (PCs)
We show that our approach can consistently improve the overall quality and semantic coherence of inpainted images with only 10% additional computational overhead.
arXiv Detail & Related papers (2023-11-28T21:14:02Z) - IMPUS: Image Morphing with Perceptually-Uniform Sampling Using Diffusion Models [24.382275473592046]
We present a diffusion-based image morphing approach with perceptually-uniform sampling (IMPUS)
IMPUS produces smooth, direct and realistic adaptations given an image pair.
arXiv Detail & Related papers (2023-11-12T10:03:32Z) - Improving Diffusion-based Image Translation using Asymmetric Gradient
Guidance [51.188396199083336]
We present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance.
Our model's adaptability allows it to be implemented with both image-fusion and latent-dif models.
Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
arXiv Detail & Related papers (2023-06-07T12:56:56Z) - SinDiffusion: Learning a Diffusion Model from a Single Natural Image [159.4285444680301]
We present SinDiffusion, leveraging denoising diffusion models to capture internal distribution of patches from a single natural image.
It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales.
Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics.
arXiv Detail & Related papers (2022-11-22T18:00:03Z) - Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation.
It incorporates the inductive bias of CNNs and powerful sequence modeling of auto-regression.
Our method achieves superior diverse image generation performance as compared with the state-of-the-art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z) - Encoding Robustness to Image Style via Adversarial Feature Perturbations [72.81911076841408]
We adapt adversarial training by directly perturbing feature statistics, rather than image pixels, to produce robust models.
Our proposed method, Adversarial Batch Normalization (AdvBN), is a single network layer that generates worst-case feature perturbations during training.
arXiv Detail & Related papers (2020-09-18T17:52:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.