MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image
Translation by Prompts Redescription and Beyond
- URL: http://arxiv.org/abs/2401.03221v1
- Date: Sat, 6 Jan 2024 14:12:16 GMT
- Title: MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image
Translation by Prompts Redescription and Beyond
- Authors: Yupei Lin, Xiaoyu Xian, Yukai Shi, Liang Lin
- Abstract summary: We propose a prompt redescription strategy to realize a mirror effect between the source and reconstructed image in the diffusion model (MirrorDiffusion).
MirrorDiffusion achieves superior performance over the state-of-the-art methods on zero-shot image translation benchmarks.
- Score: 57.14128305383768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, text-to-image diffusion models have become a new paradigm in image
processing fields, including content generation, image restoration and
image-to-image translation. Given a target prompt, Denoising Diffusion
Probabilistic Models (DDPM) are able to generate realistic yet eligible images.
With this appealing property, the image translation task has the potential to
be free from target image samples for supervision. By using a target text
prompt for domain adaptation, the diffusion model is able to implement zero-shot
image-to-image translation advantageously. However, the sampling and inversion
processes of DDPM are stochastic, and thus the inversion process often fails to
reconstruct the input content. Specifically, the displacement effect gradually
accumulates during the diffusion and inversion processes, which leads to
reconstructed results that deviate from the source domain. To make
reconstruction explicit, we propose a prompt redescription strategy to realize
a mirror effect between the source and reconstructed image in the diffusion
model (MirrorDiffusion). More specifically, a prompt redescription mechanism is
investigated to align the text prompts with latent code at each time step of
the Denoising Diffusion Implicit Models (DDIM) inversion to pursue a
structure-preserving reconstruction. With the revised DDIM inversion,
MirrorDiffusion is able to realize accurate zero-shot image translation by
editing optimized text prompts and latent code. Extensive experiments
demonstrate that MirrorDiffusion achieves superior performance over the
state-of-the-art methods on zero-shot image translation benchmarks by clear
margins and practical model stability.
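The revised DDIM inversion couples each inversion step with an optimization of the text prompt so that the trajectory stays reconstructible. The sketch below illustrates that idea; it assumes a generic noise predictor `eps_model(x, t, emb)` and a tensor `alpha_bar` of cumulative noise-schedule values, and the inner embedding optimization captures the spirit of prompt redescription rather than the authors' exact procedure.

```python
import torch

def ddim_invert_with_redescription(x0, prompt_emb, eps_model, alpha_bar,
                                   timesteps, inner_steps=5, lr=1e-2):
    """DDIM inversion in which the text embedding is re-optimized at every
    step so that denoising the new latent maps back onto the previous one,
    i.e. a mirror-style consistency constraint on the trajectory."""
    x = x0                              # image latent at the cleanest timestep
    emb = prompt_emb.clone()            # text embedding to be "redescribed"
    latents, embs = [x], []
    for i in range(len(timesteps) - 1):
        t_cur, t_next = timesteps[i], timesteps[i + 1]   # t_next is noisier
        a_cur, a_next = alpha_bar[t_cur], alpha_bar[t_next]

        # Plain DDIM inversion step (reuses eps predicted at the current latent).
        with torch.no_grad():
            eps = eps_model(x, t_cur, emb)
            x0_pred = (x - (1 - a_cur).sqrt() * eps) / a_cur.sqrt()
            x_next = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps

        # "Redescribe" the prompt: nudge the embedding so that the reverse
        # (denoising) step from x_next lands back on x.
        emb = emb.detach().requires_grad_(True)
        opt = torch.optim.Adam([emb], lr=lr)
        for _ in range(inner_steps):
            eps_hat = eps_model(x_next, t_next, emb)
            x0_hat = (x_next - (1 - a_next).sqrt() * eps_hat) / a_next.sqrt()
            x_back = a_cur.sqrt() * x0_hat + (1 - a_cur).sqrt() * eps_hat
            loss = torch.nn.functional.mse_loss(x_back, x)
            opt.zero_grad()
            loss.backward()
            opt.step()

        x = x_next.detach()
        latents.append(x)
        embs.append(emb.detach())
    return latents, embs    # latents[-1] is the inverted noise code
```

In a setup like this, translation would then amount to sampling back down from the inverted latent while editing the optimized prompts, in line with the abstract's description of editing optimized text prompts and latent code.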
Related papers
- ERDDCI: Exact Reversible Diffusion via Dual-Chain Inversion for High-Quality Image Editing [20.46262679357339]
Diffusion models (DMs) have been successfully applied to real image editing.
Recent popular DMs often rely on the assumption of local linearization.
ERDDCI uses the new Dual-Chain Inversion (DCI) for joint inference to derive an exact reversible diffusion process.
arXiv Detail & Related papers (2024-10-18T07:52:03Z)
- Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps [24.372192691537897]
This work aims to enrich distilled text-to-image diffusion models with the ability to effectively encode real images into their latent space.
We introduce invertible Consistency Distillation (iCD), a generalized consistency distillation framework that facilitates both high-quality image synthesis and accurate image encoding in only 3-4 inference steps.
We demonstrate that iCD equipped with dynamic guidance may serve as a highly effective tool for zero-shot text-guided image editing, competing with more expensive state-of-the-art alternatives.
arXiv Detail & Related papers (2024-06-20T17:49:11Z)
- ReNoise: Real Image Inversion Through Iterative Noising [62.96073631599749]
We introduce an inversion method with a high quality-to-operation ratio, enhancing reconstruction accuracy without increasing the number of operations.
We evaluate the performance of our ReNoise technique using various sampling algorithms and models, including recent accelerated diffusion models.
arXiv Detail & Related papers (2024-03-21T17:52:08Z)
- Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [58.48890547818074]
We present Contrastive Denoising Score (CDS), a powerful modification of Delta Denoising Score (DDS) with a CUT-style contrastive loss, for latent diffusion models (LDM).
Our approach enables zero-shot image-to-image translation and neural radiance field (NeRF) editing, achieving structural correspondence between the input and output.
arXiv Detail & Related papers (2023-11-30T15:06:10Z)
- Self-correcting LLM-controlled Diffusion Models [83.26605445217334]
We introduce Self-correcting LLM-controlled Diffusion (SLD).
SLD is a framework that generates an image from the input prompt, assesses its alignment with the prompt, and performs self-corrections on the inaccuracies in the generated image.
Our approach can rectify a majority of incorrect generations, particularly in generative numeracy, attribute binding, and spatial relationships.
arXiv Detail & Related papers (2023-11-27T18:56:37Z)
- Resfusion: Denoising Diffusion Probabilistic Models for Image Restoration Based on Prior Residual Noise [34.65659277870287]
Research on denoising diffusion models has expanded its application to the field of image restoration.
We propose Resfusion, a framework that incorporates the residual term into the diffusion forward process.
We show that Resfusion exhibits competitive performance on ISTD dataset, LOL dataset and Raindrop dataset with only five sampling steps.
arXiv Detail & Related papers (2023-11-25T02:09:38Z)
- Effective Real Image Editing with Accelerated Iterative Diffusion Inversion [6.335245465042035]
It is still challenging to edit and manipulate natural images with modern generative models.
Existing approaches that have tackled the problem of inversion stability often incur significant trade-offs in computational efficiency.
We propose an Accelerated Iterative Diffusion Inversion method, dubbed AIDI, that significantly improves reconstruction accuracy with minimal additional overhead in space and time complexity (a generic fixed-point sketch of this style of iterative inversion appears after this list).
arXiv Detail & Related papers (2023-09-10T01:23:05Z)
- Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance [51.188396199083336]
We present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance.
Our model's adaptability allows it to be implemented with both image-diffusion and latent-diffusion models.
Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
arXiv Detail & Related papers (2023-06-07T12:56:56Z)
- EDICT: Exact Diffusion Inversion via Coupled Transformations [13.996171129586731]
Finding an initial noise vector that produces an input image when fed into the diffusion process (known as inversion) is an important problem.
We propose Exact Diffusion Inversion via Coupled Transformations (EDICT), an inversion method that draws inspiration from affine coupling layers.
EDICT enables mathematically exact inversion of real and model-generated images by maintaining two coupled noise vectors (a sketch of such coupled updates appears after this list).
arXiv Detail & Related papers (2022-11-22T18:02:49Z)
- Diffusion-based Image Translation using Disentangled Style and Content Representation [51.188396199083336]
Diffusion-based image translation guided by semantic texts or a single target image has enabled flexible style transfer.
It is often difficult to maintain the original content of the image during the reverse diffusion.
We present a novel diffusion-based unsupervised image translation method using disentangled style and content representation.
Our experimental results show that the proposed method outperforms state-of-the-art baseline models in both text-guided and image-guided translation tasks.
arXiv Detail & Related papers (2022-09-30T06:44:37Z)
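The EDICT entry above rests on a pair of coupled latent sequences whose alternating updates can be undone exactly. Below is a minimal sketch of that coupling pattern under standard DDIM coefficients; `eps_model`, `alpha_bar` and the mixing coefficient `p` are assumed placeholders, and the code illustrates the general affine-coupling idea rather than the paper's exact implementation.

```python
import torch

def coupled_denoise_step(x, y, t, t_prev, eps_model, emb, alpha_bar, p=0.93):
    """One denoising step on two coupled latents (t -> t_prev, t noisier)."""
    a = (alpha_bar[t_prev] / alpha_bar[t]).sqrt()
    b = (1 - alpha_bar[t_prev]).sqrt() - a * (1 - alpha_bar[t]).sqrt()
    x_inter = a * x + b * eps_model(y, t, emb)        # update x using eps at y
    y_inter = a * y + b * eps_model(x_inter, t, emb)  # update y using eps at new x
    x_new = p * x_inter + (1 - p) * y_inter           # mixing keeps the pair close
    y_new = p * y_inter + (1 - p) * x_new
    return x_new, y_new

def coupled_invert_step(x_new, y_new, t, t_prev, eps_model, emb, alpha_bar, p=0.93):
    """Exact inverse of coupled_denoise_step: each assignment above depends only
    on quantities still available afterwards, so it is undone in reverse order."""
    a = (alpha_bar[t_prev] / alpha_bar[t]).sqrt()
    b = (1 - alpha_bar[t_prev]).sqrt() - a * (1 - alpha_bar[t]).sqrt()
    y_inter = (y_new - (1 - p) * x_new) / p
    x_inter = (x_new - (1 - p) * y_inter) / p
    y = (y_inter - b * eps_model(x_inter, t, emb)) / a
    x = (x_inter - b * eps_model(y, t, emb)) / a
    return x, y
```

Initializing both sequences with the image latent, running the inversion step up an increasing timestep schedule and then the denoising step back down recovers the input up to floating-point error, because every update above is algebraically invertible.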
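Several entries above (ReNoise, AIDI) improve inversion by re-evaluating the noise prediction at the current estimate of the noisier latent rather than at the cleaner one. Below is a minimal fixed-point sketch of one such inversion step, again with assumed `eps_model` and `alpha_bar` placeholders and an illustrative iteration count.

```python
import torch

@torch.no_grad()
def fixed_point_invert_step(x_prev, t_prev, t, eps_model, emb, alpha_bar, iters=3):
    """One DDIM inversion step (cleaner latent x_prev at t_prev -> noisier latent at t).

    Denoising writes x_prev = a * x_t + b * eps(x_t, t), so exact inversion is an
    implicit equation in x_t. Plain inversion approximates eps(x_t, t) by
    eps(x_prev, t); the loop below instead re-evaluates eps at the current estimate.
    """
    a = (alpha_bar[t_prev] / alpha_bar[t]).sqrt()
    b = (1 - alpha_bar[t_prev]).sqrt() - a * (1 - alpha_bar[t]).sqrt()
    x = x_prev                          # initial guess for the noisier latent
    for _ in range(iters):
        x = (x_prev - b * eps_model(x, t, emb)) / a
    return x
```

Chaining this step over an increasing timestep schedule yields the inverted noise latent; the extra network evaluations per step are the price paid for the improved reconstruction accuracy these papers report.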