Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing
- URL: http://arxiv.org/abs/2311.18608v2
- Date: Mon, 1 Apr 2024 11:44:25 GMT
- Title: Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing
- Authors: Hyelin Nam, Gihyun Kwon, Geon Yeong Park, Jong Chul Ye
- Abstract summary: We present a powerful modification of Delta Denoising Score (DDS), called Contrastive Denoising Score (CDS), for latent diffusion models (LDM).
Our approach enables zero-shot image-to-image translation and neural radiance field (NeRF) editing, achieving structural correspondence between the input and output.
- Score: 58.48890547818074
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the remarkable advent of text-to-image diffusion models, image editing methods have become more diverse and continue to evolve. A promising recent approach in this realm is Delta Denoising Score (DDS) - an image editing technique based on the Score Distillation Sampling (SDS) framework that leverages the rich generative prior of text-to-image diffusion models. However, relying solely on the difference between scoring functions is insufficient for preserving specific structural elements from the original image, a crucial aspect of image editing. To address this, here we present an embarrassingly simple yet very powerful modification of DDS, called Contrastive Denoising Score (CDS), for latent diffusion models (LDM). Inspired by the similarities and differences between DDS and contrastive learning for unpaired image-to-image translation (CUT), we introduce a straightforward approach using the CUT loss within the DDS framework. Rather than employing auxiliary networks as in the original CUT approach, we leverage the intermediate features of LDM, specifically those from the self-attention layers, which possess rich spatial information. Our approach enables zero-shot image-to-image translation and neural radiance field (NeRF) editing, achieving structural correspondence between the input and output while maintaining content controllability. Qualitative results and comparisons demonstrate the effectiveness of our proposed method. Project page: https://hyelinnam.github.io/CDS/
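The two ingredients named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes the noise predictions and the per-patch self-attention features have already been extracted as plain lists of floats, and the function names, the temperature `tau`, and the toy feature vectors are all hypothetical. `dds_direction` shows the "difference between scoring functions" idea, and `patch_nce_loss` shows a CUT-style contrastive term in which each output patch should match the source patch at the same location and differ from the patches at other locations.

```python
import math


def dds_direction(eps_target, eps_source):
    """DDS-style update direction: difference of two noise predictions.

    Both arguments are assumed to be flat lists of floats (hypothetical layout).
    """
    return [t - s for t, s in zip(eps_target, eps_source)]


def _normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def patch_nce_loss(src_feats, out_feats, tau=0.07):
    """InfoNCE over patches, in the spirit of the CUT loss.

    out_feats[i] is pulled towards src_feats[i] (same location, positive)
    and pushed away from src_feats[j], j != i (other locations, negatives).
    tau is an illustrative temperature, not a value taken from the paper.
    """
    src = [_normalize(f) for f in src_feats]
    out = [_normalize(f) for f in out_feats]
    total = 0.0
    for i, query in enumerate(out):
        logits = [sum(a * b for a, b in zip(query, key)) / tau for key in src]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denom - logits[i]
    return total / len(out)


# Toy features for three patch locations (made-up values).
src = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = patch_nce_loss(src, src)                           # structure kept
shuffled = patch_nce_loss(src, [src[1], src[2], src[0]])     # structure broken
direction = dds_direction([1.0, 2.0], [0.5, 2.0])
```

In this toy setting the loss is lower when the output patches keep the spatial layout of the source (`aligned`) than when they are permuted (`shuffled`), which is the kind of structural consistency the contrastive term is meant to encourage alongside the DDS direction.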
Related papers
- Stable Flow: Vital Layers for Training-Free Image Editing [74.52248787189302]
Diffusion models have revolutionized the field of content synthesis and editing.
Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT).
We propose an automatic method to identify "vital layers" within DiT, crucial for image formation.
Next, to enable real-image editing, we introduce an improved image inversion method for flow models.
arXiv Detail & Related papers (2024-11-21T18:59:51Z) - Preserving Identity with Variational Score for General-purpose 3D Editing [48.314327790451856]
Piva is a novel optimization-based method for editing images and 3D models based on diffusion models.
We pinpoint the limitations in 2D and 3D editing, which cause detail loss and oversaturation.
We propose an additional score distillation term that enforces identity preservation.
arXiv Detail & Related papers (2024-06-13T09:32:40Z) - Score Distillation Sampling with Learned Manifold Corrective [36.963929141091455]
We decompose the loss into different factors and isolate the component responsible for noisy gradients.
In the original formulation, high text guidance is used to account for the noise, leading to unwanted side effects such as oversaturation or repeated detail.
We train a shallow network mimicking the timestep-dependent frequency bias of the image diffusion model in order to effectively factor it out.
arXiv Detail & Related papers (2024-01-10T17:51:46Z) - MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image Translation by Prompts Redescription and Beyond [57.14128305383768]
We propose a prompt redescription strategy (MirrorDiffusion) to realize a mirror effect between the source and reconstructed image in the diffusion model.
MirrorDiffusion achieves superior performance over the state-of-the-art methods on zero-shot image translation benchmarks.
arXiv Detail & Related papers (2024-01-06T14:12:16Z) - Delta Denoising Score [51.98288453616375]
We introduce Delta Denoising Score (DDS), a novel scoring function for text-based image editing.
It guides minimal modifications of an input image towards the content described in a target prompt.
arXiv Detail & Related papers (2023-04-14T12:22:41Z) - SinDDM: A Single Image Denoising Diffusion Model [28.51951207066209]
We introduce a framework for training a denoising diffusion model on a single image.
Our method, which we coin SinDDM, learns the internal statistics of the training image by using a multi-scale diffusion process.
It is applicable in a wide array of tasks, including style transfer and harmonization.
arXiv Detail & Related papers (2022-11-29T20:44:25Z) - EDICT: Exact Diffusion Inversion via Coupled Transformations [13.996171129586731]
Finding an initial noise vector that produces an input image when fed into the diffusion process (known as inversion) is an important problem.
We propose Exact Diffusion Inversion via Coupled Transformations (EDICT), an inversion method that draws inspiration from affine coupling layers.
EDICT enables mathematically exact inversion of real and model-generated images by maintaining two coupled noise vectors.
arXiv Detail & Related papers (2022-11-22T18:02:49Z) - Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)