Delta Denoising Score
- URL: http://arxiv.org/abs/2304.07090v1
- Date: Fri, 14 Apr 2023 12:22:41 GMT
- Title: Delta Denoising Score
- Authors: Amir Hertz, Kfir Aberman, Daniel Cohen-Or
- Abstract summary: We introduce Delta Denoising Score (DDS), a novel scoring function for text-based image editing.
It guides minimal modifications of an input image towards the content described in a target prompt.
- Score: 51.98288453616375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Delta Denoising Score (DDS), a novel scoring function for
text-based image editing that guides minimal modifications of an input image
towards the content described in a target prompt. DDS leverages the rich
generative prior of text-to-image diffusion models and can be used as a loss
term in an optimization problem to steer an image towards a desired direction
dictated by a text. DDS utilizes the Score Distillation Sampling (SDS)
mechanism for the purpose of image editing. We show that using only SDS often
produces non-detailed and blurry outputs due to noisy gradients. To address
this issue, DDS uses a prompt that matches the input image to identify and
remove undesired erroneous directions of SDS. Our key premise is that SDS
should be zero when calculated on pairs of matched prompts and images, meaning
that if the score is non-zero, its gradients can be attributed to the erroneous
component of SDS. Our analysis demonstrates the competence of DDS for
text-based image-to-image translation. We further show that DDS can be used to train
an effective zero-shot image translation model. Experimental results indicate
that DDS outperforms existing methods in terms of stability and quality,
highlighting its potential for real-world applications in text-based image
editing.
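In epsilon-prediction terms, the abstract's premise means that the SDS residual computed with a prompt matching the input image captures the erroneous component, so the DDS update direction is the difference eps(z_t, target prompt) - eps(z_hat_t, source prompt), with the noise and timestep shared between the two branches so the shared erroneous part cancels. The following is a minimal PyTorch sketch of that update loop, assuming an epsilon-prediction diffusion model; ToyEpsModel, the noise schedule, and all hyperparameters are illustrative stand-ins rather than the authors' implementation, and classifier-free guidance and the VAE encoder/decoder of a latent diffusion model are omitted.

```python
# Minimal DDS sketch, assuming an epsilon-prediction diffusion model.
# ToyEpsModel is a tiny stand-in for a pretrained text-to-image UNet;
# timestep conditioning and classifier-free guidance are omitted for brevity.
import torch
import torch.nn as nn

class ToyEpsModel(nn.Module):
    """Stand-in for a pretrained noise-prediction (epsilon) network."""
    def __init__(self, channels=4, emb_dim=8):
        super().__init__()
        self.net = nn.Conv2d(channels + 1, channels, 3, padding=1)
        self.prompt_proj = nn.Linear(emb_dim, 1)

    def forward(self, z_t, t, prompt_emb):
        # Broadcast a scalar "prompt feature" over the spatial grid.
        p = self.prompt_proj(prompt_emb).view(-1, 1, 1, 1)
        p = p.expand(-1, 1, z_t.shape[2], z_t.shape[3])
        return self.net(torch.cat([z_t, p], dim=1))

def dds_step(eps_model, z, z_src, emb_tgt, emb_src, alphas_cumprod, lr=0.1):
    """One DDS update on the optimized latent `z`.

    Both branches share the same noise and timestep; the source branch
    (matched prompt + original image) estimates the erroneous SDS component,
    which is subtracted out before the image is updated.
    """
    t = torch.randint(20, 980, (1,))
    a = alphas_cumprod[t].view(1, 1, 1, 1)
    noise = torch.randn_like(z)
    z_t = a.sqrt() * z + (1 - a).sqrt() * noise          # noised edited latent
    z_src_t = a.sqrt() * z_src + (1 - a).sqrt() * noise  # noised source latent
    with torch.no_grad():
        eps_tgt = eps_model(z_t, t, emb_tgt)      # SDS direction for the target prompt
        eps_src = eps_model(z_src_t, t, emb_src)  # SDS direction for the matched prompt
    return z - lr * (eps_tgt - eps_src)           # delta of the two score directions

# Toy usage: optimize a latent toward the target prompt.
model = ToyEpsModel()
alphas_cumprod = torch.linspace(0.9999, 0.0001, 1000)
z_src = torch.randn(1, 4, 16, 16)   # latent of the input image
emb_src = torch.randn(1, 8)         # embedding of the matched (source) prompt
emb_tgt = torch.randn(1, 8)         # embedding of the target prompt
z = z_src.clone()
for _ in range(10):
    z = dds_step(model, z, z_src, emb_tgt, emb_src, alphas_cumprod)
```

In practice both predictions would come from the same pretrained text-to-image diffusion UNet, and the optimized latent would be decoded back to an image after the loop.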
Related papers
- Semantic Score Distillation Sampling for Compositional Text-to-3D Generation [28.88237230872795]
Generating high-quality 3D assets from textual descriptions remains a pivotal challenge in computer graphics and vision research.
We introduce a novel SDS approach, designed to improve the expressiveness and accuracy of compositional text-to-3D generation.
Our approach integrates new semantic embeddings that maintain consistency across different rendering views.
By leveraging explicit semantic guidance, our method unlocks the compositional capabilities of existing pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-11T17:26:00Z) - Rethinking Score Distillation as a Bridge Between Image Distributions [97.27476302077545]
We show that our method seeks to transport corrupted images (source) to the natural image distribution (target)
Our method can be easily applied across many domains, matching or beating the performance of specialized methods.
We demonstrate its utility in text-to-2D, text-based NeRF optimization, translating paintings to real images, optical illusion generation, and 3D sketch-to-real.
arXiv Detail & Related papers (2024-06-13T17:59:58Z) - InstructHumans: Editing Animated 3D Human Textures with Instructions [40.012406098563204]
We present InstructHumans, a novel framework for instruction-driven 3D human texture editing.
InstructHumans significantly outperforms existing 3D editing methods while remaining consistent with the initial avatar.
arXiv Detail & Related papers (2024-04-05T11:45:03Z) - Score Distillation Sampling with Learned Manifold Corrective [36.963929141091455]
We decompose the loss into different factors and isolate the component responsible for noisy gradients.
In the original formulation, high text guidance is used to account for the noise, leading to unwanted side effects such as oversaturation or repeated detail.
We train a shallow network mimicking the timestep-dependent frequency bias of the image diffusion model in order to effectively factor it out.
arXiv Detail & Related papers (2024-01-10T17:51:46Z) - MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image Translation by Prompts Redescription and Beyond [57.14128305383768]
We propose a prompt redescription strategy to realize a mirror effect between the source and reconstructed image in the diffusion model (MirrorDiffusion)
MirrorDiffusion achieves superior performance over the state-of-the-art methods on zero-shot image translation benchmarks.
arXiv Detail & Related papers (2024-01-06T14:12:16Z) - Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [58.48890547818074]
We present Contrastive Denoising Score (CDS), a powerful modification of Delta Denoising Score (DDS) for latent diffusion models (LDM)
Our approach enables zero-shot image-to-image translation and neural radiance field (NeRF) editing, achieving structural correspondence between the input and output.
arXiv Detail & Related papers (2023-11-30T15:06:10Z) - Noise-Free Score Distillation [78.79226724549456]
The Noise-Free Score Distillation (NFSD) process requires minimal modifications to the original SDS framework.
We achieve more effective distillation of pre-trained text-to-image diffusion models while using a nominal CFG scale.
arXiv Detail & Related papers (2023-10-26T17:12:26Z) - SDEdit: Image Synthesis and Editing with Stochastic Differential Equations [113.35735935347465]
We introduce Stochastic Differential Editing (SDEdit), based on a recent generative modeling framework built on stochastic differential equations (SDEs)
Given an input image with user edits, we first add noise to the input according to an SDE, and subsequently denoise it by simulating the reverse SDE to gradually increase its likelihood under the prior (a minimal sketch follows this list).
Our method does not require task-specific loss function designs, which are critical components for recent image editing methods based on GAN inversions.
arXiv Detail & Related papers (2021-08-02T17:59:47Z)
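The SDEdit entry above describes a two-step procedure: perturb the user-edited input partway along the forward SDE, then simulate the reverse SDE to pull it back toward the natural-image prior. Below is a minimal sketch of that idea using a discrete DDPM-style approximation of the forward/reverse SDE; the beta schedule, the placeholder zero-noise denoiser, and t0_frac are assumptions for illustration, not the SDEdit implementation.

```python
# Minimal SDEdit-style sketch using a discrete (DDPM-style) approximation of
# the SDE: noise the guide image up to an intermediate step t0, then run
# ancestral reverse-diffusion steps back to step 0.
# `toy_eps` is a placeholder denoiser; a real pretrained epsilon network
# would be plugged in here.
import torch

def sdedit(eps_model, x_guide, betas, t0_frac=0.5):
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)
    t0 = int(t0_frac * (betas.shape[0] - 1))
    # Forward perturbation: partially destroy the user-edited guide image.
    x = abar[t0].sqrt() * x_guide + (1 - abar[t0]).sqrt() * torch.randn_like(x_guide)
    # Reverse denoising from step t0 down to 0.
    for t in range(t0, -1, -1):
        eps = eps_model(x, t)
        mean = (x - betas[t] / (1 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x

# Toy usage with a placeholder denoiser that predicts zero noise.
toy_eps = lambda x, t: torch.zeros_like(x)
betas = torch.linspace(1e-4, 0.02, 1000)
x_guide = torch.randn(1, 3, 16, 16)   # stroke-edited input (toy tensor)
edited = sdedit(toy_eps, x_guide, betas)
```

With a real pretrained denoiser plugged in, smaller values of t0_frac keep the output faithful to the guide image, while larger values give the generative prior more freedom, which is the realism-faithfulness trade-off SDEdit exposes.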