Score Distillation Sampling with Learned Manifold Corrective
- URL: http://arxiv.org/abs/2401.05293v2
- Date: Thu, 4 Jul 2024 13:21:58 GMT
- Title: Score Distillation Sampling with Learned Manifold Corrective
- Authors: Thiemo Alldieck, Nikos Kolotouros, Cristian Sminchisescu
- Abstract summary: We decompose the loss into different factors and isolate the component responsible for noisy gradients.
In the original formulation, high text guidance is used to account for the noise, leading to unwanted side effects such as oversaturation or repeated detail.
We train a shallow network mimicking the timestep-dependent frequency bias of the image diffusion model in order to effectively factor it out.
- Score: 36.963929141091455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Score Distillation Sampling (SDS) is a recent but already widely popular method that relies on an image diffusion model to control optimization problems using text prompts. In this paper, we conduct an in-depth analysis of the SDS loss function, identify an inherent problem with its formulation, and propose a surprisingly easy but effective fix. Specifically, we decompose the loss into different factors and isolate the component responsible for noisy gradients. In the original formulation, high text guidance is used to account for the noise, leading to unwanted side effects such as oversaturation or repeated detail. Instead, we train a shallow network mimicking the timestep-dependent frequency bias of the image diffusion model in order to effectively factor it out. We demonstrate the versatility and the effectiveness of our novel loss formulation through qualitative and quantitative experiments, including optimization-based image synthesis and editing, zero-shot image translation network training, and text-to-3D synthesis.
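The gradient decomposition described in the abstract can be sketched numerically. The following is a minimal illustration, not the authors' implementation: `sds_grad` is the standard SDS update, and `lmc_sds_grad` shows the paper's idea of replacing the raw noise term with the output of a shallow corrective network; the random arrays stand in for real denoiser outputs, and all function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def noised(x0, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps

def sds_grad(eps_hat, eps, w_t=1.0):
    """Original SDS gradient: w(t) * (eps_hat - eps).
    Subtracting the sampled noise eps is the source of the
    high-variance, noisy gradients the paper analyzes."""
    return w_t * (eps_hat - eps)

def lmc_sds_grad(eps_hat, corrective, w_t=1.0):
    """Sketch of the learned manifold corrective: `corrective` would be
    produced by a shallow network trained to mimic the diffusion model's
    timestep-dependent frequency bias, factoring out the noisy component."""
    return w_t * (eps_hat - corrective)

# Toy tensors standing in for an image and model predictions.
x0 = rng.normal(size=(4, 4))
eps = rng.normal(size=(4, 4))
x_t = noised(x0, eps, alpha_bar=0.5)
eps_hat = eps + 0.1 * rng.normal(size=(4, 4))  # pretend denoiser output

g_sds = sds_grad(eps_hat, eps)
g_lmc = lmc_sds_grad(eps_hat, corrective=0.5 * eps)
print(g_sds.shape, g_lmc.shape)
```

In the original formulation, the residual `eps_hat - eps` mixes a useful update direction with noise; the corrective term is meant to isolate and remove only the latter.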
Related papers
- Beta Sampling is All You Need: Efficient Image Generation Strategy for Diffusion Models using Stepwise Spectral Analysis [22.02829139522153]
We propose an efficient time step sampling method based on an image spectral analysis of the diffusion process.
Instead of the traditional uniform distribution-based time step sampling, we introduce a Beta distribution-like sampling technique.
Our hypothesis is that certain steps exhibit significant changes in image content, while others contribute minimally.
arXiv Detail & Related papers (2024-07-16T20:53:06Z)
- Accelerating Diffusion for SAR-to-Optical Image Translation via Adversarial Consistency Distillation [5.234109158596138]
We propose a new training framework for SAR-to-optical image translation.
Our method employs consistency distillation to reduce iterative inference steps and integrates adversarial learning to ensure image clarity and minimize color shifts.
The results demonstrate that our approach significantly improves inference speed by 131 times while maintaining the visual quality of the generated images.
arXiv Detail & Related papers (2024-07-08T16:36:12Z)
- Rethinking Score Distillation as a Bridge Between Image Distributions [97.27476302077545]
We show that our method seeks to transport corrupted images (source) to the natural image distribution (target).
Our method can be easily applied across many domains, matching or beating the performance of specialized methods.
We demonstrate its utility in text-to-2D, text-based NeRF optimization, translating paintings to real images, optical illusion generation, and 3D sketch-to-real.
arXiv Detail & Related papers (2024-06-13T17:59:58Z)
- Diffusion Posterior Proximal Sampling for Image Restoration [28.388405376136095]
Diffusion-based image restoration algorithms exploit pre-trained diffusion models to leverage data priors.
These strategies initiate the denoising process with pure white noise and incorporate random noise at each generative step, leading to over-smoothed results.
In this paper, we introduce a refined paradigm for diffusion-based image restoration.
arXiv Detail & Related papers (2024-02-25T04:24:28Z)
- Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [58.48890547818074]
We present Contrastive Denoising Score (CDS), a powerful modification of the Contrastive Unpaired Translation (CUT) loss for latent diffusion models (LDM).
Our approach enables zero-shot image-to-image translation and neural radiance field (NeRF) editing, achieving structural correspondence between the input and output.
arXiv Detail & Related papers (2023-11-30T15:06:10Z)
- SinSR: Diffusion-Based Image Super-Resolution in a Single Step [119.18813219518042]
Super-resolution (SR) methods based on diffusion models exhibit promising results, but their practical application is hindered by the substantial number of required inference steps.
We propose a simple yet effective method for achieving single-step SR generation, named SinSR.
arXiv Detail & Related papers (2023-11-23T16:21:29Z)
- Noise-Free Score Distillation [78.79226724549456]
The Noise-Free Score Distillation (NFSD) process requires only minimal modifications to the original SDS framework.
We achieve more effective distillation of pre-trained text-to-image diffusion models while using a nominal CFG scale.
arXiv Detail & Related papers (2023-10-26T17:12:26Z)
- Invertible Image Rescaling [118.2653765756915]
We develop an Invertible Rescaling Net (IRN) to produce visually-pleasing low-resolution images.
We capture the distribution of the lost information using a latent variable following a specified distribution in the downscaling process.
arXiv Detail & Related papers (2020-05-12T09:55:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.