Score Distillation Sampling with Learned Manifold Corrective
- URL: http://arxiv.org/abs/2401.05293v2
- Date: Thu, 4 Jul 2024 13:21:58 GMT
- Title: Score Distillation Sampling with Learned Manifold Corrective
- Authors: Thiemo Alldieck, Nikos Kolotouros, Cristian Sminchisescu
- Abstract summary: We decompose the loss into different factors and isolate the component responsible for noisy gradients.
In the original formulation, high text guidance is used to account for the noise, leading to unwanted side effects such as oversaturation or repeated detail.
We train a shallow network mimicking the timestep-dependent frequency bias of the image diffusion model in order to effectively factor it out.
- Score: 36.963929141091455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Score Distillation Sampling (SDS) is a recent but already widely popular method that relies on an image diffusion model to control optimization problems using text prompts. In this paper, we conduct an in-depth analysis of the SDS loss function, identify an inherent problem with its formulation, and propose a surprisingly easy but effective fix. Specifically, we decompose the loss into different factors and isolate the component responsible for noisy gradients. In the original formulation, high text guidance is used to account for the noise, leading to unwanted side effects such as oversaturation or repeated detail. Instead, we train a shallow network mimicking the timestep-dependent frequency bias of the image diffusion model in order to effectively factor it out. We demonstrate the versatility and the effectiveness of our novel loss formulation through qualitative and quantitative experiments, including optimization-based image synthesis and editing, zero-shot image translation network training, and text-to-3D synthesis.
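The gradient decomposition described in the abstract can be sketched numerically. The following is a minimal illustration, not the authors' implementation: `sds_grad` is the standard SDS update, and `lmc_sds_grad` shows the paper's idea of replacing the raw noise term with the output of a shallow corrective network; the random arrays stand in for real denoiser outputs, and all function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def noised(x0, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps

def sds_grad(eps_hat, eps, w_t=1.0):
    """Original SDS gradient: w(t) * (eps_hat - eps).
    Subtracting the sampled noise eps is the source of the
    high-variance, noisy gradients the paper analyzes."""
    return w_t * (eps_hat - eps)

def lmc_sds_grad(eps_hat, corrective, w_t=1.0):
    """Sketch of the learned manifold corrective: `corrective` would be
    produced by a shallow network trained to mimic the diffusion model's
    timestep-dependent frequency bias, factoring out the noisy component."""
    return w_t * (eps_hat - corrective)

# Toy tensors standing in for an image and model predictions.
x0 = rng.normal(size=(4, 4))
eps = rng.normal(size=(4, 4))
x_t = noised(x0, eps, alpha_bar=0.5)
eps_hat = eps + 0.1 * rng.normal(size=(4, 4))  # pretend denoiser output

g_sds = sds_grad(eps_hat, eps)
g_lmc = lmc_sds_grad(eps_hat, corrective=0.5 * eps)
print(g_sds.shape, g_lmc.shape)
```

In the original formulation, the residual `eps_hat - eps` mixes a useful update direction with noise; the corrective term is meant to isolate and remove only the latter.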
Related papers
- Beta Sampling is All You Need: Efficient Image Generation Strategy for Diffusion Models using Stepwise Spectral Analysis [22.02829139522153]
We propose an efficient time step sampling method based on an image spectral analysis of the diffusion process.
Instead of the traditional uniform distribution-based time step sampling, we introduce a Beta distribution-like sampling technique.
Our hypothesis is that certain steps exhibit significant changes in image content, while others contribute minimally.
arXiv Detail & Related papers (2024-07-16T20:53:06Z)
- Accelerating Diffusion for SAR-to-Optical Image Translation via Adversarial Consistency Distillation [5.234109158596138]
We propose a new training framework for SAR-to-optical image translation.
Our method employs consistency distillation to reduce iterative inference steps and integrates adversarial learning to ensure image clarity and minimize color shifts.
The results demonstrate that our approach significantly improves inference speed by 131 times while maintaining the visual quality of the generated images.
arXiv Detail & Related papers (2024-07-08T16:36:12Z)
- Rethinking Score Distillation as a Bridge Between Image Distributions [97.27476302077545]
We show that our method seeks to transport corrupted images (source) to the natural image distribution (target).
Our method can be easily applied across many domains, matching or beating the performance of specialized methods.
We demonstrate its utility in text-to-2D, text-based NeRF optimization, translating paintings to real images, optical illusion generation, and 3D sketch-to-real.
arXiv Detail & Related papers (2024-06-13T17:59:58Z)
- Diffusion Posterior Proximal Sampling for Image Restoration [28.388405376136095]
Diffusion-based image restoration algorithms exploit pre-trained diffusion models to leverage data priors.
These strategies initiate the denoising process with pure white noise and incorporate random noise at each generative step, leading to over-smoothed results.
In this paper, we introduce a refined paradigm for diffusion-based image restoration.
arXiv Detail & Related papers (2024-02-25T04:24:28Z)
- Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [58.48890547818074]
We present Contrastive Denoising Score (CDS), a powerful modification of the Contrastive Unpaired Translation (CUT) loss for latent diffusion models (LDM).
Our approach enables zero-shot image-to-image translation and neural radiance field (NeRF) editing, achieving structural correspondence between the input and output.
arXiv Detail & Related papers (2023-11-30T15:06:10Z)
- SinSR: Diffusion-Based Image Super-Resolution in a Single Step [119.18813219518042]
Super-resolution (SR) methods based on diffusion models exhibit promising results, but their practical application is hindered by the substantial number of required inference steps.
We propose a simple yet effective method for achieving single-step SR generation, named SinSR.
arXiv Detail & Related papers (2023-11-23T16:21:29Z)
- Noise-Free Score Distillation [78.79226724549456]
The Noise-Free Score Distillation (NFSD) process requires only minimal modifications to the original SDS framework.
We achieve more effective distillation of pre-trained text-to-image diffusion models while using a nominal CFG scale.
arXiv Detail & Related papers (2023-10-26T17:12:26Z)
- Invertible Image Rescaling [118.2653765756915]
We develop an Invertible Rescaling Net (IRN) to produce visually-pleasing low-resolution images.
We capture the distribution of the lost information using a latent variable following a specified distribution in the downscaling process.
arXiv Detail & Related papers (2020-05-12T09:55:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.