Related papers: SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance

SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance

URL: http://arxiv.org/abs/2412.02687v2
Date: Wed, 04 Dec 2024 08:01:47 GMT
Title: SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance
Authors: Viet Nguyen, Anh Nguyen, Trung Dao, Khoi Nguyen, Cuong Pham, Toan Tran, Anh Tran,
Abstract summary: This paper presents SNOOPI, a novel framework designed to enhance the guidance in one-step diffusion models during both training and inference.<n>By varying the guidance scale of both teacher models, we broaden their output distributions, resulting in a more robust VSD loss that enables SB to perform effectively across diverse backbones while maintaining competitive performance.<n>Second, we propose a training-free method called Negative-Away Steer Attention (NASA), which integrates negative prompts into one-step diffusion models via cross-attention to suppress undesired elements in generated images.
Score: 12.973835034100428
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent approaches have yielded promising results in distilling multi-step text-to-image diffusion models into one-step ones. The state-of-the-art efficient distillation technique, i.e., SwiftBrushv2 (SBv2), even surpasses the teacher model's performance with limited resources. However, our study reveals its instability when handling different diffusion model backbones due to using a fixed guidance scale within the Variational Score Distillation (VSD) loss. Another weakness of the existing one-step diffusion models is the missing support for negative prompt guidance, which is crucial in practical image generation. This paper presents SNOOPI, a novel framework designed to address these limitations by enhancing the guidance in one-step diffusion models during both training and inference. First, we effectively enhance training stability through Proper Guidance-SwiftBrush (PG-SB), which employs a random-scale classifier-free guidance approach. By varying the guidance scale of both teacher models, we broaden their output distributions, resulting in a more robust VSD loss that enables SB to perform effectively across diverse backbones while maintaining competitive performance. Second, we propose a training-free method called Negative-Away Steer Attention (NASA), which integrates negative prompts into one-step diffusion models via cross-attention to suppress undesired elements in generated images. Our experimental results show that our proposed methods significantly improve baseline models across various metrics. Remarkably, we achieve an HPSv2 score of 31.08, setting a new state-of-the-art benchmark for one-step diffusion models.

Related papers

Revisiting Diffusion Models: From Generative Pre-training to One-Step Generation [2.3359837623080613]
We show that diffusion training may be viewed as a form of generative pre-training.<n>We create a one-step generation model by fine-tuning a pre-trained model with 85% of parameters frozen.
arXiv Detail & Related papers (2025-06-11T03:55:26Z)
Enhanced DACER Algorithm with High Diffusion Efficiency [30.13489573844592]
We propose DACERv2, which exploits a Q-gradient field objective to guide the denoising process at each diffusion step.<n>We show that DACERv2 achieves higher performance in most complex control environments with only five diffusion steps and shows greater multimodality.
arXiv Detail & Related papers (2025-05-29T13:21:58Z)
Few-Step Diffusion via Score identity Distillation [67.07985339442703]
Diffusion distillation has emerged as a promising strategy for accelerating text-to-image (T2I) diffusion models.<n>Existing methods rely on real or teacher-synthesized images to perform well when distilling high-resolution T2I diffusion models.<n>We propose two new guidance strategies: Zero-CFG, which disables CFG in the teacher and removes text conditioning in the fake score network, and Anti-CFG, which applies negative CFG in the fake score network.
arXiv Detail & Related papers (2025-05-19T03:45:16Z)
Fast Autoregressive Models for Continuous Latent Generation [49.079819389916764]
Autoregressive models have demonstrated remarkable success in sequential data generation, particularly in NLP.<n>Recent work, the masked autoregressive model (MAR) bypasses quantization by modeling per-token distributions in continuous spaces using a diffusion head.<n>We propose Fast AutoRegressive model (FAR), a novel framework that replaces MAR's diffusion head with a lightweight shortcut head.
arXiv Detail & Related papers (2025-04-24T13:57:08Z)
Single-Step Latent Consistency Model for Remote Sensing Image Super-Resolution [7.920423405957888]
We propose a novel single-step diffusion approach designed to enhance both efficiency and visual quality in RSISR tasks.<n>The proposed LCMSR reduces the iterative steps of traditional diffusion models from 50-1000 or more to just a single step.<n> Experimental results demonstrate that LCMSR effectively balances efficiency and performance, achieving inference times comparable to non-diffusion models.
arXiv Detail & Related papers (2025-03-25T09:56:21Z)
One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
Efficient Diffusion as Low Light Enhancer [63.789138528062225]
Reflectance-Aware Trajectory Refinement (RATR) is a simple yet effective module to refine the teacher trajectory using the reflectance component of images. textbfReflectance-aware textbfDiffusion with textbfDistilled textbfTrajectory (textbfReDDiT) is an efficient and flexible distillation framework tailored for Low-Light Image Enhancement (LLIE)
arXiv Detail & Related papers (2024-10-16T08:07:18Z)
Relational Diffusion Distillation for Efficient Image Generation [27.127061578093674]
Diffusion model's high delay hinders its wide application in edge devices with scarce computing resources. We propose Diffusion Distillation (RDD), a novel distillation method tailored specifically for distilling diffusion models. Our proposed RDD leads to 1.47 FID decrease under 1 sampling step compared to state-of-the-art diffusion distillation methods and achieving 256x speed-up.
arXiv Detail & Related papers (2024-10-10T07:40:51Z)
Distillation-Free One-Step Diffusion for Real-World Image Super-Resolution [81.81748032199813]
We propose a Distillation-Free One-Step Diffusion model. Specifically, we propose a noise-aware discriminator (NAD) to participate in adversarial training. We improve the perceptual loss with edge-aware DISTS (EA-DISTS) to enhance the model's ability to generate fine details.
arXiv Detail & Related papers (2024-10-05T16:41:36Z)
Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs [36.65594293655289]
DoSSR is a Domain Shift diffusion-based SR model that capitalizes on the generative powers of pretrained diffusion models.<n>At the core of our approach is a domain shift equation that integrates seamlessly with existing diffusion models.<n>Our proposed method achieves state-of-the-art performance on synthetic and real-world datasets, while notably requiring only 5 sampling steps.
arXiv Detail & Related papers (2024-09-26T12:16:11Z)
Plug-and-Play Diffusion Distillation [14.359953671470242]
We propose a new distillation approach for guided diffusion models. An external lightweight guide model is trained while the original text-to-image model remains frozen. We show that our method reduces the inference of classifier-free guided latent-space diffusion models by almost half.
arXiv Detail & Related papers (2024-06-04T04:22:47Z)
Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model. For efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in diffusion model's latent space. We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
arXiv Detail & Related papers (2024-05-09T17:59:40Z)
Efficient Diffusion Model for Image Restoration by Residual Shifting [63.02725947015132]
This study proposes a novel and efficient diffusion model for image restoration. Our method avoids the need for post-acceleration during inference, thereby avoiding the associated performance deterioration. Our method achieves superior or comparable performance to current state-of-the-art methods on three classical IR tasks.
arXiv Detail & Related papers (2024-03-12T05:06:07Z)
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI) In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion) Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
Adaptive Training Meets Progressive Scaling: Elevating Efficiency in Diffusion Models [52.1809084559048]
We propose a novel two-stage divide-and-conquer training strategy termed TDC Training. It groups timesteps based on task similarity and difficulty, assigning highly customized denoising models to each group, thereby enhancing the performance of diffusion models. While two-stage training avoids the need to train each model separately, the total training cost is even lower than training a single unified denoising model.
arXiv Detail & Related papers (2023-12-20T03:32:58Z)
Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining. We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model [18.25548360119976]
This paper endeavors to advance the precision of snapshot compressive imaging (SCI) reconstruction for multispectral image (MSI) We propose a novel structured zero-shot diffusion model, dubbed DiffSCI. We present extensive testing to show that DiffSCI exhibits discernible performance enhancements over prevailing self-supervised and zero-shot approaches.
arXiv Detail & Related papers (2023-11-19T20:27:14Z)
BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images. Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few. We present a novel technique called BOOT, that overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
Accelerating Score-based Generative Models for High-Resolution Image Synthesis [42.076244561541706]
Score-based generative models (SGMs) have recently emerged as a promising class of generative models. In this work, we consider the acceleration of high-resolution generation with SGMs. We introduce a novel Target Distribution Sampling Aware (TDAS) method by leveraging the structural priors in space and frequency domains.
arXiv Detail & Related papers (2022-06-08T17:41:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.