SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation
- URL: http://arxiv.org/abs/2403.01505v4
- Date: Wed, 05 Mar 2025 11:39:35 GMT
- Title: SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation
- Authors: Hongjian Liu, Qingsong Xie, Tianxiang Ye, Zhijie Deng, Chen Chen, Shixiang Tang, Xueyang Fu, Haonan Lu, Zheng-Jun Zha
- Abstract summary: We propose Stochastic Consistency Distillation (SCott) to enable accelerated text-to-image generation. In contrast to vanilla consistency distillation, which distills the ordinary differential equation (ODE) solver-based sampling process of a pre-trained teacher model into a student, SCott integrates stochastic differential equation (SDE) solvers into the distillation. On the MSCOCO-2017 5K dataset with a Stable Diffusion-V1.5 teacher, SCott achieves an FID of 21.9 with 2 sampling steps, surpassing the 1-step InstaFlow (23.4) and the 4-step UFOGen (22.1).
- Score: 74.32186107058382
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The iterative sampling procedure employed by diffusion models (DMs) often leads to significant inference latency. To address this, we propose Stochastic Consistency Distillation (SCott) to enable accelerated text-to-image generation, where high-quality and diverse generations can be achieved within just 2-4 sampling steps. In contrast to vanilla consistency distillation (CD) which distills the ordinary differential equation solvers-based sampling process of a pre-trained teacher model into a student, SCott explores the possibility and validates the efficacy of integrating stochastic differential equation (SDE) solvers into CD to fully unleash the potential of the teacher. SCott is augmented with elaborate strategies to control the noise strength and sampling process of the SDE solver. An adversarial loss is further incorporated to strengthen the consistency constraints in rare sampling steps. Empirically, on the MSCOCO-2017 5K dataset with a Stable Diffusion-V1.5 teacher, SCott achieves an FID of 21.9 with 2 sampling steps, surpassing that of the 1-step InstaFlow (23.4) and the 4-step UFOGen (22.1). Moreover, SCott can yield more diverse samples than other consistency models for high-resolution image generation, with up to 16% improvement in a qualified metric.
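The core idea of consistency distillation with a stochastic solver can be sketched in a few lines. The following toy Python example is a hedged illustration only: the SDE step, the linear "student", and all constants are hypothetical placeholders, not the paper's actual Stable Diffusion dynamics or training setup.

```python
import math
import random

rng = random.Random(0)

def teacher_sde_step(x, dt, noise_scale=0.5):
    """One Euler-Maruyama step of a toy SDE solver. The drift and diffusion
    terms here are illustrative placeholders, not the actual teacher dynamics
    distilled in the paper."""
    return [xi - xi * dt + noise_scale * math.sqrt(dt) * rng.gauss(0, 1)
            for xi in x]

def consistency_loss(student, x_t, t, dt):
    """Consistency constraint: the student should map adjacent points on the
    (stochastic) teacher trajectory to the same clean prediction."""
    x_next = teacher_sde_step(x_t, dt)
    a, b = student(x_t, t), student(x_next, t - dt)
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

# Hypothetical linear "student" purely for illustration.
student = lambda x, t: [0.9 * xi for xi in x]
x_t = [rng.gauss(0, 1) for _ in range(8)]
loss = consistency_loss(student, x_t, t=1.0, dt=0.1)
print(loss >= 0.0)  # True: a mean of squares is non-negative
```

In actual training the squared difference would be backpropagated through the student network only, with the target branch held by an EMA copy; the stochastic teacher step is where an SDE solver replaces the ODE solver of vanilla consistency distillation.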
Related papers
- Learning Few-Step Diffusion Models by Trajectory Distribution Matching [18.229753357571116]
Trajectory Distribution Matching (TDM) is a unified distillation paradigm that combines the strengths of distribution and trajectory matching.
We develop a sampling-steps-aware objective that decouples learning targets across different steps, enabling more adjustable sampling.
Our model, TDM, outperforms existing methods on various backbones, delivering superior quality and significantly reduced training costs.
arXiv Detail & Related papers (2025-03-09T15:53:49Z)
- Fast T2T: Optimization Consistency Speeds Up Diffusion-Based Training-to-Testing Solving for Combinatorial Optimization [83.65278205301576]
We propose to learn direct mappings from different noise levels to the optimal solution for a given instance, facilitating high-quality generation with minimal shots.
This is achieved through an optimization consistency training protocol, which minimizes the difference among samples.
Experiments on two popular tasks, the Traveling Salesman Problem (TSP) and Maximal Independent Set (MIS), demonstrate the superiority of Fast T2T regarding both solution quality and efficiency.
arXiv Detail & Related papers (2025-02-05T07:13:43Z)
- Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion [9.8078769718432]
We propose an efficient quantization framework for Stable Diffusion models.
Our approach features a Serial-to-Parallel calibration pipeline that addresses the consistency of both the calibration and inference processes.
Under W4A8 quantization settings, our approach enhances both distribution similarity and visual similarity by 45%-60%.
arXiv Detail & Related papers (2024-12-09T17:00:20Z)
- See Further When Clear: Curriculum Consistency Model [20.604239652914355]
We propose the Curriculum Consistency Model (CCM), which stabilizes and balances the learning complexity across timesteps.
Specifically, we regard the distillation process at each timestep as a curriculum and introduce a metric based on Peak Signal-to-Noise Ratio (PSNR) to quantify the learning complexity.
Our method achieves competitive single-step sampling Fréchet Inception Distance (FID) scores of 1.64 on CIFAR-10 and 2.18 on ImageNet 64x64.
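The PSNR-based complexity metric that CCM uses to rank timesteps can be sketched minimally. This is a hedged illustration of standard PSNR, not the paper's exact formulation; the signals and the max-value convention are assumptions.

```python
import math

def psnr(pred, target, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length signals;
    a higher PSNR indicates an easier (lower-complexity) distillation
    target at that timestep."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)

clean = [0.0] * 16
assert psnr(clean, clean) == float("inf")           # identical signals
noisier, cleaner = [0.1] * 16, [0.01] * 16
assert psnr(noisier, clean) < psnr(cleaner, clean)  # more noise, lower PSNR
```

Grouping timesteps by such a score is what lets a curriculum schedule present easier distillation targets before harder ones.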
arXiv Detail & Related papers (2024-12-09T08:39:01Z)
- EM Distillation for One-step Diffusion Models [65.57766773137068]
We propose a maximum likelihood-based approach that distills a diffusion model to a one-step generator model with minimal loss of quality.
We develop a reparametrized sampling scheme and a noise cancellation technique that together stabilizes the distillation process.
arXiv Detail & Related papers (2024-05-27T05:55:22Z)
- Directly Denoising Diffusion Models [6.109141407163027]
We present Directly Denoising Diffusion Model (DDDM), a simple and generic approach for generating realistic images with few-step sampling.
Our model achieves FID scores of 2.57 and 2.33 on CIFAR-10 in one-step and two-step sampling respectively, surpassing those obtained from GANs and distillation-based models.
For ImageNet 64x64, our approach stands as a competitive contender against leading models.
arXiv Detail & Related papers (2024-05-22T11:20:32Z)
- Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis [20.2271205957037]
Hyper-SD is a novel framework that amalgamates the advantages of ODE Trajectory Preservation and Reformulation.
We introduce Trajectory Segmented Consistency Distillation to progressively perform consistent distillation within pre-defined time-step segments.
We incorporate human feedback learning to boost the performance of the model in a low-step regime and mitigate the performance loss incurred by the distillation process.
arXiv Detail & Related papers (2024-04-21T15:16:05Z)
- Trajectory Consistency Distillation: Improved Latent Consistency Distillation by Semi-Linear Consistency Function with Trajectory Mapping [75.72212215739746]
Trajectory Consistency Distillation (TCD) encompasses trajectory consistency function and strategic sampling.
TCD not only significantly enhances image quality at low NFEs but also yields more detailed results compared to the teacher model.
arXiv Detail & Related papers (2024-02-29T13:44:14Z)
- Towards Fast Stochastic Sampling in Diffusion Generative Models [22.01769257075573]
Diffusion models suffer from slow sample generation at inference time.
We propose splitting-based integrators for fast sampling in pre-trained diffusion models in augmented spaces.
We show that a naive application of splitting is sub-optimal for fast sampling.
arXiv Detail & Related papers (2024-02-11T14:04:13Z)
- Improved Techniques for Training Consistency Models [13.475711217989975]
We present improved techniques for consistency training, where consistency models learn directly from data without distillation.
We propose a lognormal noise schedule for the consistency training objective, and propose to double the total number of discretization steps after a fixed number of training iterations.
These modifications enable consistency models to achieve FID scores of 2.51 and 3.25 on CIFAR-10 and ImageNet 64x64, respectively, in a single sampling step.
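The two scheduling ideas above can be sketched in a few lines. This is a hedged illustration: the lognormal parameters (`p_mean`, `p_std`), the initial step count, and the cap are illustrative defaults, not values taken verbatim from the paper.

```python
import math
import random

def sample_sigma(rng, p_mean=-1.1, p_std=2.0):
    """Draw a noise level from a lognormal proposal, i.e.
    ln(sigma) ~ N(p_mean, p_std), so training concentrates on
    mid-range noise levels. Parameter values are illustrative."""
    return math.exp(rng.gauss(p_mean, p_std))

def discretization_steps(stage, n0=10, cap=1280):
    """Double the number of discretization steps at each curriculum
    stage, up to a cap (all three constants are assumptions)."""
    return min(n0 * 2 ** stage, cap)

rng = random.Random(0)
sigmas = [sample_sigma(rng) for _ in range(5)]
print(all(s > 0 for s in sigmas))                         # True: lognormal samples are positive
print(discretization_steps(0), discretization_steps(3))   # 10 80
```

Coarse discretization early in training gives a strong learning signal per step; doubling the step count over time tightens the consistency targets as the model improves.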
arXiv Detail & Related papers (2023-10-22T05:33:38Z)
- Efficient Integrators for Diffusion Generative Models [22.01769257075573]
Diffusion models suffer from slow sample generation at inference time.
We propose two complementary frameworks for accelerating sample generation in pre-trained models.
We present a hybrid method that leads to the best-reported performance for diffusion models in augmented spaces.
arXiv Detail & Related papers (2023-10-11T21:04:42Z)
- Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion [56.38386580040991]
Consistency Trajectory Model (CTM) is a generalization of Consistency Models (CM).
CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance.
Unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods.
arXiv Detail & Related papers (2023-10-01T05:07:17Z)
- SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models [66.67616086310662]
Diffusion Probabilistic Models (DPMs) have achieved considerable success in generation tasks.
As sampling from DPMs is equivalent to solving diffusion SDE or ODE which is time-consuming, numerous fast sampling methods built upon improved differential equation solvers are proposed.
We propose SA-Solver, an improved and efficient stochastic Adams method for solving diffusion SDEs to generate data of high quality.
arXiv Detail & Related papers (2023-09-10T12:44:54Z)
- Parallel Sampling of Diffusion Models [76.3124029406809]
Diffusion models are powerful generative models but suffer from slow sampling.
We present ParaDiGMS, a novel method to accelerate the sampling of pretrained diffusion models by denoising multiple steps in parallel.
arXiv Detail & Related papers (2023-05-25T17:59:42Z)
- Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling [11.272881985569326]
We propose Catch-Up Distillation (CUD) to encourage the current-moment output of the velocity estimation model to "catch up" with its previous-moment output.
Specifically, CUD adjusts the original Ordinary Differential Equation (ODE) training objective to align the current moment output with both the ground truth label and the previous moment output.
To demonstrate CUD's effectiveness, we conduct thorough ablation and comparison experiments on CIFAR-10, MNIST, and ImageNet-64.
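The dual alignment described above (current output pulled toward both the ground-truth label and the previous-moment output) can be sketched as a simple weighted loss. This is a hedged toy version; the mixing weight `alpha` and the mean-squared-error form are assumptions, not the paper's exact objective.

```python
def catch_up_loss(current_pred, prev_pred, target, alpha=0.5):
    """Toy Catch-Up Distillation objective: align the current-moment
    output with the ground-truth label (first term) and with the
    previous-moment output (second term). `alpha` is a hypothetical
    mixing weight."""
    mse = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)
    return (alpha * mse(current_pred, target)
            + (1 - alpha) * mse(current_pred, prev_pred))

pred_now, pred_prev = [0.5, 0.5], [0.4, 0.6]
target = [1.0, 0.0]
print(round(catch_up_loss(pred_now, pred_prev, target), 6))  # 0.13
```

Setting `alpha=1.0` recovers the plain ground-truth regression term, which makes the catch-up term's contribution easy to ablate.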
arXiv Detail & Related papers (2023-05-18T07:23:12Z)
- ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech [63.780196620966905]
We propose ProDiff, a progressive fast diffusion model for high-quality text-to-speech.
ProDiff parameterizes the denoising model by directly predicting clean data to avoid distinct quality degradation in accelerating sampling.
Our evaluation demonstrates that ProDiff needs only 2 iterations to synthesize high-fidelity mel-spectrograms.
ProDiff enables a sampling speed of 24x faster than real-time on a single NVIDIA 2080Ti GPU.
arXiv Detail & Related papers (2022-07-13T17:45:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.