Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling
- URL: http://arxiv.org/abs/2305.10769v4
- Date: Tue, 13 Jun 2023 08:00:49 GMT
- Title: Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling
- Authors: Shitong Shao, Xu Dai, Shouyi Yin, Lujun Li, Huanran Chen, Yang Hu
- Abstract summary: We propose the Catch-Up Distillation (CUD) to encourage the current moment output of the velocity estimation model to ``catch up'' with its previous moment output.
Specifically, CUD adjusts the original Ordinary Differential Equation (ODE) training objective to align the current moment output with both the ground truth label and the previous moment output.
To demonstrate CUD's effectiveness, we conduct thorough ablation and comparison experiments on CIFAR-10, MNIST, and ImageNet-64.
- Score: 11.272881985569326
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Diffusion Probabilistic Models (DPMs) have made impressive advancements in
various machine learning domains. However, achieving high-quality synthetic
samples typically involves performing a large number of sampling steps, which
impedes the possibility of real-time sample synthesis. Traditional accelerated
sampling algorithms via knowledge distillation rely on pre-trained model
weights and discrete time step scenarios, necessitating additional training
sessions to achieve their goals. To address these issues, we propose the
Catch-Up Distillation (CUD), which encourages the current moment output of the velocity estimation model to ``catch up'' with its previous moment output.
Specifically, CUD adjusts the original Ordinary Differential Equation (ODE)
training objective to align the current moment output with both the ground
truth label and the previous moment output, utilizing Runge-Kutta-based
multi-step alignment distillation for precise ODE estimation while preventing
asynchronous updates. Furthermore, we investigate the design space for CUD under continuous time-step scenarios and analyze how to determine suitable strategies. To demonstrate CUD's effectiveness, we conduct thorough ablation
and comparison experiments on CIFAR-10, MNIST, and ImageNet-64. On CIFAR-10, we
obtain an FID of 2.80 by sampling in 15 steps under one-session training and a new state-of-the-art FID of 3.37 by sampling in one step with additional training. The latter result required only 620k iterations with a batch
size of 128, in contrast to Consistency Distillation, which demanded 2100k
iterations with a larger batch size of 256. Our code is released at
https://anonymous.4open.science/r/Catch-Up-Distillation-E31F.
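To make the two-term objective concrete, here is a minimal, hypothetical sketch of a CUD-style training step. It is not the released code: the flow-matching parameterization, the EMA copy standing in for the "previous moment" target, the Heun (RK2) alignment step, and all names (`v_theta`, `v_ema`, `dt`) are illustrative assumptions.
```python
# Hypothetical sketch of a CUD-style training step (not the released code).
# Assumptions: a velocity model in the flow-matching parameterization
# x_t = (1 - t) * x0 + t * noise, an EMA copy `v_ema` standing in for the
# frozen "previous moment" target, and a Heun (RK2) step as the
# Runge-Kutta-based alignment solver.
import torch
import torch.nn.functional as F

def heun_step(v, x, t, dt):
    """One explicit RK2 (Heun) step of the probability-flow ODE dx/dt = v(x, t)."""
    k1 = v(x, t)
    k2 = v(x + dt * k1, t + dt)
    return x + 0.5 * dt * (k1 + k2)

def cud_loss(v_theta, v_ema, x0, dt=-0.05, lam=1.0):
    """Align the current output with the ground-truth velocity (ODE term)
    and with the EMA model's prediction at an earlier point on the same
    trajectory (catch-up term)."""
    noise = torch.randn_like(x0)
    # Sample t so that t + dt stays inside [0, 1].
    t = abs(dt) + (1 - abs(dt)) * torch.rand(x0.shape[0], device=x0.device)
    tb = t.view(-1, *([1] * (x0.dim() - 1)))
    x_t = (1 - tb) * x0 + tb * noise       # point on the interpolation path
    v_star = noise - x0                    # ground-truth velocity label

    # Term 1: standard ODE / flow-matching objective.
    loss_gt = F.mse_loss(v_theta(x_t, t), v_star)

    # Term 2: catch-up alignment. Step along the ODE with the frozen EMA
    # model and match its previous-moment output; the stop-gradient keeps
    # the two branches from updating asynchronously.
    with torch.no_grad():
        x_prev = heun_step(v_ema, x_t, t, dt)
        v_prev = v_ema(x_prev, t + dt)
    loss_catchup = F.mse_loss(v_theta(x_t, t), v_prev)

    return loss_gt + lam * loss_catchup
```
How the paper schedules the alignment step and weights the two terms is part of the design space it explores; the sketch fixes both for readability.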
Related papers
- Self-Refining Diffusion Samplers: Enabling Parallelization via Parareal Iterations [53.180374639531145]
Self-Refining Diffusion Samplers (SRDS) retain sample quality and can improve latency at the cost of additional parallel compute.
We take inspiration from the Parareal algorithm, a popular numerical method for parallel-in-time integration of differential equations.
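For reference, the Parareal correction this builds on has a compact generic form. The sketch below is the textbook update for an autonomous ODE, not the SRDS implementation; `coarse` and `fine` are assumed single-interval propagators.
```python
# Textbook Parareal update for a generic (autonomous) ODE -- illustrative,
# not the SRDS implementation. `coarse` and `fine` are assumed to be
# single-interval propagators: given the state at one interval boundary,
# they return the state at the next.
from typing import Callable, List

def parareal(u0: float,
             coarse: Callable[[float], float],
             fine: Callable[[float], float],
             n_intervals: int,
             n_iters: int) -> List[float]:
    # Initial trajectory from the coarse solver alone (cheap, sequential).
    u = [u0]
    for _ in range(n_intervals):
        u.append(coarse(u[-1]))
    for _ in range(n_iters):
        # The expensive fine solves depend only on the previous iterate,
        # so a real system runs them in parallel across all intervals.
        f_old = [fine(u[n]) for n in range(n_intervals)]
        g_old = [coarse(u[n]) for n in range(n_intervals)]
        new_u = [u0]
        for n in range(n_intervals):
            # Parareal correction: U[n+1] = G(U_new[n]) + F(U_old[n]) - G(U_old[n])
            new_u.append(coarse(new_u[n]) + f_old[n] - g_old[n])
        u = new_u
    return u
```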
arXiv Detail & Related papers (2024-12-11T11:08:09Z)
- Directly Denoising Diffusion Models [6.109141407163027]
We present Directly Denoising Diffusion Model (DDDM), a simple and generic approach for generating realistic images with few-step sampling.
Our model achieves FID scores of 2.57 and 2.33 on CIFAR-10 in one-step and two-step sampling respectively, surpassing those obtained from GANs and distillation-based models.
For ImageNet 64x64, our approach stands as a competitive contender against leading models.
arXiv Detail & Related papers (2024-05-22T11:20:32Z)
- SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation [54.31974179325654]
We propose Stochastic Consistency Distillation (SCott) to enable accelerated text-to-image generation.
SCott distills the ordinary differential equation solvers-based sampling process of a pretrained teacher model into a student.
On the MSCOCO-2017 5K dataset with a Stable Diffusion-V1.5 teacher, SCott achieves an FID (Fréchet Inception Distance) of 22.1, surpassing that (23.4) of the 1-step InstaFlow and matching that of the 4-step UFOGen.
arXiv Detail & Related papers (2024-03-03T13:08:32Z)
- Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner [84.97253871387028]
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed.
We propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost.
Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z)
- Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion [56.38386580040991]
Consistency Trajectory Model (CTM) is a generalization of Consistency Models (CM).
CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance.
Unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods.
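One minimal way to picture CTM-style training (illustrative only, not the authors' full objective, which also adds adversarial and denoising score matching terms): a student that jumps between arbitrary times is matched against a teacher-solved partial trajectory finished by a stop-gradient copy of itself.
```python
# Illustrative soft-consistency matching in the spirit of CTM.
# `G(x, t, s)` is a student that predicts the PF-ODE solution from time t
# to time s; `teacher_ode_step` is an assumed accurate numerical solver.
import torch
import torch.nn.functional as F

def ctm_style_loss(G, teacher_ode_step, x_t, t, u, s):
    pred = G(x_t, t, s)                    # direct student jump t -> s
    with torch.no_grad():
        x_u = teacher_ode_step(x_t, t, u)  # solve t -> u accurately
        target = G(x_u, u, s)              # stop-grad student finishes u -> s
    return F.mse_loss(pred, target)
```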
arXiv Detail & Related papers (2023-10-01T05:07:17Z)
- Parallel Sampling of Diffusion Models [76.3124029406809]
Diffusion models are powerful generative models but suffer from slow sampling.
We present ParaDiGMS, a novel method to accelerate the sampling of pretrained diffusion models by denoising multiple steps in parallel.
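The parallel-in-time idea can be sketched as a Picard fixed-point iteration over the whole trajectory. The code below is illustrative, not the paper's implementation (which batches model calls and uses sliding windows with a convergence test); `drift(x, i)` stands in for the update a pretrained sampler would apply at step i.
```python
# Picard-iteration sketch of parallel-in-time denoising (illustrative).
import torch

def picard_sample(drift, x_T, n_steps: int, n_iters: int):
    traj = [x_T.clone() for _ in range(n_steps + 1)]
    for _ in range(n_iters):
        # All drift evaluations use the *previous* iterate, so they are
        # independent and can run in parallel on the GPU.
        d = [drift(traj[i], i) for i in range(n_steps)]
        # Cheap sequential prefix accumulation (no model calls).
        for i in range(n_steps):
            traj[i + 1] = traj[i] + d[i]
    return traj[-1]
```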
arXiv Detail & Related papers (2023-05-25T17:59:42Z)
- Fast Sampling of Diffusion Models via Operator Learning [74.37531458470086]
We use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion models.
Compared to other fast sampling methods that have a sequential nature, we are the first to propose a parallel decoding method.
We show our method achieves state-of-the-art FID of 3.78 for CIFAR-10 and 7.83 for ImageNet-64 in the one-model-evaluation setting.
arXiv Detail & Related papers (2022-11-24T07:30:27Z)
- ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech [63.780196620966905]
We propose ProDiff, a progressive fast diffusion model for high-quality text-to-speech.
ProDiff parameterizes the denoising model by directly predicting clean data to avoid distinct quality degradation in accelerating sampling.
Our evaluation demonstrates that ProDiff needs only 2 iterations to synthesize high-fidelity mel-spectrograms.
ProDiff enables a sampling speed of 24x faster than real-time on a single NVIDIA 2080Ti GPU.
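The clean-data parameterization ProDiff describes can be sketched in generic DDPM notation (illustrative, not the ProDiff codebase): the network regresses the clean signal x0 directly instead of the noise eps.
```python
# Clean-data (x0) parameterization sketch -- illustrative assumptions only.
import torch
import torch.nn.functional as F

def x0_prediction_loss(model, x0, alpha_bar):
    """`alpha_bar`: per-sample cumulative schedule values in (0, 1)."""
    eps = torch.randn_like(x0)
    ab = alpha_bar.view(-1, *([1] * (x0.dim() - 1)))
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps   # forward diffusion q(x_t | x0)
    x0_hat = model(x_t, alpha_bar)                 # predict clean data directly
    return F.mse_loss(x0_hat, x0)                  # regress x0, not eps
```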
arXiv Detail & Related papers (2022-07-13T17:45:43Z)
- Progressive Distillation for Fast Sampling of Diffusion Models [17.355749359987648]
We present a method to distill a trained deterministic diffusion sampler, using many steps, into a new diffusion model that takes half as many sampling steps.
On standard image generation benchmarks like CIFAR-10, ImageNet, and LSUN, we start out with state-of-the-art samplers taking as many as 8192 steps, and are able to distill down to models taking as few as 4 steps without losing much perceptual quality.
arXiv Detail & Related papers (2022-02-01T16:07:25Z)
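The halving loop itself is simple; the sketch below is schematic, with `train_round` a hypothetical helper that fits one student step to two consecutive teacher steps.
```python
# Schematic progressive-distillation halving loop (illustrative;
# `train_round` is a hypothetical helper, not an API from the paper).
import copy

def progressive_distillation(teacher, train_round, n_steps: int, target_steps: int = 4):
    while n_steps > target_steps:
        student = copy.deepcopy(teacher)          # initialize student from teacher
        student = train_round(student, teacher, n_steps)
        teacher, n_steps = student, n_steps // 2  # student becomes next teacher
    return teacher, n_steps
```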