Improved Techniques for Training Consistency Models
- URL: http://arxiv.org/abs/2310.14189v1
- Date: Sun, 22 Oct 2023 05:33:38 GMT
- Title: Improved Techniques for Training Consistency Models
- Authors: Yang Song and Prafulla Dhariwal
- Abstract summary: We present improved techniques for consistency training, where consistency models learn directly from data without distillation.
We introduce a lognormal noise schedule for the consistency training objective and propose doubling the total number of discretization steps at fixed intervals during training.
These modifications enable consistency models to achieve FID scores of 2.51 and 3.25 on CIFAR-10 and ImageNet $64\times 64$ respectively in a single sampling step.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consistency models are a nascent family of generative models that can sample
high quality data in one step without the need for adversarial training.
Current consistency models achieve optimal sample quality by distilling from
pre-trained diffusion models and employing learned metrics such as LPIPS.
However, distillation limits the quality of consistency models to that of the
pre-trained diffusion model, and LPIPS causes undesirable bias in evaluation.
To tackle these challenges, we present improved techniques for consistency
training, where consistency models learn directly from data without
distillation. We delve into the theory behind consistency training and identify
a previously overlooked flaw, which we address by eliminating Exponential
Moving Average from the teacher consistency model. To replace learned metrics
like LPIPS, we adopt Pseudo-Huber losses from robust statistics. Additionally,
we introduce a lognormal noise schedule for the consistency training objective,
and propose to double total discretization steps every set number of training
iterations. Combined with better hyperparameter tuning, these modifications
enable consistency models to achieve FID scores of 2.51 and 3.25 on CIFAR-10
and ImageNet $64\times 64$ respectively in a single sampling step. These scores
mark a 3.5$\times$ and 4$\times$ improvement compared to prior consistency
training approaches. Through two-step sampling, we further reduce FID scores to
2.24 and 2.77 on these two datasets, surpassing those obtained via distillation
in both one-step and two-step settings, while narrowing the gap between
consistency models and other state-of-the-art generative models.
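
To make the training-side changes above concrete, the following is a minimal PyTorch-style sketch of one consistency training step that combines a Pseudo-Huber loss, a lognormal weighting over discretized noise levels, a curriculum that doubles the number of discretization steps at fixed intervals, and a stop-gradient teacher with no EMA. The constants (sigma range, rho, P_MEAN, P_STD, S0, S1, and the Pseudo-Huber scale) are EDM-style assumptions rather than values quoted on this page; the lognormal weighting is a simplified stand-in for the discretized schedule, and the per-level loss weighting used in practice is omitted.

```python
# Minimal sketch (not the authors' code) of one improved consistency-training step.
import math
import torch

# EDM-style constants; the exact values are assumptions, not taken from the abstract.
SIGMA_MIN, SIGMA_MAX, RHO = 0.002, 80.0, 7.0
P_MEAN, P_STD = -1.1, 2.0    # lognormal schedule parameters (assumed)
S0, S1 = 10, 1280            # initial / final number of discretization steps (assumed)

def num_steps(k, total_iters):
    """Double the number of discretization steps every fixed chunk of iterations."""
    doublings = int(math.log2(S1 // S0)) + 1
    chunk = max(total_iters // doublings, 1)
    return min(S0 * 2 ** (k // chunk), S1)

def karras_sigmas(n):
    """n + 1 noise levels spaced with the usual rho-schedule between SIGMA_MIN and SIGMA_MAX."""
    i = torch.arange(n + 1) / n
    return (SIGMA_MIN ** (1 / RHO) + i * (SIGMA_MAX ** (1 / RHO) - SIGMA_MIN ** (1 / RHO))) ** RHO

def sample_indices(sigmas, batch):
    """Weight sigma indices by a lognormal density (simplified stand-in for the discretized schedule)."""
    logits = -((sigmas[:-1].log() - P_MEAN) ** 2) / (2 * P_STD ** 2)
    return torch.multinomial(logits.softmax(dim=0), batch, replacement=True)

def pseudo_huber(a, b, c):
    """d(a, b) = sqrt(||a - b||^2 + c^2) - c, averaged over the batch."""
    return (((a - b) ** 2).flatten(1).sum(dim=1) + c ** 2).sqrt().sub(c).mean()

def consistency_training_step(f, x0, k, total_iters):
    """f(x, sigma) is the consistency model; x0 is a minibatch of clean data; k is the iteration."""
    sigmas = karras_sigmas(num_steps(k, total_iters)).to(x0.device)
    idx = sample_indices(sigmas, x0.shape[0])
    s_lo, s_hi = sigmas[idx], sigmas[idx + 1]
    noise = torch.randn_like(x0)
    view = (-1,) + (1,) * (x0.dim() - 1)
    student = f(x0 + s_hi.view(view) * noise, s_hi)
    with torch.no_grad():                        # teacher = stop-gradient student, no EMA
        teacher = f(x0 + s_lo.view(view) * noise, s_lo)
    c = 0.00054 * math.sqrt(x0[0].numel())       # Pseudo-Huber scale tied to data dimension (assumed)
    return pseudo_huber(student, teacher, c)
```

A training loop would call `consistency_training_step(model, batch, k, total_iters)` and backpropagate the returned loss; reusing the same network as a stop-gradient teacher is what stands in for the removed EMA teacher described in the abstract.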
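
The one-step and two-step FID figures above correspond to standard consistency-model sampling: one evaluation of the model at the maximum noise level, optionally followed by re-noising to an intermediate level and a second evaluation. Below is a minimal sketch; the noise range and the choice of the intermediate level `sigma_mid` are assumptions to be tuned, not values from this page.

```python
# Minimal sketch of one-step and two-step consistency sampling (not the authors' code).
import math
import torch

SIGMA_MIN, SIGMA_MAX = 0.002, 80.0   # assumed EDM-style noise range

@torch.no_grad()
def sample_consistency(f, shape, device, sigma_mid=None):
    """One step: x = f(SIGMA_MAX * z, SIGMA_MAX).
    If sigma_mid is given, re-noise the sample to sigma_mid and evaluate f once more."""
    z = torch.randn(shape, device=device)
    x = f(SIGMA_MAX * z, torch.full((shape[0],), SIGMA_MAX, device=device))
    if sigma_mid is not None:  # two-step sampling
        x = x + math.sqrt(sigma_mid ** 2 - SIGMA_MIN ** 2) * torch.randn(shape, device=device)
        x = f(x, torch.full((shape[0],), sigma_mid, device=device))
    return x

# Example (hypothetical shapes and sigma_mid):
# images = sample_consistency(model, (16, 3, 32, 32), "cuda", sigma_mid=0.8)
```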
Related papers
- Stable Consistency Tuning: Understanding and Improving Consistency Models [40.2712218203989]
Diffusion models achieve superior generation quality but suffer from slow generation speed due to the iterative nature of denoising.
Consistency models, a new generative family, achieve competitive performance with significantly faster sampling.
We propose a novel framework for understanding consistency models by modeling the denoising process of the diffusion model as a Markov Decision Process (MDP) and framing consistency-model training as value estimation through Temporal Difference (TD) learning.
arXiv Detail & Related papers (2024-10-24T17:55:52Z)
- Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along probability flow (PF) ODE trajectories to their corresponding endpoints.
We empirically find that this training paradigm limits the one-step generation performance of consistency models.
We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z)
- Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GANs' inherent flaws and biased optimization within the latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z)
- Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density [70.14884528360199]
We introduce an approach to bias deep generative models, such as GANs and diffusion models, towards generating data with enhanced fidelity or increased diversity.
Our approach involves manipulating the distribution of training and generated data through a novel metric for individual samples, named pseudo density.
arXiv Detail & Related papers (2024-07-11T16:46:04Z)
- Provable Statistical Rates for Consistency Diffusion Models [87.28777947976573]
Despite the state-of-the-art performance, diffusion models are known for their slow sample generation due to the extensive number of steps involved.
This paper contributes towards the first statistical theory for consistency models, formulating their training as a distribution discrepancy minimization problem.
arXiv Detail & Related papers (2024-06-23T20:34:18Z)
- Directly Denoising Diffusion Models [6.109141407163027]
We present the Directly Denoising Diffusion Model (DDDM), a simple and generic approach for generating realistic images with few-step sampling.
Our model achieves FID scores of 2.57 and 2.33 on CIFAR-10 in one-step and two-step sampling respectively, surpassing those obtained from GANs and distillation-based models.
For ImageNet 64x64, our approach stands as a competitive contender against leading models.
arXiv Detail & Related papers (2024-05-22T11:20:32Z)
- Not All Steps are Equal: Efficient Generation with Progressive Diffusion Models [62.155612146799314]
We propose a novel two-stage training strategy termed Step-Adaptive Training.
In the initial stage, a base denoising model is trained to encompass all timesteps.
In the second stage, we partition the timesteps into distinct groups and fine-tune the model within each group to achieve specialized denoising capabilities.
arXiv Detail & Related papers (2023-12-20T03:32:58Z)
- One-Step Diffusion Distillation via Deep Equilibrium Models [64.11782639697883]
We introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image.
Our method enables fully offline training with just noise/image pairs from the diffusion model.
We demonstrate that the DEQ architecture is crucial to this capability, as GET matches a $5\times$ larger ViT in terms of FID scores.
arXiv Detail & Related papers (2023-12-12T07:28:40Z)
- Self-Distillation Mixup Training for Non-autoregressive Neural Machine Translation [13.527174969073073]
Non-autoregressive translation (NAT) models predict outputs in parallel, achieving substantial improvements in generation speed compared to autoregressive (AT) models.
Because they perform worse when trained on raw data, most NAT models are trained as student models on distilled data generated by AT teacher models.
An effective training strategy is Self-Distillation Mixup (SDM) training, which pre-trains a model on raw data, generates distilled data with the pre-trained model itself, and finally re-trains a model on the combination of raw and distilled data (a minimal sketch of this pipeline follows this list).
arXiv Detail & Related papers (2021-12-22T03:06:27Z)
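
As referenced in the last entry above, the Self-Distillation Mixup recipe is procedural enough to outline. This is a minimal sketch grounded only in the summary's three stages; `train_nat` and `translate` are hypothetical placeholders, not APIs from the cited paper.

```python
# Hypothetical sketch of Self-Distillation Mixup (SDM) training as summarized above.
def sdm_training(raw_pairs, train_nat, translate):
    """raw_pairs: list of (source, target) sentence pairs;
    train_nat(pairs) returns a trained NAT model; translate(model, src) decodes src."""
    # Stage 1: pre-train a NAT model on the raw parallel data.
    model = train_nat(raw_pairs)
    # Stage 2: re-decode the source side with the pre-trained model to get self-distilled targets.
    distilled_pairs = [(src, translate(model, src)) for src, _ in raw_pairs]
    # Stage 3: re-train on the mixture of raw and self-distilled data.
    return train_nat(raw_pairs + distilled_pairs)
```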
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.