Consistency Models Made Easy
- URL: http://arxiv.org/abs/2406.14548v2
- Date: Thu, 10 Oct 2024 21:59:49 GMT
- Title: Consistency Models Made Easy
- Authors: Zhengyang Geng, Ashwini Pokle, William Luo, Justin Lin, J. Zico Kolter,
- Abstract summary: Easy Consistency Tuning (ECT) achieves vastly reduced training times while improving upon previous methods.
ECT achieves a 2-step FID of 2.73 on CIFAR10 within 1 hour on a single A100 GPU, matching Consistency Distillation trained for hundreds of GPU hours.
Our code is publicly available, making CMs more accessible to the broader community.
- Score: 49.16601441878957
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Consistency models (CMs) offer faster sampling than traditional diffusion models, but their training is resource-intensive. For example, as of 2024, training a state-of-the-art CM on CIFAR-10 takes one week on 8 GPUs. In this work, we propose an effective scheme for training CMs that largely improves the efficiency of building such models. Specifically, by expressing CM trajectories via a particular differential equation, we argue that diffusion models can be viewed as a special case of CMs. We can thus fine-tune a consistency model starting from a pretrained diffusion model and progressively approximate the full consistency condition to stronger degrees over the training process. Our resulting method, which we term Easy Consistency Tuning (ECT), achieves vastly reduced training times while improving upon the quality of previous methods: for example, ECT achieves a 2-step FID of 2.73 on CIFAR10 within 1 hour on a single A100 GPU, matching Consistency Distillation trained for hundreds of GPU hours. Owing to this computational efficiency, we investigate the scaling laws of CMs under ECT, showing that they obey the classic power law scaling, hinting at their ability to improve efficiency and performance at larger scales. Our code (https://github.com/locuslab/ect) is publicly available, making CMs more accessible to the broader community.
Related papers
- Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations [62.132347451049455]
Scale has become a main ingredient in obtaining strong machine learning models.
In this work, we argue that scale and training research has been needlessly complex due to reliance on the cosine schedule.
We show that weight averaging yields improved performance along the training trajectory, without additional training costs, across different scales.
arXiv Detail & Related papers (2024-05-28T17:33:54Z) - Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better [31.67038902035949]
Diffusion Models (DM) and Consistency Models (CM) are two types of popular generative models with good generation quality on various tasks.
In this work, we find that high-quality model weights often lie in a basin which cannot be reached by SGD but can be obtained by proper checkpoint averaging.
We propose LCSC, a simple but effective and efficient method to enhance the performance of DM and CM, by combining checkpoints along the training trajectory with coefficients deduced from evolutionary search.
arXiv Detail & Related papers (2024-04-02T18:59:39Z) - Always-Sparse Training by Growing Connections with Guided Stochastic
Exploration [46.4179239171213]
We propose an efficient always-sparse training algorithm with excellent scaling to larger and sparser models.
We evaluate our method on CIFAR-10/100 and ImageNet using VGG, and ViT models, and compare it against a range of sparsification methods.
arXiv Detail & Related papers (2024-01-12T21:32:04Z) - Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion [56.38386580040991]
Consistency Trajectory Model (CTM) is a generalization of Consistency Models (CM)
CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance.
Unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods.
arXiv Detail & Related papers (2023-10-01T05:07:17Z) - BOOT: Data-free Distillation of Denoising Diffusion Models with
Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT, that overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z) - MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided
Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
arXiv Detail & Related papers (2022-04-15T23:19:37Z) - Ensembling Off-the-shelf Models for GAN Training [55.34705213104182]
We find that pretrained computer vision models can significantly improve performance when used in an ensemble of discriminators.
We propose an effective selection mechanism, by probing the linear separability between real and fake samples in pretrained model embeddings.
Our method can improve GAN training in both limited data and large-scale settings.
arXiv Detail & Related papers (2021-12-16T18:59:50Z) - Accelerating Distributed K-FAC with Smart Parallelism of Computing and
Communication Tasks [13.552262050816616]
Kronecker-Factored Approximate Curvature (KFAC) is one of the most efficient approximation algorithms for training deep models.
Yet, when leveraging GPU clusters to train models with KFAC, it incurs extensive computation as well as introduces extra communications during each iteration.
We propose D-KFAC with smart parallelism of computing and communication tasks to reduce the iteration time.
arXiv Detail & Related papers (2021-07-14T08:01:07Z) - A Practical Incremental Method to Train Deep CTR Models [37.54660958085938]
We introduce a practical incremental method to train deep CTR models, which consists of three decoupled modules.
Our method can achieve comparable performance to the conventional batch mode training with much better training efficiency.
arXiv Detail & Related papers (2020-09-04T12:35:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.