Towards Faster Training of Diffusion Models: An Inspiration of A Consistency Phenomenon
- URL: http://arxiv.org/abs/2404.07946v1
- Date: Thu, 14 Mar 2024 13:27:04 GMT
- Title: Towards Faster Training of Diffusion Models: An Inspiration of A Consistency Phenomenon
- Authors: Tianshuo Xu, Peng Mi, Ruilin Wang, Yingcong Chen
- Abstract summary: Diffusion models (DMs) are a powerful generative framework that has attracted significant attention in recent years.
We propose two strategies to accelerate the training of DMs.
- Score: 16.416356358224842
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Diffusion models (DMs) are a powerful generative framework that has attracted significant attention in recent years. However, the high computational cost of training DMs limits their practical applications. In this paper, we start with a consistency phenomenon of DMs: we observe that DMs with different initializations or even different architectures can produce very similar outputs given the same noise inputs, which is rare in other generative models. We attribute this phenomenon to two factors: (1) the learning difficulty of DMs is lower when the noise-prediction diffusion model approaches the upper bound of the timestep (the input becomes pure noise), where the structural information of the output is usually generated; and (2) the loss landscape of DMs is highly smooth, which implies that the model tends to converge to similar local minima and exhibit similar behavior patterns. This finding not only reveals the stability of DMs, but also inspires us to devise two strategies to accelerate the training of DMs. First, we propose a curriculum-learning-based timestep schedule, which leverages the noise rate as an explicit indicator of the learning difficulty and gradually reduces the training frequency of easier timesteps, thus improving the training efficiency. Second, we propose a momentum decay strategy, which reduces the momentum coefficient during the optimization process, since a large momentum may slow convergence and cause oscillations due to the smoothness of the loss landscape. We demonstrate the effectiveness of our proposed strategies on various models and show that they can significantly reduce the training time and improve the quality of the generated images.
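The two strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact schedules, so everything concrete here is an assumption: the function names (`timestep_weights`, `sample_timesteps`, `decayed_momentum`), the linear noise-rate proxy `t / (T - 1)`, the `max_downweight` parameter, and the cosine momentum decay are hypothetical choices for illustration, not the authors' formulas.

```python
import math
import random


def timestep_weights(num_timesteps, progress, max_downweight=0.9):
    """Return sampling probabilities over timesteps 0..T-1.

    progress is the fraction of training completed, in [0, 1].
    At progress 0 the distribution is uniform. As progress grows,
    high-noise timesteps (treated as the easier ones, following the
    abstract) are sampled less often.
    ASSUMPTION: noise rate is approximated linearly as t / (T - 1).
    """
    if num_timesteps < 2:
        raise ValueError("num_timesteps must be at least 2")
    progress = min(max(progress, 0.0), 1.0)
    weights = []
    for t in range(num_timesteps):
        noise_rate = t / (num_timesteps - 1)  # hypothetical difficulty proxy
        weights.append(1.0 - max_downweight * progress * noise_rate)
    total = sum(weights)
    return [w / total for w in weights]


def sample_timesteps(num_timesteps, progress, batch_size, rng):
    """Draw a batch of training timesteps under the curriculum weights."""
    probs = timestep_weights(num_timesteps, progress)
    return rng.choices(range(num_timesteps), weights=probs, k=batch_size)


def decayed_momentum(step, total_steps, start=0.9, end=0.5):
    """Decay the momentum coefficient from start to end over training.

    ASSUMPTION: a cosine decay; the abstract only says the momentum
    coefficient is reduced during optimization.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * frac))


rng = random.Random(0)
for progress in (0.0, 0.5, 1.0):
    probs = timestep_weights(1000, progress)
    batch = sample_timesteps(1000, progress, batch_size=8, rng=rng)
    momentum = decayed_momentum(int(progress * 10_000), 10_000)
    print(f"progress={progress:.1f} "
          f"low/high-noise ratio={probs[0] / probs[-1]:.2f} "
          f"momentum={momentum:.3f} batch={batch}")
```

In a real training loop, the sampled timesteps would replace uniform timestep sampling, and the decayed value would be written into the optimizer's momentum setting at each step. Both the endpoints of the decay and the strength of the down-weighting would need tuning.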
Related papers
- Avoiding mode collapse in diffusion models fine-tuned with reinforcement learning [0.0]
Fine-tuning foundation models via reinforcement learning (RL) has proven promising for aligning to downstream objectives.
We exploit the hierarchical nature of diffusion models (DMs) and train them dynamically at each epoch with a tailored RL method.
We show that models trained with HRF better preserve diversity in downstream tasks, thus enhancing fine-tuning robustness without compromising mean rewards.
arXiv Detail & Related papers (2024-10-10T19:06:23Z)
- DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture [69.58440626023541]
Diffusion models (DMs) have demonstrated exceptional generative capabilities across various areas.
The most common way to accelerate DMs involves reducing the number of denoising steps during generation.
We propose a novel method that transfers the capability of large pretrained DMs to faster architectures.
arXiv Detail & Related papers (2024-09-05T14:12:22Z)
- Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks [26.387044804861937]
Few-shot fine-tuning of Diffusion Models (DMs) is a key advancement, significantly reducing training costs and enabling personalized AI applications.
During the training process, image fidelity initially improves, then unexpectedly deteriorates with the emergence of noisy patterns, only to recover later with severe overfitting.
We term the stage with generated noisy patterns the corruption stage. Experimental results demonstrate that our method significantly mitigates corruption and improves the fidelity, quality, and diversity of the generated images in both object-driven and subject-driven generation tasks.
arXiv Detail & Related papers (2024-05-30T10:47:48Z)
- Not All Steps are Equal: Efficient Generation with Progressive Diffusion Models [62.155612146799314]
We propose a novel two-stage training strategy termed Step-Adaptive Training.
In the initial stage, a base denoising model is trained to encompass all timesteps.
We partition the timesteps into distinct groups, fine-tuning the model within each group to achieve specialized denoising capabilities.
arXiv Detail & Related papers (2023-12-20T03:32:58Z)
- Unsupervised Temporal Action Localization via Self-paced Incremental Learning [57.55765505856969]
We present a novel self-paced incremental learning model to enhance clustering and localization training simultaneously.
We design two incremental instance learning strategies (constant-speed and variable-speed) for easy-to-hard model training, thus ensuring the reliability of these video pseudolabels.
arXiv Detail & Related papers (2023-12-12T16:00:55Z)
- Fast Diffusion Model [122.36693015093041]
Diffusion models (DMs) have been adopted across diverse fields for their ability to capture intricate data distributions.
In this paper, we propose a Fast Diffusion Model (FDM) to significantly speed up DMs from a DM optimization perspective.
arXiv Detail & Related papers (2023-06-12T09:38:04Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT, which overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- Restoration based Generative Models [0.886014926770622]
Denoising diffusion models (DDMs) have attracted increasing attention by showing impressive synthesis quality.
In this paper, we establish the interpretation of DDMs in terms of image restoration (IR).
We propose multi-scale training, which improves performance compared to the diffusion process by taking advantage of the flexibility of the forward process.
We believe that our framework paves the way for designing a new type of flexible general generative model.
arXiv Detail & Related papers (2023-02-20T00:53:33Z)
- Post-training Quantization on Diffusion Models [14.167428759401703]
Denoising diffusion (score-based) generative models have recently achieved significant accomplishments in generating realistic and diverse data.
These approaches define a forward diffusion process for transforming data into noise and a backward denoising process for sampling data from noise.
Unfortunately, the generation process of current denoising diffusion models is notoriously slow due to the lengthy iterative noise estimations.
arXiv Detail & Related papers (2022-11-28T19:33:39Z)
- Dynamic Contrastive Distillation for Image-Text Retrieval [90.05345397400144]
We present a novel plug-in dynamic contrastive distillation (DCD) framework to compress image-text retrieval models.
We successfully apply our proposed DCD strategy to two state-of-the-art vision-language pretrained models, i.e. ViLT and METER.
Experiments on MS-COCO and Flickr30K benchmarks show the effectiveness and efficiency of our DCD framework.
arXiv Detail & Related papers (2022-07-04T14:08:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.