Improved Noise Schedule for Diffusion Training
- URL: http://arxiv.org/abs/2407.03297v2
- Date: Wed, 27 Nov 2024 15:10:12 GMT
- Title: Improved Noise Schedule for Diffusion Training
- Authors: Tiankai Hang, Shuyang Gu, Xin Geng, Baining Guo
- Abstract summary: We propose a novel approach to design the noise schedule for enhancing the training of diffusion models.
We empirically demonstrate the superiority of our noise schedule over the standard cosine schedule.
- Score: 51.849746576387375
- Abstract: Diffusion models have emerged as the de facto choice for generating high-quality visual signals across various domains. However, training a single model to predict noise across various levels poses significant challenges, necessitating numerous iterations and incurring significant computational costs. Various approaches, such as loss weighting strategy design and architectural refinements, have been introduced to expedite convergence and improve model performance. In this study, we propose a novel approach to design the noise schedule for enhancing the training of diffusion models. Our key insight is that the importance sampling of the logarithm of the Signal-to-Noise ratio ($\log \text{SNR}$), theoretically equivalent to a modified noise schedule, is particularly beneficial for training efficiency when increasing the sample frequency around $\log \text{SNR}=0$. This strategic sampling allows the model to focus on the critical transition point between signal dominance and noise dominance, potentially leading to more robust and accurate predictions. We empirically demonstrate the superiority of our noise schedule over the standard cosine schedule. Furthermore, we highlight the advantages of our noise schedule design on the ImageNet benchmark, showing that the designed schedule consistently benefits different prediction targets. Our findings contribute to the ongoing efforts to optimize diffusion models, potentially paving the way for more efficient and effective training paradigms in the field of generative AI.
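The idea of sampling $\log \text{SNR}$ more densely around 0 can be illustrated with a minimal sketch. It assumes a variance-preserving parameterization ($\alpha^2 + \sigma^2 = 1$, $\text{SNR} = \alpha^2 / \sigma^2$) and, purely for illustration, draws $\log \text{SNR}$ from a Laplace distribution centered at 0. The abstract does not say which distribution the authors use, so the Laplace choice, the scale `b=0.5`, and all function names below are assumptions rather than the paper's method. For comparison, the cosine schedule is taken as $\alpha_t = \cos(\pi t / 2)$, $\sigma_t = \sin(\pi t / 2)$ with $t$ drawn uniformly.

```python
import numpy as np

def sample_logsnr_cosine(rng, n, eps=1e-5):
    """logSNR implied by the cosine schedule with t ~ Uniform(0, 1).

    With alpha = cos(pi t / 2) and sigma = sin(pi t / 2),
    logSNR = log(alpha^2 / sigma^2) = -2 log tan(pi t / 2).
    """
    t = rng.uniform(eps, 1.0 - eps, size=n)  # clip to avoid tan(0) and tan(pi/2)
    return -2.0 * np.log(np.tan(np.pi * t / 2.0))

def sample_logsnr_laplace(rng, n, mu=0.0, b=0.5):
    """Hypothetical importance sampler: logSNR ~ Laplace(mu, b).

    The Laplace density peaks at mu, so mu=0 concentrates training
    samples around logSNR = 0 (an assumed, illustrative choice).
    """
    return rng.laplace(loc=mu, scale=b, size=n)

def alpha_sigma_from_logsnr(lam):
    """Variance-preserving coefficients: alpha^2 = sigmoid(lam), sigma^2 = sigmoid(-lam)."""
    alpha = np.sqrt(1.0 / (1.0 + np.exp(-lam)))
    sigma = np.sqrt(1.0 / (1.0 + np.exp(lam)))
    return alpha, sigma

def add_noise(x0, lam, rng):
    """Form x_t = alpha * x0 + sigma * eps, one logSNR value per batch item."""
    alpha, sigma = alpha_sigma_from_logsnr(lam)
    shape = (-1,) + (1,) * (x0.ndim - 1)  # broadcast over non-batch dimensions
    eps = rng.standard_normal(x0.shape)
    return alpha.reshape(shape) * x0 + sigma.reshape(shape) * eps, eps

def frac_near_zero(lam, width=1.0):
    """Fraction of sampled logSNR values with |logSNR| < width."""
    return float(np.mean(np.abs(lam) < width))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lam_cos = sample_logsnr_cosine(rng, 100_000)
    lam_lap = sample_logsnr_laplace(rng, 100_000)
    print("cosine  fraction with |logSNR| < 1:", frac_near_zero(lam_cos))
    print("laplace fraction with |logSNR| < 1:", frac_near_zero(lam_lap))
```

Under these assumptions, roughly 31% of cosine-schedule samples fall within $|\log \text{SNR}| < 1$, against roughly 86% for the Laplace sampler with `b=0.5`. That is the sense in which changing the sampling distribution of $\log \text{SNR}$ acts like a modified noise schedule.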
Related papers
- HADL Framework for Noise Resilient Long-Term Time Series Forecasting [0.7810572107832383]
Long-term time series forecasting is critical in domains such as finance, economics, and energy.
The impact of temporal noise in extended lookback windows remains underexplored, often degrading model performance and computational efficiency.
We propose a novel framework that addresses these challenges by integrating the Discrete Wavelet Transform (DWT) and Discrete Cosine Transform (DCT).
Our approach demonstrates competitive robustness to noisy input, significantly reduces computational complexity, and achieves competitive or state-of-the-art forecasting performance across diverse benchmark datasets.
arXiv Detail & Related papers (2025-02-14T21:41:42Z)
- Unveiling the Power of Noise Priors: Enhancing Diffusion Models for Mobile Traffic Prediction [11.091373697136047]
Noise shapes mobile traffic predictions, exhibiting distinct and consistent patterns.
We propose NPDiff, a framework that decomposes noise into prior and residual components.
NPDiff can seamlessly integrate with various diffusion-based prediction models, delivering predictions that are effective, efficient, and robust.
arXiv Detail & Related papers (2025-01-23T16:13:08Z)
- Inference-Time Alignment of Diffusion Models with Direct Noise Optimization [45.77751895345154]
We propose a novel alignment approach, named Direct Noise Optimization (DNO), that optimizes the injected noise during the sampling process of diffusion models.
By design, DNO operates at inference-time, and thus is tuning-free and prompt-agnostic, with the alignment occurring in an online fashion during generation.
We conduct extensive experiments on several important reward functions and demonstrate that the proposed DNO approach can achieve state-of-the-art reward scores within a reasonable time budget for generation.
arXiv Detail & Related papers (2024-05-29T08:39:39Z)
- Blue noise for diffusion models [50.99852321110366]
We introduce a novel and general class of diffusion models taking correlated noise within and across images into account.
Our framework allows introducing correlation across images within a single mini-batch to improve gradient flow.
We perform both qualitative and quantitative evaluations on a variety of datasets using our method.
arXiv Detail & Related papers (2024-02-07T14:59:25Z)
- Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs)
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
- How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
- Perception Prioritized Training of Diffusion Models [34.674477039333475]
We show that restoring data corrupted with certain noise levels offers a proper pretext for the model to learn rich visual concepts.
We propose to prioritize such noise levels over other levels during training, by redesigning the weighting scheme of the objective function.
arXiv Detail & Related papers (2022-04-01T06:22:23Z)
- A Study on Speech Enhancement Based on Diffusion Probabilistic Model [63.38586161802788]
We propose a diffusion probabilistic model-based speech enhancement (DiffuSE) model that aims to recover clean speech signals from noisy signals.
The experimental results show that DiffuSE yields performance that is comparable to related audio generative models on the standardized Voice Bank corpus task.
arXiv Detail & Related papers (2021-07-25T19:23:18Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.