Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training
- URL: http://arxiv.org/abs/2411.09998v1
- Date: Fri, 15 Nov 2024 07:12:18 GMT
- Title: Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training
- Authors: Myunsoo Kim, Donghyeon Ki, Seong-Woong Shim, Byung-Jun Lee,
- Abstract summary: As data distributions grow more complex, training diffusion models to convergence becomes increasingly intensive.
We introduce a non-uniform timestep sampling method that prioritizes these more critical timesteps.
Our method shows robust performance across various datasets, scheduling strategies, and diffusion architectures.
- Score: 4.760537994346813
- License:
- Abstract: As a highly expressive generative model, diffusion models have demonstrated exceptional success across various domains, including image generation, natural language processing, and combinatorial optimization. However, as data distributions grow more complex, training these models to convergence becomes increasingly computationally intensive. While diffusion models are typically trained using uniform timestep sampling, our research shows that the variance in stochastic gradients varies significantly across timesteps, with high-variance timesteps becoming bottlenecks that hinder faster convergence. To address this issue, we introduce a non-uniform timestep sampling method that prioritizes these more critical timesteps. Our method tracks the impact of gradient updates on the objective for each timestep, adaptively selecting those most likely to minimize the objective effectively. Experimental results demonstrate that this approach not only accelerates the training process, but also leads to improved performance at convergence. Furthermore, our method shows robust performance across various datasets, scheduling strategies, and diffusion architectures, outperforming previously proposed timestep sampling and weighting heuristics that lack this degree of robustness.
Related papers
- Provable Statistical Rates for Consistency Diffusion Models [87.28777947976573]
Despite the state-of-the-art performance, diffusion models are known for their slow sample generation due to the extensive number of steps involved.
This paper contributes towards the first statistical theory for consistency models, formulating their training as a distribution discrepancy minimization problem.
arXiv Detail & Related papers (2024-06-23T20:34:18Z) - TMPQ-DM: Joint Timestep Reduction and Quantization Precision Selection for Efficient Diffusion Models [40.5153344875351]
We introduce TMPQ-DM, which jointly optimize timestep reduction and quantization to achieve a superior performance-efficiency trade-off.
For timestep reduction, we devise a non-uniform grouping scheme tailored to the non-uniform nature of the denoising process.
In terms of quantization, we adopt a fine-grained layer-wise approach to allocate varying bit-widths to different layers based on their respective contributions to the final generative performance.
arXiv Detail & Related papers (2024-04-15T07:51:40Z) - MG-TSD: Multi-Granularity Time Series Diffusion Models with Guided Learning Process [26.661721555671626]
We introduce a novel Multi-Granularity Time Series (MG-TSD) model, which achieves state-of-the-art predictive performance.
Our approach does not rely on additional external data, making it versatile and applicable across various domains.
arXiv Detail & Related papers (2024-03-09T01:15:03Z) - Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps instead of instantaneous input-output relationships in previous contexts.
We present Diffusion-TracIn that incorporates this temporal dynamics and observe that samples' loss gradient norms are highly dependent on timestep.
We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
arXiv Detail & Related papers (2024-01-17T07:58:18Z) - Fast Sampling via Discrete Non-Markov Diffusion Models [49.598085130313514]
We propose a discrete non-Markov diffusion model, which admits an accelerated reverse sampling for discrete data generation.
Our method significantly reduces the number of function evaluations (i.e., calls to the neural network), making the sampling process much faster.
arXiv Detail & Related papers (2023-12-14T18:14:11Z) - Conditional Variational Diffusion Models [1.8657053208839998]
Inverse problems aim to determine parameters from observations, a crucial task in engineering and science.
We propose a novel approach for learning the variance schedule as part of the training process.
Our method supports probabilistic conditioning on data, provides high-quality solutions, and is flexible, proving able to adapt to different applications with minimum overhead.
arXiv Detail & Related papers (2023-12-04T14:45:56Z) - Score Regularized Policy Optimization through Diffusion Behavior [25.926641622408752]
Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling.
We propose to extract an efficient deterministic inference policy from critic models and pretrained diffusion behavior models.
Our method boosts action sampling speed by more than 25 times compared with various leading diffusion-based methods in locomotion tasks.
arXiv Detail & Related papers (2023-10-11T08:31:26Z) - BOOT: Data-free Distillation of Denoising Diffusion Models with
Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT, that overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z) - Diffusion Glancing Transformer for Parallel Sequence to Sequence
Learning [52.72369034247396]
We propose the diffusion glancing transformer, which employs a modality diffusion process and residual glancing sampling.
DIFFGLAT achieves better generation accuracy while maintaining fast decoding speed compared with both autoregressive and non-autoregressive models.
arXiv Detail & Related papers (2022-12-20T13:36:25Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.