Related papers: Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training

URL: http://arxiv.org/abs/2411.09998v1
Date: Fri, 15 Nov 2024 07:12:18 GMT
Title: Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training
Authors: Myunsoo Kim, Donghyeon Ki, Seong-Woong Shim, Byung-Jun Lee
Abstract summary: As data distributions grow more complex, training diffusion models to convergence becomes increasingly computationally intensive. We introduce a non-uniform timestep sampling method that prioritizes the high-variance timesteps that bottleneck convergence. Our method shows robust performance across various datasets, scheduling strategies, and diffusion architectures.
Score: 4.760537994346813
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As highly expressive generative models, diffusion models have demonstrated exceptional success across various domains, including image generation, natural language processing, and combinatorial optimization. However, as data distributions grow more complex, training these models to convergence becomes increasingly computationally intensive. While diffusion models are typically trained using uniform timestep sampling, our research shows that the variance in stochastic gradients varies significantly across timesteps, with high-variance timesteps becoming bottlenecks that hinder faster convergence. To address this issue, we introduce a non-uniform timestep sampling method that prioritizes these more critical timesteps. Our method tracks the impact of gradient updates on the objective for each timestep, adaptively selecting those most likely to minimize the objective effectively. Experimental results demonstrate that this approach not only accelerates the training process, but also leads to improved performance at convergence. Furthermore, our method shows robust performance across various datasets, scheduling strategies, and diffusion architectures, outperforming previously proposed timestep sampling and weighting heuristics that lack this degree of robustness.
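
Illustrative sketch (not from the paper): the abstract describes tracking, per timestep, how much gradient updates reduce the objective and then sampling the more useful timesteps more often. One simple way such a sampler might look is shown below. Everything here is an assumption made for illustration: the class name AdaptiveTimestepSampler, the exponential-moving-average score, the uniform mixing term, and the importance weights are hypothetical and are not claimed to match the authors' algorithm.

```python
import numpy as np


class AdaptiveTimestepSampler:
    """Hypothetical sketch of adaptive non-uniform timestep sampling.

    Keeps a running score per timestep (an assumed proxy for how much
    training on that timestep reduces the objective) and samples
    timesteps in proportion to those scores.
    """

    def __init__(self, num_timesteps, decay=0.9, uniform_mix=0.1, seed=0):
        self.num_timesteps = num_timesteps
        self.decay = decay              # EMA decay for the per-timestep score (assumption)
        self.uniform_mix = uniform_mix  # fraction of uniform sampling kept for exploration (assumption)
        self.scores = np.ones(num_timesteps, dtype=np.float64)
        self.rng = np.random.default_rng(seed)

    def probabilities(self):
        """Return the current sampling distribution over timesteps."""
        uniform = np.full(self.num_timesteps, 1.0 / self.num_timesteps)
        total = self.scores.sum()
        adaptive = self.scores / total if total > 0 else uniform
        return (1.0 - self.uniform_mix) * adaptive + self.uniform_mix * uniform

    def sample(self, batch_size):
        """Draw timesteps and importance weights relative to uniform sampling."""
        p = self.probabilities()
        t = self.rng.choice(self.num_timesteps, size=batch_size, p=p)
        # Weighting each loss by 1 / (T * p[t]) keeps the expected loss equal
        # to the uniform-timestep average (an optional, assumed design choice).
        weights = 1.0 / (self.num_timesteps * p[t])
        return t, weights

    def update(self, timesteps, objective_decrease):
        """Fold observed objective decreases into the per-timestep scores."""
        for t, d in zip(timesteps, objective_decrease):
            clipped = max(float(d), 0.0)  # ignore increases in the objective
            self.scores[t] = self.decay * self.scores[t] + (1.0 - self.decay) * clipped


sampler = AdaptiveTimestepSampler(num_timesteps=10)
sampler.update(timesteps=[3], objective_decrease=[10.0])
probs = sampler.probabilities()
batch_t, batch_w = sampler.sample(batch_size=4)
```

In a training loop, one would call sample() to pick the timesteps for a batch, multiply each per-sample loss by its weight, and call update() with a measured change in the objective for the timesteps just trained. How that change is measured is left open here.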

Related papers

Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [62.640128548633946]
We introduce a novel inference-time scaling approach based on particle Gibbs sampling for discrete diffusion models. Our method consistently outperforms prior inference-time strategies on reward-guided text generation tasks.
arXiv Detail & Related papers (2025-07-11T08:00:47Z)
Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models [71.63194926457119]
We introduce Dynamical Diffusion (DyDiff), a theoretically sound framework that incorporates temporally aware forward and reverse processes. Experiments across scientific spatiotemporal forecasting, video prediction, and time series forecasting demonstrate that Dynamical Diffusion consistently improves performance in temporal predictive tasks.
arXiv Detail & Related papers (2025-03-02T16:10:32Z)
Improved Training Technique for Latent Consistency Models [18.617862678160243]
Consistency models are capable of producing high-quality samples in either a single step or multiple steps. We analyze the statistical differences between pixel and latent spaces, discovering that latent data often contains highly impulsive outliers. We introduce a diffusion loss at early timesteps and employ optimal transport (OT) coupling to further enhance performance.
arXiv Detail & Related papers (2025-02-03T15:25:58Z)
Provable Statistical Rates for Consistency Diffusion Models [87.28777947976573]
Despite the state-of-the-art performance, diffusion models are known for their slow sample generation due to the extensive number of steps involved. This paper contributes towards the first statistical theory for consistency models, formulating their training as a distribution discrepancy minimization problem.
arXiv Detail & Related papers (2024-06-23T20:34:18Z)
TMPQ-DM: Joint Timestep Reduction and Quantization Precision Selection for Efficient Diffusion Models [40.5153344875351]
We introduce TMPQ-DM, which jointly optimizes timestep reduction and quantization to achieve a superior performance-efficiency trade-off. For timestep reduction, we devise a non-uniform grouping scheme tailored to the non-uniform nature of the denoising process. In terms of quantization, we adopt a fine-grained layer-wise approach to allocate varying bit-widths to different layers based on their respective contributions to the final generative performance.
arXiv Detail & Related papers (2024-04-15T07:51:40Z)
MG-TSD: Multi-Granularity Time Series Diffusion Models with Guided Learning Process [26.661721555671626]
We introduce a novel Multi-Granularity Time Series (MG-TSD) model, which achieves state-of-the-art predictive performance. Our approach does not rely on additional external data, making it versatile and applicable across various domains.
arXiv Detail & Related papers (2024-03-09T01:15:03Z)
Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps instead of instantaneous input-output relationships in previous contexts. We present Diffusion-TracIn, which incorporates these temporal dynamics, and observe that samples' loss gradient norms are highly dependent on timestep. We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
arXiv Detail & Related papers (2024-01-17T07:58:18Z)
Fast Sampling via Discrete Non-Markov Diffusion Models [49.598085130313514]
We propose a discrete non-Markov diffusion model, which admits an accelerated reverse sampling for discrete data generation. Our method significantly reduces the number of function evaluations (i.e., calls to the neural network), making the sampling process much faster.
arXiv Detail & Related papers (2023-12-14T18:14:11Z)
Conditional Variational Diffusion Models [1.8657053208839998]
Inverse problems aim to determine parameters from observations, a crucial task in engineering and science. We propose a novel approach for learning the variance schedule as part of the training process. Our method supports probabilistic conditioning on data, provides high-quality solutions, and is flexible, proving able to adapt to different applications with minimum overhead.
arXiv Detail & Related papers (2023-12-04T14:45:56Z)
Score Regularized Policy Optimization through Diffusion Behavior [25.926641622408752]
Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling. We propose to extract an efficient deterministic inference policy from critic models and pretrained diffusion behavior models. Our method boosts action sampling speed by more than 25 times compared with various leading diffusion-based methods in locomotion tasks.
arXiv Detail & Related papers (2023-10-11T08:31:26Z)
BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images. Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few. We present a novel technique called BOOT that overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning [52.72369034247396]
We propose the diffusion glancing transformer, which employs a modality diffusion process and residual glancing sampling. DIFFGLAT achieves better generation accuracy while maintaining fast decoding speed compared with both autoregressive and non-autoregressive models.
arXiv Detail & Related papers (2022-12-20T13:36:25Z)
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose. We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.