Distillation of Discrete Diffusion through Dimensional Correlations
- URL: http://arxiv.org/abs/2410.08709v2
- Date: Thu, 30 Jan 2025 04:41:19 GMT
- Title: Distillation of Discrete Diffusion through Dimensional Correlations
- Authors: Satoshi Hayakawa, Yuhta Takida, Masaaki Imaizumi, Hiromi Wakaki, Yuki Mitsufuji
- Abstract summary: "Mixture" models are capable of treating dimensional correlations while remaining scalable.
"Loss functions" enable the mixture models to distill such many-step conventional models into just a few steps by learning the dimensional correlations.
Our experimental results show the effectiveness of the proposed method in distilling pretrained discrete diffusion models across image and language domains.
- Score: 21.078500510691747
- Abstract: Diffusion models have demonstrated exceptional performance in various fields of generative modeling, but suffer from slow sampling speed due to their iterative nature. While this issue is being addressed in continuous domains, discrete diffusion models face unique challenges, particularly in capturing dependencies between elements (e.g., pixel relationships in images, sequential dependencies in language), mainly due to the computational cost of processing high-dimensional joint distributions. In this paper, (i) we propose "mixture" models for discrete diffusion that are capable of treating dimensional correlations while remaining scalable, and (ii) we provide a set of loss functions for distilling the iterations of existing models. Two primary theoretical insights underpin our approach: First, conventional models with element-wise independence can approximate the data distribution well, but essentially require many sampling steps. Second, our loss functions enable the mixture models to distill such many-step conventional models into just a few steps by learning the dimensional correlations. Our experimental results show the effectiveness of the proposed method in distilling pretrained discrete diffusion models across image and language domains.
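To make the mixture-model idea concrete, here is a minimal NumPy sketch (an illustration under our own assumptions, not the authors' implementation; the names `D`, `V`, `K`, `sample_product`, and `sample_mixture` are hypothetical). A product distribution samples each dimension independently and cannot represent a correlated joint distribution in one step, whereas a mixture of products couples the dimensions through a shared component index:

```python
# Toy sketch: element-wise independent (product) sampling vs. a
# "mixture of products" that can encode dimensional correlations.
# All names and shapes here are illustrative assumptions.
import numpy as np

D, V, K = 4, 3, 2  # dimensions, vocabulary size, mixture components

def sample_product(probs, rng):
    """Sample each dimension independently; probs has shape (D, V)."""
    return np.array([rng.choice(V, p=p) for p in probs])

def sample_mixture(weights, comp_probs, rng):
    """Pick a component k ~ weights, then sample every dimension from
    that component's factorized distribution. Dimensions become
    correlated through the shared component index k."""
    k = rng.choice(K, p=weights)
    return sample_product(comp_probs[k], rng)

# Toy target: dimensions are perfectly correlated (all token 0 or all
# token 2). No single product distribution can represent this joint,
# but a 2-component mixture of products represents it exactly.
rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])
comp_probs = np.zeros((K, D, V))
comp_probs[0, :, 0] = 1.0  # component 0: every dimension emits token 0
comp_probs[1, :, 2] = 1.0  # component 1: every dimension emits token 2

print(sample_mixture(weights, comp_probs, rng))  # [0 0 0 0] or [2 2 2 2]
```

This gap is what forces element-wise independent samplers to take many small steps: correlated structure has to be built up gradually across iterations, while a mixture model can commit to it in a single step.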
Related papers
- Continuous Diffusion Model for Language Modeling [57.396578974401734]
Existing continuous diffusion models for discrete data have limited performance compared to discrete approaches.
We propose a continuous diffusion model for language modeling that incorporates the geometry of the underlying categorical distribution.
arXiv Detail & Related papers (2025-02-17T08:54:29Z)
- G2D2: Gradient-guided Discrete Diffusion for image inverse problem solving [55.185588994883226]
This paper presents a novel method for addressing linear inverse problems by leveraging image-generation models based on discrete diffusion as priors.
To the best of our knowledge, this is the first approach to use discrete diffusion model-based priors for solving image inverse problems.
arXiv Detail & Related papers (2024-10-09T06:18:25Z)
- Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset.
We develop constrained diffusion models by imposing diffusion constraints based on desired distributions.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off between the objective and the constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z)
- Provable Statistical Rates for Consistency Diffusion Models [87.28777947976573]
Despite the state-of-the-art performance, diffusion models are known for their slow sample generation due to the extensive number of steps involved.
This paper contributes towards the first statistical theory for consistency models, formulating their training as a distribution discrepancy minimization problem.
arXiv Detail & Related papers (2024-06-23T20:34:18Z)
- Multiple-Source Localization from a Single-Snapshot Observation Using Graph Bayesian Optimization [10.011338977476804]
Multi-source localization from a single snapshot observation is especially relevant due to its prevalence.
Current methods typically rely on greedy selection and are usually tied to a single diffusion model.
We propose a simulation-based method termed BOSouL, adopted for its sample efficiency.
arXiv Detail & Related papers (2024-03-25T14:46:24Z)
- Convergence Analysis of Discrete Diffusion Model: Exact Implementation through Uniformization [17.535229185525353]
We introduce an algorithm leveraging the uniformization of continuous Markov chains, implementing transitions on random time points (a minimal sketch of this technique appears after this list).
Our results align with state-of-the-art achievements for diffusion models in $\mathbb{R}^d$ and further underscore the advantages of discrete diffusion models in comparison to the $\mathbb{R}^d$ setting.
arXiv Detail & Related papers (2024-02-12T22:26:52Z)
- Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected stochastic differential equation evolving on the support of the data.
Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
arXiv Detail & Related papers (2023-04-10T17:54:38Z)
- Infinite-Dimensional Diffusion Models [4.342241136871849]
We formulate diffusion-based generative models in infinite dimensions and apply them to the generative modeling of functions.
We show that our formulations are well posed in the infinite-dimensional setting and provide dimension-independent distance bounds from the sample to the target measure.
We also develop guidelines for the design of infinite-dimensional diffusion models.
arXiv Detail & Related papers (2023-02-20T18:00:38Z)
- Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance [95.12230117950232]
We show that a common latent space emerges from two diffusion models trained independently on related domains.
Applying CycleDiffusion to text-to-image diffusion models, we show that large-scale text-to-image diffusion models can be used as zero-shot image-to-image editors.
arXiv Detail & Related papers (2022-10-11T15:53:52Z)
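For the uniformization entry above, the following is a minimal sketch (our own assumption of how such a sampler might look, not that paper's code) of exact continuous-time Markov chain simulation via uniformization: pick a rate `lam` at least as large as every diagonal exit rate of the generator `Q`, form the discrete kernel `P = I + Q/lam`, draw the number of jumps from `Poisson(lam * t)`, and apply `P` that many times.

```python
# Sketch of uniformization for exact CTMC simulation (illustrative, not
# the paper's code). Assumes at least one non-absorbing state (lam > 0).
import numpy as np

def uniformization_sample(Q, t, x0, rng):
    """Simulate a CTMC with generator Q for time t, starting from x0.

    Q  : (S, S) generator matrix; rows sum to 0, off-diagonals >= 0
    t  : time horizon
    x0 : initial state index
    """
    S = Q.shape[0]
    lam = np.max(-np.diag(Q))       # uniformization rate
    P = np.eye(S) + Q / lam         # discrete transition kernel
    n_jumps = rng.poisson(lam * t)  # Poisson count of (possibly virtual) jumps
    x = x0
    for _ in range(n_jumps):
        x = rng.choice(S, p=P[x])
    return x

rng = np.random.default_rng(0)
# Two-state toy generator: symmetric flips at rate 1.
Q = np.array([[-1.0, 1.0],
              [1.0, -1.0]])
print(uniformization_sample(Q, t=2.0, x0=0, rng=rng))
```

Because the jump count is drawn from the exact Poisson law rather than discretizing time, the sampled state is distributed exactly according to the chain at time t, which is what makes uniformization attractive for discrete diffusion.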
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences.