Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling
- URL: http://arxiv.org/abs/2505.17384v1
- Date: Fri, 23 May 2025 01:45:47 GMT
- Title: Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling
- Authors: Tianyu Xie, Shuchen Xue, Zijin Feng, Tianyang Hu, Jiacheng Sun, Zhenguo Li, Cheng Zhang
- Abstract summary: Variational Autoencoding Discrete Diffusion (VADD) is a novel framework that enhances discrete diffusion with latent variable modeling to implicitly capture correlations among dimensions. By introducing an auxiliary recognition model, VADD enables stable training via variational lower bound maximization and amortized inference over the training set. Empirical results on 2D toy data, pixel-level image generation, and text generation demonstrate that VADD consistently outperforms MDM baselines.
- Score: 48.96034602889216
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Discrete diffusion models have recently shown great promise for modeling complex discrete data, with masked diffusion models (MDMs) offering a compelling trade-off between quality and generation speed. MDMs denoise by progressively unmasking multiple dimensions from an all-masked input, but their performance can degrade when using few denoising steps due to limited modeling of inter-dimensional dependencies. In this paper, we propose Variational Autoencoding Discrete Diffusion (VADD), a novel framework that enhances discrete diffusion with latent variable modeling to implicitly capture correlations among dimensions. By introducing an auxiliary recognition model, VADD enables stable training via variational lower bounds maximization and amortized inference over the training set. Our approach retains the efficiency of traditional MDMs while significantly improving sample quality, especially when the number of denoising steps is small. Empirical results on 2D toy data, pixel-level image generation, and text generation demonstrate that VADD consistently outperforms MDM baselines.
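As a rough illustration of the training objective the abstract describes, the following PyTorch-style sketch pairs a masked-token reconstruction loss with a KL term from an amortized recognition model. The `RecognitionNet` architecture, the `denoiser(x_t, z)` signature, and the standard-Gaussian prior on z are illustrative assumptions, not the authors' implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecognitionNet(nn.Module):
    """Hypothetical recognition model q(z | x0) producing a Gaussian latent."""
    def __init__(self, vocab_size, latent_dim=64, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.to_stats = nn.Linear(hidden, 2 * latent_dim)

    def forward(self, x0):                      # x0: (B, L) token ids
        h = self.embed(x0).mean(dim=1)          # crude mean-pooling over positions
        mu, logvar = self.to_stats(h).chunk(2, dim=-1)
        return mu, logvar

def vadd_negative_elbo(denoiser, recog, x0, mask_id):
    """One stochastic negative-ELBO estimate: mask tokens, then denoise
    conditioned on z ~ q(z | x0). This is a sketch, not the paper's code."""
    B, L = x0.shape
    t = torch.rand(B, 1).clamp(min=0.1)         # masking ratio; clamp makes empty masks unlikely
    mask = torch.rand(B, L) < t
    x_t = torch.where(mask, torch.full_like(x0, mask_id), x0)

    mu, logvar = recog(x0)                      # amortized inference over the training set
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick

    logits = denoiser(x_t, z)                   # (B, L, vocab); z carries cross-dimension info
    recon = F.cross_entropy(logits[mask], x0[mask])       # loss on masked positions only
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return recon + kl                           # maximizing the ELBO = minimizing this
```
Because every masked position is decoded conditioned on the same latent z, the per-step predictions are no longer independent across dimensions, which is the property that matters when the number of denoising steps is small.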
Related papers
- On Designing Diffusion Autoencoders for Efficient Generation and Representation Learning [14.707830064594056]
Diffusion autoencoders (DAs) use an input-dependent latent variable to capture representations alongside the diffusion process. Better generative modelling is the primary goal of another class of diffusion models -- those that learn their forward (noising) process.
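A minimal sketch of the diffusion-autoencoder recipe this summary refers to: a deterministic encoder maps the clean input to a latent that conditions an otherwise standard epsilon-prediction denoiser. The encoder architecture and the `denoiser(x_t, t, z_sem)` signature are assumptions for illustration.
```python
import torch
import torch.nn as nn

class SemanticEncoder(nn.Module):
    """Hypothetical encoder: image -> input-dependent latent z_sem."""
    def __init__(self, latent_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

def da_training_step(encoder, denoiser, x0, alphas_cumprod):
    """Standard epsilon-prediction loss, conditioned on z_sem = encoder(x0)."""
    B = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (B,))
    a = alphas_cumprod[t].view(B, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps  # forward noising
    z_sem = encoder(x0)                         # representation learned alongside diffusion
    return (denoiser(x_t, t, z_sem) - eps).pow(2).mean()
```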
arXiv Detail & Related papers (2025-05-30T18:14:09Z)
- One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full-reference and no-reference metrics.
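A schematic of the one-step idea; the signature is an assumption, and the paper's OSDD and eVAE components are considerably more involved.
```python
import torch

@torch.no_grad()
def one_step_deblur(model, blurred, sigma=1.0):
    """Replace the iterative denoising chain with a single forward pass.

    `model(x_noisy, cond)` is an assumed signature for a network distilled to
    map one noised input, conditioned on the blurry image, to a clean estimate.
    """
    x = blurred + sigma * torch.randn_like(blurred)  # single noise injection
    return model(x, blurred)                         # one step, no sampling loop
```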
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
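In schematic form, the inference framing amounts to matching the pre-trained model to a reward-tilted target. The additive log-density below is a generic rendering of that target, not any of the paper's three objectives.
```python
import torch

def steering_target_logp(logp_pretrained: torch.Tensor,
                         reward: torch.Tensor,
                         beta: float = 1.0) -> torch.Tensor:
    """Schematic reward-tilted posterior: log p*(x) = log p_pre(x) + beta * r(x) - log Z.

    DDPP's contribution is a set of simulation-free objectives for matching a
    pre-trained masked diffusion model to this kind of target; this additive
    form is a generic rendering of 'steering as probabilistic inference'
    (log Z is dropped as a constant).
    """
    return logp_pretrained + beta * reward
```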
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
- Improving Fine-Grained Control via Aggregation of Multiple Diffusion Models [4.703252654452953]
This paper introduces a novel training-free algorithm for fine-grained generation, Aggregation of Multiple Diffusion Models (AMDM). AMDM integrates features from multiple diffusion models into a specified model to activate specific features and enable fine-grained control. Experimental results demonstrate that AMDM significantly improves fine-grained control without training, validating its effectiveness.
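A simplified, output-space sketch of the aggregation step. AMDM operates on intermediate features of a specified model; combining final noise predictions, as below, is an assumption made to keep the example short.
```python
import torch

@torch.no_grad()
def aggregated_eps(models, weights, x_t, t):
    """Combine predictions from multiple diffusion models at one sampling step.

    Each model contributes its noise estimate, merged by fixed weights; the
    `m(x_t, t)` signature is an assumption for illustration.
    """
    preds = [m(x_t, t) for m in models]
    w = torch.tensor(weights) / sum(weights)    # normalize the mixing weights
    return sum(wi * p for wi, p in zip(w, preds))
```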
arXiv Detail & Related papers (2024-10-02T06:16:06Z)
- Neural Diffusion Models [2.1779479916071067]
We present a generalization of conventional diffusion models that enables defining and learning time-dependent non-linear transformations of data.
NDMs outperform conventional diffusion models in terms of likelihood and produce high-quality samples.
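A sketch of the generalized forward process, assuming the marginal form x_t = alpha_t * F(x0, t) + sigma_t * eps; the network used for F and its conditioning are illustrative choices.
```python
import torch
import torch.nn as nn

class LearnedForward(nn.Module):
    """Hypothetical time-dependent non-linear transformation F(x, t)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(),
                                 nn.Linear(128, dim))

    def forward(self, x, t):                    # x: (B, dim), t: (B,)
        return self.net(torch.cat([x, t[:, None]], dim=-1))

def ndm_noising(F_theta, x0, t, alpha_t, sigma_t):
    """Assumed marginal x_t = alpha_t * F(x0, t) + sigma_t * eps; choosing
    F(x, t) = x recovers the conventional diffusion forward process."""
    eps = torch.randn_like(x0)
    return alpha_t * F_theta(x0, t) + sigma_t * eps
```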
arXiv Detail & Related papers (2023-10-12T13:54:55Z)
- Semi-Implicit Denoising Diffusion Models (SIDDMs) [50.30163684539586]
Existing models such as Denoising Diffusion Probabilistic Models (DDPM) deliver high-quality, diverse samples but are slowed by an inherently high number of iterative steps.
We introduce a novel approach that tackles the problem by matching implicit and explicit factors.
We demonstrate that our proposed method obtains comparable generative performance to diffusion-based models and vastly superior results to models with a small number of sampling steps.
arXiv Detail & Related papers (2023-06-21T18:49:22Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced for image deblurring and have exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff) for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
- Restoration based Generative Models [0.886014926770622]
Denoising diffusion models (DDMs) have attracted increasing attention by showing impressive synthesis quality.
In this paper, we establish an interpretation of DDMs in terms of image restoration (IR).
We propose a multi-scale training scheme that improves performance over the standard diffusion process by taking advantage of the flexibility of the forward process.
We believe that our framework paves the way for designing a new type of flexible general generative model.
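A generic multi-scale training sketch in the spirit of this summary; supervising a restorer at several resolutions is shown here in its simplest form, and the paper's IR-based framing of the forward process is richer than this.
```python
import torch
import torch.nn.functional as F

def multiscale_restoration_loss(model, degraded, clean, scales=(1, 2, 4)):
    """Supervise the restorer at several resolutions and sum the losses.

    A generic stand-in for 'multi-scale training'; not the paper's exact scheme.
    """
    loss = 0.0
    for s in scales:
        d = F.avg_pool2d(degraded, s) if s > 1 else degraded
        c = F.avg_pool2d(clean, s) if s > 1 else clean
        loss = loss + F.l1_loss(model(d), c)   # model assumed resolution-agnostic
    return loss
```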
arXiv Detail & Related papers (2023-02-20T00:53:33Z)
- Fast Inference in Denoising Diffusion Models via MMD Finetuning [23.779985842891705]
We present MMD-DDM, a novel method for fast sampling of diffusion models.
Our approach is based on the idea of using the Maximum Mean Discrepancy (MMD) to finetune the learned distribution with a given budget of timesteps.
Our findings show that the proposed method is able to produce high-quality samples in a fraction of the time required by widely-used diffusion models.
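A sketch of the finetuning signal, assuming an RBF kernel and a differentiable few-step sampler; the paper evaluates MMD in a perceptual feature space rather than raw pixel space.
```python
import torch

def mmd_rbf(x, y, bandwidth=1.0):
    """Biased MMD^2 estimate with an RBF kernel between two sample batches."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def mmd_finetune_loss(sampler, real_batch, n_steps=5):
    """Generate with a small timestep budget, then penalize the discrepancy
    to real data. `sampler` is an assumed differentiable few-step sampler."""
    fake = sampler(num_steps=n_steps, batch_size=real_batch.shape[0])
    return mmd_rbf(fake.flatten(1), real_batch.flatten(1))
```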
arXiv Detail & Related papers (2023-01-19T09:48:07Z)
- f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation [56.04628143914542]
Diffusion models (DMs) have recently emerged as state-of-the-art tools for generative modeling in various domains.
We propose f-DM, a generalized family of DMs which allows progressive signal transformation.
We apply f-DM in image generation tasks with a range of functions, including down-sampling, blurring, and learned transformations.
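A toy version of one forward stage, using down-sampling as the signal transformation f (one of the functions listed above); the noise schedule is made up for illustration.
```python
import torch
import torch.nn.functional as F

def fdm_forward_stage(x, stage):
    """One progressive stage: apply the signal transformation f (here,
    down-sampling), then inject noise at a stage-dependent level."""
    x = F.avg_pool2d(x, kernel_size=2)       # f: coarsen the signal
    noise_level = 0.1 * (stage + 1)          # schematic per-stage schedule
    return x + noise_level * torch.randn_like(x)
```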
arXiv Detail & Related papers (2022-10-10T18:49:25Z)