Generalized Interpolating Discrete Diffusion
- URL: http://arxiv.org/abs/2503.04482v1
- Date: Thu, 06 Mar 2025 14:30:55 GMT
- Title: Generalized Interpolating Discrete Diffusion
- Authors: Dimitri von Rütte, Janis Fluri, Yuhui Ding, Antonio Orvieto, Bernhard Schölkopf, Thomas Hofmann
- Abstract summary: Masked diffusion is a popular choice due to its simplicity and effectiveness. We derive the theoretical backbone of a family of general interpolating discrete diffusion processes. Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise.
- Score: 65.74168524007484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While state-of-the-art language models achieve impressive results through next-token prediction, they have inherent limitations such as the inability to revise already generated tokens. This has prompted exploration of alternative approaches such as discrete diffusion. However, masked diffusion, which has emerged as a popular choice due to its simplicity and effectiveness, reintroduces this inability to revise words. To overcome this, we generalize masked diffusion and derive the theoretical backbone of a family of general interpolating discrete diffusion (GIDD) processes offering greater flexibility in the design of the noising process. Leveraging a novel diffusion ELBO, we achieve compute-matched state-of-the-art performance in diffusion language modeling. Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise, leading to improved sample quality and unlocking the ability for the model to correct its own mistakes, an area where autoregressive models have notoriously struggled. Our code and models are open-source: https://github.com/dvruette/gidd/
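The hybrid noising process described in the abstract is easy to picture: as the noise level increases, each token is either kept, replaced by a [MASK] token, or resampled uniformly from the vocabulary. Below is a minimal PyTorch sketch of one such mask-plus-uniform corruption step; the linear schedule, the `p_uniform` mixing weight, and the function name are illustrative assumptions rather than the paper's exact parameterization (see the linked repository for the real noising schedule and ELBO).

```python
import torch

def hybrid_corrupt(tokens, t, mask_id, vocab_size, p_uniform=0.1):
    """Corrupt a token sequence with a mix of masking and uniform noise.

    Illustrative sketch: each token survives with probability (1 - t);
    a corrupted token becomes a uniformly random token with probability
    `p_uniform`, otherwise the [MASK] token. The linear schedule and the
    mixing weight are assumptions, not GIDD's actual parameterization.
    """
    corrupted = torch.rand(tokens.shape) < t  # positions noised at level t in [0, 1]
    # Among corrupted positions, pick uniform noise with prob p_uniform, else [MASK].
    use_uniform = torch.rand(tokens.shape) < p_uniform
    random_tokens = torch.randint(0, vocab_size, tokens.shape)
    mask_tokens = torch.full_like(tokens, mask_id)
    noised = torch.where(use_uniform, random_tokens, mask_tokens)
    return torch.where(corrupted, noised, tokens)

# Example: corrupt a toy batch of token ids at noise level t = 0.5.
x = torch.randint(0, 100, (2, 16))
z_t = hybrid_corrupt(x, t=0.5, mask_id=100, vocab_size=100)
```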
Related papers
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models [15.853201399662344]
Diffusion language models offer unique benefits over autoregressive models.
However, they lag in likelihood modeling and are limited to fixed-length generation.
We introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models.
arXiv Detail & Related papers (2025-03-12T17:43:40Z)
- Continuous Diffusion Model for Language Modeling [57.396578974401734]
Existing continuous diffusion models for discrete data have limited performance compared to discrete approaches.
We propose a continuous diffusion model for language modeling that incorporates the geometry of the underlying categorical distribution.
arXiv Detail & Related papers (2025-02-17T08:54:29Z)
- Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step.
Our framework offers a 1.3× sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
- Discrete Diffusion Language Model for Efficient Text Summarization [19.267738861590487]
We introduce a novel semantic-aware noising process that enables Transformer backbones to handle long sequences effectively.
Our approaches achieve state-of-the-art performance on three benchmark summarization datasets: Gigaword, CNN/DailyMail, and arXiv.
arXiv Detail & Related papers (2024-06-25T09:55:22Z)
- Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models [100.53662473219806]
Diffusion-of-Thought (DoT) is a novel approach that integrates diffusion models with Chain-of-Thought. DoT allows reasoning steps to diffuse over time through a diffusion language model. Our results demonstrate the effectiveness of DoT in multi-digit multiplication, logic, and grade school math problems.
arXiv Detail & Related papers (2024-02-12T16:23:28Z)
- Denoising Diffusion Variational Inference: Diffusion Models as Expressive Variational Posteriors [11.01598521921903]
DDVI is a black-box variational inference algorithm for latent variable models.
We introduce an expressive class of diffusion-based variational posteriors.
We train these posteriors with a novel regularized evidence lower bound.
arXiv Detail & Related papers (2024-01-05T10:27:44Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- A Cheaper and Better Diffusion Language Model with Soft-Masked Noise [62.719656543880596]
Masked-Diffuse LM is a novel diffusion model for language modeling, inspired by linguistic features of language.
Specifically, we design a linguistically informed forward process that adds corruptions to the text through strategic soft-masking to better noise the textual data.
We demonstrate that our Masked-Diffuse LM can achieve better generation quality than state-of-the-art diffusion models with greater efficiency.
arXiv Detail & Related papers (2023-04-10T17:58:42Z)
- Your Diffusion Model is Secretly a Zero-Shot Classifier [90.40799216880342]
We show that density estimates from large-scale text-to-image diffusion models can be leveraged to perform zero-shot classification.
Our generative approach to classification attains strong results on a variety of benchmarks.
Our results are a step toward using generative over discriminative models for downstream tasks (a minimal sketch of this classification idea appears after this list).
arXiv Detail & Related papers (2023-03-28T17:59:56Z)
- Information-Theoretic Diffusion [18.356162596599436]
Denoising diffusion models have spurred significant gains in density modeling and image generation.
We introduce a new mathematical foundation for diffusion models inspired by classic results in information theory.
arXiv Detail & Related papers (2023-02-07T23:03:07Z)
- StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation [20.262426487434393]
We present a regeneration approach where an estimate given by a predictive model is provided as a guide for further diffusion.
We show that the proposed approach uses the predictive model to remove vocalizing and breathing artifacts while producing very high-quality samples.
arXiv Detail & Related papers (2022-12-22T16:35:42Z)
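To make the zero-shot classification idea from "Your Diffusion Model is Secretly a Zero-Shot Classifier" concrete: the per-class density estimate is approximated by the conditional denoising error, and the class whose conditioning best denoises the input wins. The sketch below assumes a hypothetical conditional noise-prediction model `eps_model(x_t, t, cond)` and a simplified linear noise schedule; all names and details are illustrative, not the paper's implementation.

```python
import torch

@torch.no_grad()
def diffusion_zero_shot_classify(x, class_conds, eps_model, n_samples=16):
    """Pick the class whose conditioning best denoises the input.

    Approximates log p(x | class) by the negative noise-prediction error,
    averaged over random timesteps and noise draws, and returns the argmin.
    `eps_model(x_t, t, cond)` is a hypothetical conditional diffusion model;
    the linear interpolation below stands in for the true noise schedule.
    """
    losses = []
    for cond in class_conds:  # one candidate conditioning per class
        err = 0.0
        for _ in range(n_samples):
            t = torch.rand(())         # random noise level in (0, 1)
            eps = torch.randn_like(x)  # Gaussian noise draw
            x_t = (1 - t) * x + t * eps
            err = err + torch.mean((eps_model(x_t, t, cond) - eps) ** 2)
        losses.append(err / n_samples)
    return int(torch.argmin(torch.stack(losses)))
```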