On the Role of Discreteness in Diffusion LLMs
- URL: http://arxiv.org/abs/2512.22630v1
- Date: Sat, 27 Dec 2025 16:03:08 GMT
- Title: On the Role of Discreteness in Diffusion LLMs
- Authors: Ziqi Jin, Bin Wang, Xiang Lin, Lidong Bing, Aixin Sun
- Abstract summary: We revisit diffusion language modeling from the perspectives of the diffusion process and language modeling, and outline five properties that separate diffusion mechanics from language-specific requirements. We identify two central issues: (i) uniform corruption does not respect how information is distributed across positions, and (ii) token-wise marginal training cannot capture multi-token dependencies during parallel decoding. These observations motivate diffusion processes that align more closely with the structure of text, and encourage future work toward more coherent diffusion language models.
- Score: 69.64854287505999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models offer appealing properties for language generation, such as parallel decoding and iterative refinement, but the discrete and highly structured nature of text challenges the direct application of diffusion principles. In this paper, we revisit diffusion language modeling from the perspectives of the diffusion process and language modeling, and outline five properties that separate diffusion mechanics from language-specific requirements. We first categorize existing approaches into continuous diffusion in embedding space and discrete diffusion over tokens. We then show that each satisfies only part of the five essential properties and therefore reflects a structural trade-off. Through analyses of recent large diffusion language models, we identify two central issues: (i) uniform corruption does not respect how information is distributed across positions, and (ii) token-wise marginal training cannot capture multi-token dependencies during parallel decoding. These observations motivate diffusion processes that align more closely with the structure of text, and encourage future work toward more coherent diffusion language models.
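Issue (ii) above, that token-wise marginal training loses multi-token dependencies under parallel decoding, can be made concrete with a toy example. The sketch below is illustrative only and is not code from the paper; the two-sequence corpus and all helper names are assumptions. A denoiser trained on per-position marginals matches each position's distribution exactly, yet independent parallel sampling still produces invalid token combinations.

```python
import random

# Toy corpus: only two valid sequences, each with probability 1/2.
corpus = [("new", "york"), ("los", "angeles")]

# Token-wise marginal training fits, for each position independently,
# p(token | position) -- the factorized target a masked-diffusion
# denoiser is trained toward when every position is masked.
def position_marginals(seqs):
    marginals = []
    for pos in range(len(seqs[0])):
        counts = {}
        for s in seqs:
            counts[s[pos]] = counts.get(s[pos], 0) + 1
        total = sum(counts.values())
        marginals.append({tok: c / total for tok, c in counts.items()})
    return marginals

marg = position_marginals(corpus)

# Parallel decoding: sample every masked position independently from its
# marginal. Dependencies between positions are lost in this step.
def parallel_decode(marginals, rng):
    out = []
    for dist in marginals:
        toks, probs = zip(*dist.items())
        out.append(rng.choices(toks, weights=probs)[0])
    return tuple(out)

rng = random.Random(0)
samples = [parallel_decode(marg, rng) for _ in range(10000)]
invalid = sum(s not in corpus for s in samples) / len(samples)
print(f"invalid sequences: {invalid:.2f}")  # about 0.50, e.g. ('new', 'angeles')
```

Each position's marginal is perfectly learned here (50/50 at both positions), yet roughly half of the parallel-decoded samples are cross-combinations such as ("new", "angeles") that never occur in the corpus, which is exactly the coherence failure the paper attributes to token-wise marginal objectives.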
Related papers
- Bridging the Discrete-Continuous Gap: Unified Multimodal Generation via Coupled Manifold Discrete Absorbing Diffusion [60.186310080523135]
The bifurcation of generative modeling into autoregressive approaches for discrete data (text) and diffusion approaches for continuous data (images) hinders the development of truly unified multimodal systems. We propose CoM-DAD, a novel probabilistic framework that reformulates multimodal generation as a hierarchical dual-process. Our method demonstrates superior stability over standard masked modeling, establishing a new paradigm for scalable, unified text-image generation.
arXiv Detail & Related papers (2026-01-07T16:21:19Z) - Guided Transfer Learning for Discrete Diffusion Models [21.909689920217982]
We present Guided Transfer Learning for discrete diffusion models (GTL). GTL enables sampling from a target distribution without modifying the pretrained denoiser. We also present an efficient guided sampler that concentrates evaluations on planner-selected positions and top candidate tokens.
arXiv Detail & Related papers (2025-12-11T18:05:55Z) - Foundations of Diffusion Models in General State Spaces: A Self-Contained Introduction [54.95522167029998]
This article is a self-contained primer on diffusion over general state spaces. We develop the discrete-time view (forward noising via Markov kernels and learned reverse dynamics) alongside its continuous-time limits. A common variational treatment yields the ELBO that underpins standard training losses.
arXiv Detail & Related papers (2025-12-04T18:55:36Z) - Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner [66.86440230599656]
We argue that diffusion language models do not necessarily need to be in the discrete space. In particular, we prove that continuous diffusion models have stronger expressivity than discrete diffusions and looped transformers. We propose Coevolutionary Continuous Diffusion (CCDD), which defines a joint multimodal diffusion process on the union of a continuous representation space and a discrete token space.
arXiv Detail & Related papers (2025-10-03T17:44:41Z) - A Convergence Theory for Diffusion Language Models: An Information-Theoretic Perspective [8.15094483029656]
Diffusion models enable parallel token sampling, leading to faster generation and eliminating left-to-right generation constraints. We develop convergence guarantees for diffusion language models from an information-theoretic perspective. These results offer novel theoretical insights into the practical effectiveness of diffusion language models.
arXiv Detail & Related papers (2025-05-27T16:24:20Z) - Continuous Diffusion Model for Language Modeling [64.7425225935854]
Existing continuous diffusion models for discrete data underperform compared to discrete methods. We propose a continuous diffusion model for language modeling that incorporates the geometry of the underlying categorical distribution. Our method outperforms existing discrete diffusion models and approaches the performance of autoregressive models.
arXiv Detail & Related papers (2025-02-17T08:54:29Z) - DiffusionDialog: A Diffusion Model for Diverse Dialog Generation with Latent Space [7.131920232495329]
In real-life conversations, the content is diverse, giving rise to a one-to-many problem that requires diverse generation.
Previous studies attempted to introduce discrete or Gaussian-based continuous latent variables to address this one-to-many problem.
We propose DiffusionDialog, a novel approach that enhances the diversity of dialogue generation with the help of a diffusion model.
arXiv Detail & Related papers (2024-04-10T05:56:46Z) - Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models [100.53662473219806]
Diffusion-of-Thought (DoT) is a novel approach that integrates diffusion models with Chain-of-Thought. DoT allows reasoning steps to diffuse over time through a diffusion language model. Our results demonstrate the effectiveness of DoT in multi-digit multiplication, logic, and grade school math problems.
arXiv Detail & Related papers (2024-02-12T16:23:28Z) - DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation [10.984745439751489]
We propose a novel diffusion model by applying the diffusion forward process in the continuous speech representation space.
In this way, we preserve the semantic structure of the continuous speech representation space in the diffusion process and integrate the continuous and discrete diffusion models.
We conduct extensive experiments on the textless direct speech-to-speech translation task, where the proposed method achieves comparable results to the computationally intensive auto-regressive baselines.
arXiv Detail & Related papers (2023-10-26T16:58:14Z) - A Cheaper and Better Diffusion Language Model with Soft-Masked Noise [62.719656543880596]
Masked-Diffuse LM is a novel diffusion model for language modeling, inspired by linguistic features of language.
Specifically, we design a linguistically informed forward process that adds corruptions to the text through strategic soft-masking to better noise the textual data.
We demonstrate that our Masked-Diffuse LM can achieve better generation quality than state-of-the-art diffusion models with greater efficiency.
arXiv Detail & Related papers (2023-04-10T17:58:42Z)
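The discrete-time view of forward noising via Markov kernels (described in the primer above) and the uniform corruption criticized in the main abstract can both be illustrated with a minimal absorbing-state forward process. This is an illustrative sketch under common masked-diffusion conventions, not code from any of the papers listed; the mask token, noise schedule, and helper names are all assumptions.

```python
import random

MASK = "[MASK]"

# Forward noising via a Markov kernel (discrete time): at step t, each
# non-absorbed token independently jumps to the absorbing [MASK] state
# with probability beta[t]; [MASK] is a fixed point of the kernel. The
# corruption is uniform: every position faces the same masking rate,
# regardless of how much information it carries.
def forward_step(tokens, beta_t, rng):
    return [MASK if tok != MASK and rng.random() < beta_t else tok
            for tok in tokens]

def forward_process(tokens, betas, rng):
    trajectory = [list(tokens)]
    for beta_t in betas:
        trajectory.append(forward_step(trajectory[-1], beta_t, rng))
    return trajectory

# Survival probability after T steps is prod_t (1 - beta_t); the schedule
# beta_t = 1 / (T - t) drives it to exactly 0, so x_T is fully masked and
# each token's masking time is uniform over the T steps.
T = 4
betas = [1.0 / (T - t) for t in range(T)]  # 1/4, 1/3, 1/2, 1
rng = random.Random(0)
traj = forward_process(["the", "cat", "sat", "down"], betas, rng)
print(traj[-1])  # all [MASK]: the final beta is 1, absorbing everything
```

Because masking is position-independent, a highly informative token (say, a named entity) is corrupted at the same rate as a function word, which is the structural mismatch with text that the main paper's issue (i) points to.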
This list is automatically generated from the titles and abstracts of the papers in this site.