Balancing Understanding and Generation in Discrete Diffusion Models
- URL: http://arxiv.org/abs/2602.01362v1
- Date: Sun, 01 Feb 2026 18:00:35 GMT
- Title: Balancing Understanding and Generation in Discrete Diffusion Models
- Authors: Yue Liu, Yuzhong Zhao, Zheyong Xie, Qixiang Ye, Jianbin Jiao, Yao Hu, Shaosheng Cao, Yunfan Liu
- Abstract summary: Masked Diffusion Language Models (MDLM) excel at semantic understanding and zero-shot generalization. Uniform-noise Diffusion Language Models (UDLM) achieve strong few-step generation quality. We propose XDLM, which bridges the two paradigms via a stationary noise kernel.
- Score: 58.62235340638143
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In discrete generative modeling, two dominant paradigms demonstrate divergent capabilities: Masked Diffusion Language Models (MDLM) excel at semantic understanding and zero-shot generalization, whereas Uniform-noise Diffusion Language Models (UDLM) achieve strong few-step generation quality, yet neither attains balanced performance across both dimensions. To address this, we propose XDLM, which bridges the two paradigms via a stationary noise kernel. XDLM offers two key contributions: (1) a principled theoretical unification of MDLM and UDLM that recovers each paradigm as a special case, and (2) an alleviated memory bottleneck, enabled by an algebraic simplification of the posterior probabilities. Experiments demonstrate that XDLM advances the Pareto frontier between understanding capability and generation quality. Quantitatively, XDLM surpasses UDLM by 5.4 points on zero-shot text benchmarks and outperforms MDLM in few-step image generation (FID 54.1 vs. 80.8). When scaled to tune an 8B-parameter large language model, XDLM achieves a score of 15.0 on MBPP in just 32 steps, effectively doubling the baseline performance. Finally, an analysis of training dynamics reveals XDLM's superior potential for long-term scaling. Code is available at https://github.com/MzeroMiko/XDLM
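To make the kernel interpolation concrete, below is a minimal sketch of a forward noise kernel that blends an absorbing [MASK] state with a uniform distribution over the vocabulary via a single mixing parameter `lam`. The function name `marginal`, the vocabulary size, and `lam` itself are illustrative assumptions for this sketch, not the paper's actual kernel or API:

```python
import numpy as np

# Illustrative sketch only: a marginal q(x_t | x_0) that interpolates between
# masked (absorbing-state) and uniform-noise discrete diffusion. All names and
# the mixing parameter `lam` are assumptions, not taken from the XDLM codebase.

V = 8          # vocabulary size (excluding the mask token); illustrative value
MASK = V       # index of the absorbing [MASK] state

def marginal(x0: int, alpha_t: float, lam: float) -> np.ndarray:
    """q(x_t | x_0): keep the original token with prob. alpha_t; otherwise
    corrupt it, sending it to [MASK] with prob. lam and to a uniformly random
    vocabulary token with prob. (1 - lam).

    lam = 1 recovers a masked (MDLM-style) kernel;
    lam = 0 recovers a uniform-noise (UDLM-style) kernel.
    """
    # Spread the non-mask corruption mass uniformly over the V real tokens.
    q = np.full(V + 1, (1.0 - alpha_t) * (1.0 - lam) / V)
    q[MASK] = (1.0 - alpha_t) * lam   # mass routed to the absorbing state
    q[x0] += alpha_t                  # mass kept on the original token
    return q

# Sanity check: the kernel is a valid distribution for any mixing weight.
for lam in (0.0, 0.5, 1.0):
    q = marginal(x0=3, alpha_t=0.7, lam=lam)
    assert abs(q.sum() - 1.0) < 1e-12

print(marginal(x0=3, alpha_t=0.0, lam=1.0))  # fully noised, MDLM limit: all mass on [MASK]
print(marginal(x0=3, alpha_t=0.0, lam=0.0))  # fully noised, UDLM limit: uniform over vocab
```

Setting `lam = 1.0` sends all corrupted mass to [MASK] (the MDLM limit), `lam = 0.0` spreads it uniformly over the vocabulary (the UDLM limit), and intermediate values trade the two behaviors off; this interpolation regime is where the abstract claims XDLM balances understanding and generation.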
Related papers
- Parallelism and Generation Order in Masked Diffusion Language Models: Limits Today, Potential Tomorrow [30.201913054064363]
Masked Diffusion Language Models promise parallel token generation and arbitrary-order decoding. We characterize MDLM behavior along two dimensions: parallelism strength and generation order. We evaluate eight mainstream MDLMs on 58 benchmarks spanning knowledge, reasoning, and programming.
arXiv Detail & Related papers (2026-01-22T02:39:36Z)
- Reproducing and Dissecting Denoising Language Models for Speech Recognition [31.91567892562116]
Denoising language models (DLMs) have been proposed as a powerful alternative to traditional language models (LMs) for automatic speech recognition (ASR). This paper presents the first independent, large-scale empirical study of DLMs.
arXiv Detail & Related papers (2025-12-15T17:33:22Z)
- Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model [98.35868970993232]
Diffusion language models (DLMs) are emerging as a powerful and promising alternative to the dominant autoregressive paradigm. We introduce efficient Sampling with Adaptive acceleration and Backtracking Enhanced Remasking (i.e., Saber) to achieve better inference speed and output quality in code generation.
arXiv Detail & Related papers (2025-10-20T23:38:12Z)
- Sequential Diffusion Language Models [110.06562906987052]
Diffusion language models (DLMs) have strong theoretical efficiency but are limited by fixed-length decoding and incompatibility with key-value caches. We introduce Next Sequence Prediction (NSP), which unifies next-token and next-block prediction. We propose the Sequential Diffusion Language Model (SDLM), which can retrofit pre-trained autoregressive language models (ALMs) at minimal cost.
arXiv Detail & Related papers (2025-09-28T17:59:15Z)
- DLM-One: Diffusion Language Models for One-Step Sequence Generation [63.43422118066493]
DLM-One is a score-distillation-based framework for one-step sequence generation with continuous diffusion language models. We investigate whether DLM-One can achieve substantial gains in sampling efficiency for language modeling.
arXiv Detail & Related papers (2025-05-30T22:42:23Z)
- Multimodal Latent Language Modeling with Next-Token Diffusion [111.93906046452125]
Multimodal generative models require a unified approach to handle both discrete data (e.g., text and code) and continuous data (e.g., image, audio, video). We propose Latent Language Modeling (LatentLM), which seamlessly integrates continuous and discrete data using causal Transformers.
arXiv Detail & Related papers (2024-12-11T18:57:32Z)
- Joint Prompt Optimization of Stacked LLMs using Variational Inference [66.04409787899583]
Large language models (LLMs) can be seen as atomic units of computation mapping sequences to a distribution over sequences.
By stacking two such layers and feeding the output of one layer to the next, we obtain a Deep Language Network (DLN). We show that DLN-2 can reach higher performance than a single layer, suggesting it may reach performance comparable to GPT-4.
arXiv Detail & Related papers (2023-06-21T18:45:56Z)