Time-Annealed Perturbation Sampling: Diverse Generation for Diffusion Language Models
- URL: http://arxiv.org/abs/2601.22629v1
- Date: Fri, 30 Jan 2026 06:39:33 GMT
- Title: Time-Annealed Perturbation Sampling: Diverse Generation for Diffusion Language Models
- Authors: Jingxuan Wu, Zhenglin Wan, Xingrui Yu, Yuzhe Yang, Yiqiao Huang, Ivor Tsang, Yang You
- Abstract summary: Diffusion language models (Diffusion-LMs) introduce an explicit temporal dimension into text generation. We show that Diffusion-LMs, like diffusion models in image generation, exhibit a temporal division of labor. We propose Time-Annealed Perturbation Sampling (TAPS), a training-free inference strategy that encourages semantic branching early in the diffusion process. TAPS is compatible with both non-autoregressive and semi-autoregressive diffusion backbones, demonstrated on LLaDA and TraDo in our paper, and consistently improves output diversity across creative writing and reasoning benchmarks without compromising generation quality.
- Score: 11.196851704643406
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion language models (Diffusion-LMs) introduce an explicit temporal dimension into text generation, yet how this structure can be leveraged to control generation diversity for exploring multiple valid semantic or reasoning paths remains underexplored. In this paper, we show that Diffusion-LMs, like diffusion models in image generation, exhibit a temporal division of labor: early denoising steps largely determine the global semantic structure, while later steps focus on local lexical refinement. Building on this insight, we propose Time-Annealed Perturbation Sampling (TAPS), a training-free inference strategy that encourages semantic branching early in the diffusion process while progressively reducing perturbations to preserve fluency and instruction adherence. TAPS is compatible with both non-autoregressive and semi-autoregressive Diffusion backbones, demonstrated on LLaDA and TraDo in our paper, and consistently improves output diversity across creative writing and reasoning benchmarks without compromising generation quality.
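To make the time-annealed idea concrete, the sketch below shows how a perturbation schedule of this kind could be wired into the confidence-based unmasking loop of a masked diffusion LM such as LLaDA. This is a minimal illustration, not the authors' released implementation: the model interface (`model_logits`), the linear annealing schedule, the Gumbel-noise perturbation, and all hyperparameter values are assumptions made for the example.

```python
# Minimal sketch of time-annealed perturbation sampling for a masked
# diffusion LM. Hypothetical illustration: the model interface, the
# schedule shape, and all hyperparameters are assumptions, not the
# paper's released code.
import torch

def taps_decode(model_logits, x, mask_id, num_steps=64,
                tau_start=1.5, tau_end=0.0):
    """Iteratively unmask `x`, injecting large perturbations early
    (to branch among global semantic structures) and annealing them
    to zero late (to keep local lexical refinement clean)."""
    for step in range(num_steps):
        t = step / max(num_steps - 1, 1)             # 0 -> 1 over decoding
        tau = tau_start + (tau_end - tau_start) * t  # linear anneal (assumed)

        logits = model_logits(x)                     # (seq_len, vocab)
        if tau > 0:
            # Gumbel noise scaled by the annealed temperature acts as the
            # perturbation: strong early, vanishing late.
            gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-9) + 1e-9)
            logits = logits + tau * gumbel

        probs = torch.softmax(logits, dim=-1)
        conf, tokens = probs.max(dim=-1)

        # Unmask the currently most confident still-masked positions.
        masked = (x == mask_id)
        if not masked.any():
            break
        n_unmask = max(1, int(masked.sum().item() / (num_steps - step)))
        conf = conf.masked_fill(~masked, -float("inf"))
        idx = conf.topk(n_unmask).indices
        x[idx] = tokens[idx]
    return x
```

The property that matches the abstract is that the perturbation strength `tau` is large during the early steps, which fix the global semantic structure, and decays toward zero during the late steps, which handle local lexical refinement.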
Related papers
- Bridging the Discrete-Continuous Gap: Unified Multimodal Generation via Coupled Manifold Discrete Absorbing Diffusion [60.186310080523135]
The bifurcation of generative modeling into autoregressive approaches for discrete data (text) and diffusion approaches for continuous data (images) hinders the development of truly unified multimodal systems. We propose CoM-DAD, a novel probabilistic framework that reformulates multimodal generation as a hierarchical dual-process. Our method demonstrates superior stability over standard masked modeling, establishing a new paradigm for scalable, unified text-image generation.
arXiv Detail & Related papers (2026-01-07T16:21:19Z)
- On the Role of Discreteness in Diffusion LLMs [69.64854287505999]
We revisit the view of diffusion processes and language modeling, and outline five properties that separate diffusion mechanics from language-specific requirements. We identify two central issues: (i) uniform corruption does not respect how information is distributed across positions, and (ii) token-wise marginal training cannot capture multi-token dependencies during parallel decoding. These observations motivate diffusion processes that align more closely with the structure of text, and encourage future work toward more coherent diffusion language models.
arXiv Detail & Related papers (2025-12-27T16:03:08Z)
- Latent Discrete Diffusion Models [18.979326092796896]
We study discrete diffusion for language and other categorical data. We propose Latent Discrete Diffusion Models (LDDMs). We present two instantiations: (i) FUJI-LDDMs, which perform fully joint denoising of tokens and latents, and (ii) SEQ-LDDMs, which sequentially resolve the latent chain and then the discrete chain conditioned on it. For both variants we derive ELBO-style objectives and discuss design choices for learning latents that are informative yet amenable to diffusion modeling.
arXiv Detail & Related papers (2025-10-20T21:26:52Z)
- LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning [30.62691333490551]
Large Language Models (LLMs) demonstrate their reasoning ability through chain-of-thought generation. We propose LaDiR, a novel reasoning framework that combines the expressiveness of continuous latent representations with latent diffusion for text reasoning. LaDiR consistently improves accuracy, diversity, and interpretability over existing autoregressive, diffusion-based, and latent reasoning methods.
arXiv Detail & Related papers (2025-10-06T08:15:03Z)
- Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion [52.315729095824906]
MLLM Semantic-Corrected Ping-Pong-Ahead Diffusion (PPAD) is a novel framework that introduces a Multimodal Large Language Model (MLLM) as a semantic observer during inference. It performs real-time analysis on intermediate generations, identifies latent semantic inconsistencies, and translates feedback into controllable signals that actively guide the remaining denoising steps. Extensive experiments demonstrate PPAD's significant improvements.
arXiv Detail & Related papers (2025-05-26T14:42:35Z)
- Segment-Level Diffusion: A Framework for Controllable Long-Form Generation with Diffusion Language Models [12.446047799880587]
Token-level diffusion does not model word-order dependencies explicitly. Passage-level diffusion struggles with learning robust representations for long-form text. We propose Segment-Level Diffusion, a framework that enhances diffusion-based text generation.
arXiv Detail & Related papers (2024-12-15T22:47:44Z)
- Improved Paraphrase Generation via Controllable Latent Diffusion [60.479643304122504]
We propose Latent Diffusion Paraphraser (LDP), a novel paraphrase generation method that models a controllable diffusion process. Experiments show that LDP better reconciles paraphrase generation quality and diversity than baselines.
arXiv Detail & Related papers (2024-04-13T09:24:32Z)
- Text Diffusion with Reinforced Conditioning [92.17397504834825]
This paper thoroughly analyzes text diffusion models and uncovers two significant limitations: degradation of self-conditioning during training and misalignment between training and sampling.
Motivated by our findings, we propose a novel Text Diffusion model called TREC, which mitigates the degradation with Reinforced Conditioning and the misalignment by Time-Aware Variance Scaling.
arXiv Detail & Related papers (2024-02-19T09:24:02Z)
- A Cheaper and Better Diffusion Language Model with Soft-Masked Noise [62.719656543880596]
Masked-Diffuse LM is a novel diffusion model for language modeling, inspired by linguistic features of language.
Specifically, we design a linguistically informed forward process that adds corruptions to the text through strategic soft-masking to better noise the textual data.
We demonstrate that our Masked-Diffuse LM achieves better generation quality than state-of-the-art diffusion models with greater efficiency.
arXiv Detail & Related papers (2023-04-10T17:58:42Z)