Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall
- URL: http://arxiv.org/abs/2510.19304v1
- Date: Wed, 22 Oct 2025 07:08:47 GMT
- Title: Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall
- Authors: Mingyu Jo, Jaesik Yoon, Justin Deschenaux, Caglar Gulcehre, Sungjin Ahn
- Abstract summary: We introduce Loopholing, a novel and simple mechanism that preserves information via a deterministic latent pathway. LDDMs achieve substantial gains, reducing generative perplexity by up to 61% over prior baselines. Results also indicate that loopholing mitigates idle steps and oscillations, providing a scalable path toward high-quality non-autoregressive text generation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Discrete diffusion models offer a promising alternative to autoregressive generation through parallel decoding, but they suffer from a sampling wall: once categorical sampling occurs, rich distributional information collapses into one-hot vectors and cannot be propagated across steps, forcing subsequent steps to operate with limited information. To mitigate this problem, we introduce Loopholing, a novel and simple mechanism that preserves this information via a deterministic latent pathway, leading to Loopholing Discrete Diffusion Models (LDDMs). Trained efficiently with a self-conditioning strategy, LDDMs achieve substantial gains: reducing generative perplexity by up to 61% over prior baselines, closing (and in some cases surpassing) the gap with autoregressive models, and producing more coherent text. Applied to reasoning tasks, LDDMs also improve performance on arithmetic benchmarks such as Countdown and Game of 24. These results also indicate that loopholing mitigates idle steps and oscillations, providing a scalable path toward high-quality non-autoregressive text generation.
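For intuition only, the following is a minimal PyTorch sketch of what such a deterministic bypass around the sampling step could look like. It is not the authors' LDDM architecture: the module and names (`DenoiserWithLoophole`, `carry_proj`) are hypothetical, and the masking/remasking schedule of a real discrete diffusion sampler is omitted. The point it illustrates is that each denoising step returns both logits, which categorical sampling collapses into hard token ids, and a continuous hidden state that is handed to the next step.

```python
# Hedged sketch (not the paper's code): a denoising step that carries a deterministic
# continuous latent across steps so distributional information survives sampling.
import torch
import torch.nn as nn


class DenoiserWithLoophole(nn.Module):
    def __init__(self, vocab_size: int, hidden_dim: int = 256, num_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.carry_proj = nn.Linear(hidden_dim, hidden_dim)  # injects the carried latent
        layer = nn.TransformerEncoderLayer(hidden_dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.to_logits = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, carried=None):
        h = self.embed(tokens)
        if carried is not None:
            # Deterministic bypass: reinject the previous step's continuous state,
            # which a plain discrete diffusion step would discard after sampling.
            h = h + self.carry_proj(carried)
        h = self.backbone(h)
        return self.to_logits(h), h  # logits for sampling, latent for the next step


@torch.no_grad()
def generate(model, tokens, num_steps=8):
    carried = None
    for _ in range(num_steps):
        logits, carried = model(tokens, carried)
        # Sampling collapses the distribution to hard token ids (the "sampling wall");
        # `carried` keeps the richer information flowing to the next step.
        tokens = torch.distributions.Categorical(logits=logits).sample()
    return tokens


# Toy usage: denoise a random batch of 2 sequences of length 16.
model = DenoiserWithLoophole(vocab_size=1000)
out = generate(model, torch.randint(0, 1000, (2, 16)))
```

Under this reading, the self-conditioning training strategy mentioned in the abstract could amount to a first pass run without a carried latent and a second pass conditioned on the (detached) latent from the first, so that training matches how the latent is consumed at sampling time; consult the paper for the actual procedure.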
Related papers
- Causal Autoregressive Diffusion Language Model [70.7353007255797]
CARD reformulates the diffusion process within a strictly causal attention mask, enabling dense, per-token supervision in a single forward pass. Our results demonstrate that CARD achieves ARM-level data efficiency while unlocking the latency benefits of parallel generation.
arXiv Detail & Related papers (2026-01-29T17:38:29Z)
- MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization [66.82303841930752]
Diffusion language models (DLMs) have emerged as a promising alternative to traditional autoregressive large language models (LLMs). However, DLMs still lag behind LLMs in reasoning performance, especially as the number of denoising steps decreases. We propose a Multi-Reward Optimization (MRO) approach, which encourages DLMs to consider token correlations during the denoising process.
arXiv Detail & Related papers (2025-10-24T13:57:59Z)
- Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
arXiv Detail & Related papers (2025-07-11T08:00:47Z)
- Reverse Markov Learning: Multi-Step Generative Models for Complex Distributions [10.165179181394755]
We propose a framework that defines a general forward process transitioning from the target distribution to a known distribution. We then learn a reverse Markov process using multiple engression models. This framework accommodates general forward processes, allows for dimension reduction, and naturally discretizes the generative process.
arXiv Detail & Related papers (2025-02-19T14:10:15Z)
- Distributional Diffusion Models with Scoring Rules [83.38210785728994]
Diffusion models generate high-quality synthetic data, but generating high-quality outputs requires many discretization steps. We propose to accomplish sample generation by learning the posterior distribution of clean data samples.
arXiv Detail & Related papers (2025-02-04T16:59:03Z)
- Path-Consistency with Prefix Enhancement for Efficient Inference in LLMs [3.6696973040141034]
Our empirical results demonstrate that path-consistency improves inference latency by up to 40.5%, while maintaining task accuracy across various tasks.
arXiv Detail & Related papers (2024-08-25T01:45:53Z)
- LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights [2.8461446020965435]
We introduce LD-Pruner, a novel performance-preserving structured pruning method for compressing Latent Diffusion Models.
We demonstrate the effectiveness of our approach on three different tasks: text-to-image (T2I) generation, Unconditional Image Generation (UIG), and Unconditional Audio Generation (UAG).
arXiv Detail & Related papers (2024-04-18T06:35:37Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- Semi-Implicit Denoising Diffusion Models (SIDDMs) [50.30163684539586]
Existing models such as Denoising Diffusion Probabilistic Models (DDPM) deliver high-quality, diverse samples but are slowed by an inherently high number of iterative steps.
We introduce a novel approach that tackles the problem by matching implicit and explicit factors.
We demonstrate that our proposed method obtains comparable generative performance to diffusion-based models and vastly superior results to models with a small number of sampling steps.
arXiv Detail & Related papers (2023-06-21T18:49:22Z)
- Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), each have known weaknesses: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
- Latent Autoregressive Source Separation [5.871054749661012]
This paper introduces vector-quantized Latent Autoregressive Source Separation (i.e., de-mixing an input signal into its constituent sources) without requiring additional gradient-based optimization or modifications of existing models.
Our separation method relies on a Bayesian formulation in which the autoregressive models serve as priors, and a discrete (non-parametric) likelihood function is constructed by performing frequency counts over latent sums of addend tokens (a toy sketch of this counting scheme follows this entry).
arXiv Detail & Related papers (2023-01-09T17:32:00Z)
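As a toy illustration of the count-based likelihood described in the entry above, here is a NumPy sketch under a deliberately simplified assumption of a scalar quantizer with a small codebook; `encode`, `decode`, and `posterior_score` are hypothetical stand-ins for the paper's vector-quantized setup and autoregressive priors, not its actual implementation.

```python
# Toy sketch (not the paper's code): a frequency-count likelihood over latent sums.
import numpy as np

K = 8                                # hypothetical codebook size
levels = np.linspace(-1.0, 1.0, K)   # toy "decoder": code index -> scalar signal value
rng = np.random.default_rng(0)


def decode(z):
    return float(levels[z])


def encode(x):
    # Nearest-code quantization of a scalar signal value.
    return int(np.argmin(np.abs(levels - x)))


# Build p(z_mix | z1, z2) by counting which mixture code the sum of decoded addends
# (plus a small perturbation) quantizes to.
counts = np.zeros((K, K, K))
for z1 in range(K):
    for z2 in range(K):
        for _ in range(200):
            x_mix = decode(z1) + decode(z2) + rng.normal(scale=0.05)
            counts[z1, z2, encode(x_mix)] += 1.0
likelihood = counts / counts.sum(axis=-1, keepdims=True)


def posterior_score(z1, z2, z_mix, log_prior_1, log_prior_2):
    # Bayesian scoring of a candidate source pair: count-based likelihood combined
    # with (stand-in) autoregressive prior log-probabilities.
    return float(np.log(likelihood[z1, z2, z_mix] + 1e-12)) + log_prior_1 + log_prior_2
```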