Related papers: Information-Theoretic Discrete Diffusion

URL: http://arxiv.org/abs/2510.24088v1
Date: Tue, 28 Oct 2025 05:59:05 GMT
Title: Information-Theoretic Discrete Diffusion
Authors: Moongyu Jeon, Sangwoo Shin, Dongjae Jeon, Albert No
Abstract summary: We present an information-theoretic framework for discrete diffusion models that yields principled estimators of log-likelihood using score-matching losses. Results provide a time-integral decomposition of the log-likelihood of the data in terms of optimal score-based losses. Experiments on synthetic and real-world data confirm the accuracy, variance stability, and utility of our estimators.
Score: 8.018632880023336
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present an information-theoretic framework for discrete diffusion models that yields principled estimators of log-likelihood using score-matching losses. Inspired by the I-MMSE identity for the Gaussian setup, we derive analogous results for the discrete setting. Specifically, we introduce the Information-Minimum Denoising Score Entropy (I-MDSE) relation, which links mutual information between data and its diffused version to the minimum denoising score entropy (DSE) loss. We extend this theory to masked diffusion and establish the Information-Minimum Denoising Cross-Entropy (I-MDCE) relation, connecting cross-entropy losses to mutual information in discrete masked processes. These results provide a time-integral decomposition of the log-likelihood of the data in terms of optimal score-based losses, showing that commonly used losses such as DSE and DCE are not merely variational bounds but tight and principled estimators of log-likelihood. The I-MDCE decomposition further enables practical extensions, including a time-free formula, conditional likelihood estimation in prompt-response tasks, and coupled Monte Carlo estimation of likelihood ratios. Experiments on synthetic and real-world data confirm the accuracy, variance stability, and utility of our estimators. The code is publicly available at https://github.com/Dongjae0324/infodis.
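Illustration: the abstract cites the Gaussian I-MMSE identity as its inspiration. The sketch below is a small numerical check of that classical identity only, not of the paper's discrete I-MDSE or I-MDCE relations. It assumes a standard Gaussian input X and the channel Y = sqrt(snr) X + N with N standard Gaussian, for which the mutual information is 0.5 ln(1 + snr) nats and mmse(gamma) = 1 / (1 + gamma). The function names and the trapezoidal rule are illustrative choices and are not taken from the paper or its repository.

```python
import math


def mmse_gaussian(gamma: float) -> float:
    """MMSE of estimating X ~ N(0, 1) from sqrt(gamma) * X + N, with N ~ N(0, 1)."""
    return 1.0 / (1.0 + gamma)


def mutual_info_gaussian(snr: float) -> float:
    """Closed-form mutual information I(X; Y) in nats for the same channel."""
    return 0.5 * math.log1p(snr)


def integrate_half_mmse(snr: float, steps: int = 100_000) -> float:
    """Trapezoidal estimate of 0.5 * integral of mmse(gamma) over [0, snr]."""
    h = snr / steps
    total = 0.5 * (mmse_gaussian(0.0) + mmse_gaussian(snr))
    for k in range(1, steps):
        total += mmse_gaussian(k * h)
    return 0.5 * h * total


if __name__ == "__main__":
    snr = 10.0
    # The two numbers should agree up to the integration error.
    print(mutual_info_gaussian(snr), integrate_half_mmse(snr))
```

The integral form used here, I(snr) = 0.5 * integral of mmse over [0, snr], is the Gaussian analogue of a time-integral decomposition: the information is recovered by accumulating an optimal estimation loss over noise levels.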

Related papers

Information Theoretic Learning for Diffusion Models with Warm Start [8.455757095201314]
We derive a tighter likelihood bound for noise-driven models to improve the accuracy and efficiency of maximum likelihood learning. Our key insight extends the classical KL divergence Fisher information relationship to arbitrary noise perturbations. Treating the diffusion process as a Gaussian channel, we show that the proposed objective upper bounds the negative log-likelihood (NLL).
arXiv Detail & Related papers (2025-10-23T18:00:59Z)
Beyond Real Data: Synthetic Data through the Lens of Regularization [9.459299281438074]
Synthetic data can improve generalization when real data is scarce, but excessive reliance may introduce distributional mismatches that degrade performance. We present a learning-theoretic framework to quantify the trade-off between synthetic and real data.
arXiv Detail & Related papers (2025-10-09T11:33:09Z)
Kernel-Smoothed Scores for Denoising Diffusion: A Bias-Variance Study [3.265950484493743]
Diffusion models can be prone to memorization. Regularization on the score has the same effect as increasing the size of the training dataset. This perspective highlights two regularization mechanisms taking place in denoising diffusions.
arXiv Detail & Related papers (2025-05-28T20:22:18Z)
A Deep Bayesian Nonparametric Framework for Robust Mutual Information Estimation [9.68824512279232]
Mutual Information (MI) is a crucial measure for capturing dependencies between variables. We present a solution for training an MI estimator by constructing the MI loss with a finite representation of the Dirichlet process posterior to incorporate regularization. We explore the application of our estimator in maximizing MI between the data space and the latent space of a variational autoencoder.
arXiv Detail & Related papers (2025-03-11T21:27:48Z)
Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information. We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv Detail & Related papers (2024-05-19T17:49:33Z)
Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process. We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures. We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z)
On Error Propagation of Diffusion Models [77.91480554418048]
We develop a theoretical framework to mathematically formulate error propagation in the architecture of DMs. We apply the cumulative error as a regularization term to reduce error propagation. Our proposed regularization reduces error propagation, significantly improves vanilla DMs, and outperforms previous baselines.
arXiv Detail & Related papers (2023-08-09T15:31:17Z)
Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected differential equation evolving on the support of the data. Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
arXiv Detail & Related papers (2023-04-10T17:54:38Z)
FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation [72.19198763459448]
We learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise. These perturbed data densities are linked together by the Fokker-Planck equation (FPE), a partial differential equation (PDE) governing the spatial-temporal evolution of a density. We derive a corresponding equation called the score FPE that characterizes the noise-conditional scores of the perturbed data densities.
arXiv Detail & Related papers (2022-10-09T16:27:25Z)
A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data [71.9573352891936]
This paper tackles the problem of missing data imputation for noisy and non-Gaussian data. A new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data. Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
arXiv Detail & Related papers (2022-01-28T10:01:37Z)
Machine learning on DNA-encoded library count data using an uncertainty-aware probabilistic loss function [1.5559232742666467]
We show a regression approach to learning DEL enrichments of individual molecules using a custom negative log-likelihood loss function. We illustrate this approach on a dataset of 108k compounds screened against CAIX, and a dataset of 5.7M compounds screened against sEH and SIRT2.
arXiv Detail & Related papers (2021-08-27T19:37:06Z)
Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariable log-conditionals (scores). For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training. We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out-of-distribution data points at test time with a single forward pass. We scale training in these with a novel loss function and centroid updating scheme and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.