The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning
- URL: http://arxiv.org/abs/2602.18428v1
- Date: Fri, 20 Feb 2026 18:49:00 GMT
- Title: The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning
- Authors: Mojtaba Sahraee-Ardakan, Mauricio Delbracio, Peyman Milanfar
- Abstract summary: We study autonomous (noise-agnostic) generative models, such as Equilibrium Matching and blind diffusion. We prove that generation using autonomous models is not merely blind denoising. We also establish the structural stability conditions for sampling with autonomous models.
- Score: 20.547812775989808
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous (noise-agnostic) generative models, such as Equilibrium Matching and blind diffusion, challenge the standard paradigm by learning a single, time-invariant vector field that operates without explicit noise-level conditioning. While recent work suggests that high-dimensional concentration allows these models to implicitly estimate noise levels from corrupted observations, a fundamental paradox remains: what is the underlying landscape being optimized when the noise level is treated as a random variable, and how can a bounded, noise-agnostic network remain stable near the data manifold where gradients typically diverge? We resolve this paradox by formalizing Marginal Energy, $E_{\text{marg}}(\mathbf{u}) = -\log p(\mathbf{u})$, where $p(\mathbf{u}) = \int p(\mathbf{u}|t)p(t)dt$ is the marginal density of the noisy data integrated over a prior distribution of unknown noise levels. We prove that generation using autonomous models is not merely blind denoising, but a specific form of Riemannian gradient flow on this Marginal Energy. Through a novel relative energy decomposition, we demonstrate that while the raw Marginal Energy landscape possesses a $1/t^p$ singularity normal to the data manifold, the learned time-invariant field implicitly incorporates a local conformal metric that perfectly counteracts the geometric singularity, converting an infinitely deep potential well into a stable attractor. We also establish the structural stability conditions for sampling with autonomous models. We identify a "Jensen Gap" in noise-prediction parameterizations that acts as a high-gain amplifier for estimation errors, explaining the catastrophic failure observed in deterministic blind models. Conversely, we prove that velocity-based parameterizations are inherently stable because they satisfy a bounded-gain condition that absorbs posterior uncertainty into a smooth geometric drift.
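As a concrete illustration of the objects in the abstract, the sketch below numerically evaluates the Marginal Energy $E_{\text{marg}}(\mathbf{u}) = -\log \int p(\mathbf{u}|t)p(t)dt$ on a toy two-dimensional dataset and runs a plain Euclidean gradient flow on it. Everything here is an illustrative assumption rather than the paper's implementation: the point-cloud data, the uniform grid prior over $t$, the finite-difference gradients, and the helper names (`marginal_energy`, `marginal_score`) are all introduced for this sketch, which deliberately omits the learned conformal metric that the paper identifies as the stabilizing ingredient near the data manifold.

```python
# Minimal sketch (assumptions, not the paper's implementation): evaluate the
# Marginal Energy E_marg(u) = -log p(u), p(u) = sum_t p(u|t) p(t), on a toy
# 2-D point cloud, and run a plain Euclidean gradient flow on it.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(64, 2))                  # toy "data manifold": 64 points in R^2
t_grid = np.linspace(0.05, 2.0, 40)              # discretized prior over noise levels t
log_p_t = np.full(len(t_grid), -np.log(len(t_grid)))  # uniform p(t) on the grid

def log_p_u_given_t(u, t):
    """log density of u = x + t*eps, x uniform over the data points, eps ~ N(0, I)."""
    d = u.shape[-1]
    sq = ((u - data) ** 2).sum(axis=-1)          # squared distance to every data point
    log_comp = -sq / (2 * t**2) - d * np.log(t) - 0.5 * d * np.log(2 * np.pi)
    return np.logaddexp.reduce(log_comp) - np.log(len(data))

def marginal_energy(u):
    """E_marg(u) = -log sum_t p(u|t) p(t)."""
    log_terms = np.array([log_p_u_given_t(u, t) for t in t_grid]) + log_p_t
    return -np.logaddexp.reduce(log_terms)

def marginal_score(u, eps=1e-4):
    """-grad E_marg(u) by central finite differences: a time-invariant field,
    queried without ever being told the noise level that produced u."""
    g = np.zeros_like(u)
    for i in range(u.size):
        e = np.zeros_like(u)
        e[i] = eps
        g[i] = (marginal_energy(u + e) - marginal_energy(u - e)) / (2 * eps)
    return -g

# Plain (Euclidean) gradient flow on E_marg; the paper's point is that a learned
# autonomous field additionally carries a conformal metric taming the 1/t^p
# singularity near the data; this sketch omits that rescaling.
u = rng.normal(size=2) * 3.0                     # start away from the data
for _ in range(4000):
    u = u + 2e-3 * marginal_score(u)
print("final sample:", u, "energy:", marginal_energy(u))
```

Note that truncating the noise prior at $t_{\min} = 0.05$ caps the depth of the potential well; pushing $t_{\min} \to 0$ recovers the $1/t^p$ singularity normal to the data that the abstract describes, which is exactly the regime where the plain Euclidean flow above would need the paper's metric correction (or a vanishing step size) to remain stable.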
Related papers
- Universality of General Spiked Tensor Models [9.454986540713655]
We study the rank-one spiked tensor model in the high-dimensional regime. We show that its high-dimensional spectral behavior and statistical limits are robust to non-Gaussian noise.
arXiv Detail & Related papers (2026-02-04T11:59:30Z)
- Mitigating the Noise Shift for Denoising Generative Models via Noise Awareness Guidance [54.88271057438763]
Noise Awareness Guidance (NAG) is a correction method that explicitly steers sampling trajectories to remain consistent with the pre-defined noise schedule. NAG consistently mitigates noise shift and substantially improves the generation quality of mainstream diffusion models.
arXiv Detail & Related papers (2025-10-14T13:31:34Z)
- DAG DECORation: Continuous Optimization for Structure Learning under Hidden Confounding [0.0]
We study structure learning for linear Gaussian SEMs in the presence of latent confounding. We propose DECOR, a single likelihood-based estimator that jointly learns a DAG and a correlated noise model.
arXiv Detail & Related papers (2025-10-02T15:23:30Z)
- The Spacetime of Diffusion Models: An Information Geometry Perspective [40.23096112113255]
We show that the standard pullback approach, utilizing the deterministic probability flow ODE decoder, is fundamentally flawed. We introduce a latent spacetime $z=(\mathbf{x}_t,t)$ that indexes the family of denoising distributions $p(\mathbf{x} \mid \mathbf{x}_t,t)$ across all noise scales. The resulting structure induces a principled Diffusion Distance, where geodesics trace minimal sequences of noise and denoise edits between data.
arXiv Detail & Related papers (2025-05-23T06:16:58Z)
- Non-stationary Diffusion For Probabilistic Time Series Forecasting [3.7687375904925484]
We develop a diffusion-based probabilistic forecasting framework, termed Non-stationary Diffusion (NsDiff). NsDiff combines a denoising diffusion-based conditional generative model with a pre-trained conditional mean and variance estimator. Experiments conducted on nine real-world and synthetic datasets demonstrate the superior performance of NsDiff compared to existing approaches.
arXiv Detail & Related papers (2025-05-07T09:29:39Z)
- High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise [59.25598762373543]
We establish high-probability convergence guarantees for nonlinear stochastic gradient descent on streaming data in the presence of heavy-tailed noise.
We demonstrate, analytically and empirically, how to select the preferred choice of nonlinearity for a given problem.
arXiv Detail & Related papers (2023-10-28T18:53:41Z)
- Variational Nonlinear Kalman Filtering with Unknown Process Noise Covariance [24.23243651301339]
This paper presents a solution for joint nonlinear state estimation and model parameter identification based on the approximate Bayesian inference principle.
The performance of the proposed method is verified on radar target tracking applications by both simulated and real-world data.
arXiv Detail & Related papers (2023-05-06T03:34:39Z)
- Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation, which grounds this self-supervised task as an estimation problem of an energy-based model of the data (a minimal NCE sketch appears after this list).
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the assumption that the noise distribution should equal the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation [50.85788484752612]
Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models.
It has been empirically observed that the choice of the noise distribution is crucial for NCE's performance.
In this work, we formally pinpoint reasons for NCE's poor performance when an inappropriate noise distribution is used.
arXiv Detail & Related papers (2021-10-21T16:57:45Z)
- Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in gradient descent provides a crucial implicit regularization effect for training overparameterized models.
We show that parameter-dependent noise, induced by mini-batches or label perturbation, is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
arXiv Detail & Related papers (2020-06-15T18:31:02Z)
- Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles [65.9694455739978]
We study contextual linear bandit problems under feature uncertainty, where the features are noisy and have missing entries.
Our analysis reveals that the optimal hypothesis can significantly deviate from the underlying realizable function, depending on the noise characteristics.
This implies that classical approaches cannot guarantee a non-trivial regret bound.
arXiv Detail & Related papers (2017-03-03T21:39:56Z)
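Three of the entries above (the two optimal-noise papers and the optimization-landscape paper) revolve around noise-contrastive estimation. For reference, here is a minimal sketch of the standard NCE objective, fitting an unnormalized 1-D Gaussian model with a learned log-normalizer against a standard-normal noise distribution; the toy data, the noise ratio `nu`, and the BFGS optimizer are illustrative choices, not the experimental setups of those papers.

```python
# Minimal NCE sketch: logistic classification between data and noise samples
# for an unnormalized model (Gutmann & Hyvarinen-style objective); toy setup.
import numpy as np
from scipy.optimize import minimize
from scipy.special import log_expit          # log(sigmoid(x)), numerically stable

rng = np.random.default_rng(1)
x_data = rng.normal(loc=2.0, scale=0.7, size=2000)   # true mean 2.0, std 0.7
nu = 5                                               # noise samples per data sample
x_noise = rng.normal(size=nu * len(x_data))          # noise distribution: N(0, 1)

def log_pn(x):
    """Standard-normal log-density of the noise."""
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def log_model(x, theta):
    """Unnormalized Gaussian log-density with a learned log-normalizer c."""
    mu, log_prec, c = theta
    return -0.5 * np.exp(log_prec) * (x - mu) ** 2 + c

def nce_loss(theta):
    # G(x) = log p_model(x) - log p_noise(x) - log(nu); classify data vs noise.
    g_data = log_model(x_data, theta) - log_pn(x_data) - np.log(nu)
    g_noise = log_model(x_noise, theta) - log_pn(x_noise) - np.log(nu)
    return -(log_expit(g_data).mean() + nu * log_expit(-g_noise).mean())

theta_hat = minimize(nce_loss, x0=np.zeros(3), method="BFGS").x
mu, log_prec, c = theta_hat
print(f"estimated mean {mu:.3f}, std {np.exp(-0.5 * log_prec):.3f}")
```

The learned constant `c` absorbs the partition function, which is what lets NCE fit unnormalized models where maximum likelihood is intractable; the choice of noise distribution and ratio `nu` is precisely the design question the papers above analyze.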