Diffusion Models and the Manifold Hypothesis: Log-Domain Smoothing is Geometry Adaptive
- URL: http://arxiv.org/abs/2510.02305v1
- Date: Thu, 02 Oct 2025 17:59:39 GMT
- Title: Diffusion Models and the Manifold Hypothesis: Log-Domain Smoothing is Geometry Adaptive
- Authors: Tyler Farghly, Peter Potaptchik, Samuel Howard, George Deligiannidis, Jakiw Pidstrigach
- Abstract summary: We show that smoothing the score function produces smoothing tangential to the data manifold. We also show that the manifold along which the diffusion model generalises can be controlled by choosing an appropriate smoothing.
- Score: 8.59897970836622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have achieved state-of-the-art performance, demonstrating remarkable generalisation capabilities across diverse domains. However, the mechanisms underpinning these strong capabilities remain only partially understood. A leading conjecture, based on the manifold hypothesis, attributes this success to their ability to adapt to low-dimensional geometric structure within the data. This work provides evidence for this conjecture, focusing on how such phenomena could result from the formulation of the learning problem through score matching. We inspect the role of implicit regularisation by investigating the effect of smoothing minimisers of the empirical score matching objective. Our theoretical and empirical results confirm that smoothing the score function -- or equivalently, smoothing in the log-density domain -- produces smoothing tangential to the data manifold. In addition, we show that the manifold along which the diffusion model generalises can be controlled by choosing an appropriate smoothing.
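The tangential-smoothing claim can be visualised with a toy numerical sketch (ours, not the paper's experiments): data on a circle, the empirical score of the Gaussian-smoothed empirical measure, Gaussian smoothing of that score field (equivalently, smoothing in the log-density domain), and Langevin sampling from both. All names and parameters below are illustrative assumptions.

```python
# Toy sketch (assumptions, not the paper's setup): n points on the unit circle,
# empirical score of p_sigma = (1/n) sum_i N(x_i, sigma^2 I), and a Gaussian
# smoothing of that score field ("log-domain smoothing"). Langevin samples from
# the smoothed score should stay near the circle (small radial error) while
# spreading further along it.
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 32, 0.05
theta = rng.uniform(0, 2 * np.pi, n)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)      # training data, (n, 2)

def empirical_score(x):
    """Score of the Gaussian-smoothed empirical measure at points x: (m, 2)."""
    d2 = ((x[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # (m, n)
    w = np.exp(-(d2 - d2.min(1, keepdims=True)) / (2 * sigma**2))
    w /= w.sum(1, keepdims=True)                          # softmax responsibilities
    return (w[:, :, None] * (X[None] - x[:, None, :])).sum(1) / sigma**2

def smoothed_score(x, tau=0.2, k=32):
    """Monte Carlo Gaussian smoothing of the score field with bandwidth tau."""
    eps = rng.normal(0, tau, (k, *x.shape))
    return np.mean([empirical_score(x + e) for e in eps], axis=0)

def langevin(score_fn, steps=400, lr=1e-3, m=200):
    x = rng.normal(0, 1, (m, 2))
    for _ in range(steps):
        x += lr * score_fn(x) + np.sqrt(2 * lr) * rng.normal(size=x.shape)
    return x

for name, fn in [("raw score", empirical_score), ("smoothed score", smoothed_score)]:
    s = langevin(fn)
    ang = np.arctan2(s[:, 1], s[:, 0])
    gap = np.abs(((ang[:, None] - theta[None, :]) + np.pi) % (2 * np.pi) - np.pi).min(1)
    off = np.abs(np.linalg.norm(s, axis=1) - 1)
    print(f"{name:14s} off-manifold {off.mean():.3f} | spread along manifold {gap.mean():.3f}")
```

In this toy, the smoothing bandwidth tau plays the role of the chosen smoothing: increasing it spreads the sampled mass further along the circle without pushing it off the manifold, matching the abstract's claim that the generalisation manifold can be controlled by the smoothing.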
Related papers
- Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
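A minimal sketch of the general recipe (our illustrative construction; the paper's precise similarity measure and aggregation may differ, and `patchify` plus the max-cosine aggregation are assumptions):

```python
# Hedged sketch: score each training image's influence on a generated image by
# the best-matching patch cosine similarity, averaged over generated patches.
import numpy as np

def patchify(img, p=8, stride=8):
    """Extract non-overlapping p x p patches from an (H, W) grayscale image."""
    H, W = img.shape
    patches = [img[i:i + p, j:j + p].ravel()
               for i in range(0, H - p + 1, stride)
               for j in range(0, W - p + 1, stride)]
    return np.stack(patches)                      # (num_patches, p*p)

def attribution_scores(generated, train_set, p=8):
    """Per training image: mean over generated patches of the max cosine
    similarity to any patch of that training image."""
    G = patchify(generated, p)
    G = G / (np.linalg.norm(G, axis=1, keepdims=True) + 1e-8)
    scores = []
    for t in train_set:
        T = patchify(t, p)
        T = T / (np.linalg.norm(T, axis=1, keepdims=True) + 1e-8)
        sim = G @ T.T                             # (gen_patches, train_patches)
        scores.append(sim.max(axis=1).mean())     # best match per generated patch
    return np.array(scores)

# toy usage: rank 10 random "training images" against one "generated image"
rng = np.random.default_rng(0)
train = [rng.normal(size=(32, 32)) for _ in range(10)]
gen = train[3] + 0.1 * rng.normal(size=(32, 32))  # generated resembles image 3
print(attribution_scores(gen, train).argmax())    # expected: 3
```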
arXiv Detail & Related papers (2025-10-16T03:37:16Z) - Multivariate Bernoulli Hoeffding Decomposition: From Theory to Sensitivity Analysis [2.762021507766656]
This work focuses on the case of Bernoulli inputs and provides a complete analytical characterization of the decomposition. We show that, in this discrete setting, the associated subspaces are one-dimensional and that the decomposition admits a closed-form representation. The paper concludes with perspectives on extending the methodology to high-dimensional settings and to models involving inputs with finite, non-binary support.
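For independent Bernoulli(p_i) inputs the Hoeffding/ANOVA subspaces are indeed spanned by single functions, psi_A(x) = prod_{i in A} (x_i - p_i), which yields the closed form directly. A small sketch under that standard construction (the model f and the p_i below are arbitrary choices, not the paper's examples):

```python
# Closed-form Hoeffding (ANOVA) decomposition for independent Bernoulli inputs:
# basis psi_A(x) = prod_{i in A} (x_i - p_i), coefficients by orthogonal
# projection, and exact reconstruction f = sum_A alpha_A psi_A.
import itertools
import numpy as np

d = 3
p = np.array([0.3, 0.5, 0.7])                  # independent Bernoulli(p_i) inputs

def f(x):                                      # arbitrary model on {0,1}^3
    return x[0] + 2 * x[1] * x[2] - x[0] * x[1] * x[2]

grid = np.array(list(itertools.product([0, 1], repeat=d)))   # all 2^d points
weights = np.prod(np.where(grid == 1, p, 1 - p), axis=1)     # P(X = x)
fvals = np.array([f(x) for x in grid])

subsets = [tuple(A) for r in range(d + 1) for A in itertools.combinations(range(d), r)]
coef = {}
for A in subsets:
    psi = np.prod(grid[:, list(A)] - p[list(A)], axis=1) if A else np.ones(len(grid))
    norm2 = np.prod(p[list(A)] * (1 - p[list(A)])) if A else 1.0   # E[psi_A^2]
    coef[A] = float(np.sum(weights * fvals * psi) / norm2)         # alpha_A

# exact reconstruction: the 2^d functions psi_A form an orthogonal basis
recon = sum(coef[A] * (np.prod(grid[:, list(A)] - p[list(A)], axis=1) if A else 1.0)
            for A in subsets)
assert np.allclose(recon, fvals)

# closed-form (unnormalised) Sobol contributions: Var_A = alpha_A^2 * E[psi_A^2]
var = {A: coef[A] ** 2 * np.prod(p[list(A)] * (1 - p[list(A)]))
       for A in subsets if A}
print({A: round(v, 4) for A, v in var.items()})
```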
arXiv Detail & Related papers (2025-10-08T14:46:20Z) - When and how can inexact generative models still sample from the data manifold? [2.4664553878979185]
Despite learning errors in the score function or the drift vector field, the generated samples appear to shift along the support of the data distribution but not away from it. We show that the alignment of the top Lyapunov vectors with the tangent spaces along the boundary of the data manifold leads to robustness.
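A toy dynamical illustration of the alignment mechanism (our construction, not the paper's model): a flow that contracts radially onto the unit circle has its least-contracting (top Lyapunov) direction tangent to the circle, so perturbations persist along the manifold and decay away from it.

```python
# Toy flow contracting onto |x| = 1; evolving a tangent vector under the
# Jacobian shows it aligning with the circle's tangent direction.
import numpy as np

def drift(x):
    r = np.linalg.norm(x)
    return -(r - 1.0) * x / r                  # radial contraction onto |x| = 1

def jacobian(x, eps=1e-5):
    J = np.zeros((2, 2))
    for j in range(2):
        e = np.zeros(2); e[j] = eps
        J[:, j] = (drift(x + e) - drift(x - e)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
x, v = np.array([1.3, 0.4]), rng.normal(size=2)
dt = 0.01
for _ in range(5000):                          # evolve state and tangent vector
    v = v + dt * jacobian(x) @ v
    v /= np.linalg.norm(v)                     # keep direction only
    x = x + dt * drift(x)

tangent = np.array([-x[1], x[0]]) / np.linalg.norm(x)
print("alignment |<v, tangent>| =", abs(v @ tangent).round(4))   # close to 1
```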
arXiv Detail & Related papers (2025-08-11T03:24:34Z) - Provable Maximum Entropy Manifold Exploration via Diffusion Models [58.89696361871563]
Exploration is critical for solving real-world decision-making problems such as scientific discovery. We introduce a novel framework that casts exploration as entropy maximisation over the approximate data manifold implicitly defined by a pre-trained diffusion model. We develop an algorithm based on mirror descent that solves the exploration problem as sequential fine-tuning of a pre-trained diffusion model.
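The paper's algorithm fine-tunes a diffusion model; as a drastically simplified sketch of just the mirror descent mechanics, here is entropy maximisation over a finite support standing in for the approximate data manifold (everything below is a toy assumption):

```python
# Entropic mirror descent (exponentiated gradient) maximising H(q) over the
# simplex; the support stands in for states "on the manifold".
import numpy as np

support = 10
rng = np.random.default_rng(0)
q = rng.dirichlet(np.ones(support))            # initial distribution

eta = 0.5
for _ in range(200):
    grad = -(np.log(q) + 1.0)                  # gradient of H(q) = -sum q log q
    q = q * np.exp(eta * grad)                 # mirror (exponentiated-gradient) step
    q /= q.sum()                               # Bregman projection onto the simplex

print(np.round(q, 3))                          # converges to uniform: max entropy
print("entropy:", -(q * np.log(q)).sum(), "max:", np.log(support))
```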
arXiv Detail & Related papers (2025-06-18T11:59:15Z) - Improving the Euclidean Diffusion Generation of Manifold Data by Mitigating Score Function Singularity [7.062379942776126]
We investigate direct sampling of Euclidean diffusion models for general manifold-constrained data. We reveal the multiscale singularity of the score function in the embedding space of the manifold, which hinders the accuracy of diffusion-generated samples. We propose two novel methods to mitigate the singularity and improve the sampling accuracy.
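A toy numerical probe of such a singularity (our construction): for the unit circle convolved with N(0, sigma^2 I), the score component normal to the circle at a point at distance delta grows like delta/sigma^2 as sigma shrinks, while the tangential component stays negligible by symmetry.

```python
# Score of (1/n) sum_i N(x_i, sigma^2 I) for dense circle samples, probed at a
# fixed off-manifold point as sigma decreases.
import numpy as np

theta = np.linspace(0, 2 * np.pi, 256, endpoint=False)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)     # dense circle samples

def score(x, sigma):
    d2 = ((x - X) ** 2).sum(1)
    w = np.exp(-(d2 - d2.min()) / (2 * sigma**2))
    w /= w.sum()
    return (w[:, None] * (X - x)).sum(0) / sigma**2

delta = 0.05
x = np.array([1.0 + delta, 0.0])              # off-manifold in the normal direction
normal, tangent = np.array([1.0, 0.0]), np.array([0.0, 1.0])
for sigma in [0.4, 0.2, 0.1, 0.05]:
    s = score(x, sigma)
    print(f"sigma={sigma:0.2f}:  normal={s @ normal:10.2f}  tangential={s @ tangent:8.4f}")
# the normal component scales roughly like -delta / sigma^2; tangential stays ~0
```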
arXiv Detail & Related papers (2025-05-15T03:12:27Z) - A Likelihood Based Approach to Distribution Regression Using Conditional Deep Generative Models [8.862614615192578]
We study the large-sample properties of a likelihood-based approach for estimating conditional deep generative models. Our results lead to the convergence rate of a sieve maximum likelihood estimator for estimating the conditional distribution.
arXiv Detail & Related papers (2024-10-02T20:46:21Z) - Identifiable Latent Neural Causal Models [82.14087963690561]
Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data.
We determine the types of distribution shifts that do contribute to the identifiability of causal representations.
We translate our findings into a practical algorithm, allowing for the acquisition of reliable latent causal representations.
arXiv Detail & Related papers (2024-03-23T04:13:55Z) - Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
Diffusion models benefit from the instillation of task-specific information into the score function, which steers sample generation towards desired properties.
This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
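In the Gaussian-mixture setting the guided score is available in closed form, which makes the effect of the guidance strength easy to see. A self-contained toy (our instance of the setting, with equal weights, identity covariances, and a hypothetical guidance scale w):

```python
# Two-component GMM: guided score = grad log p(x) + w * grad log p(y | x),
# sampled with Langevin dynamics; larger w concentrates samples on component y.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([[-2.0, 0.0], [2.0, 0.0]])       # equal-weight components, identity cov

def responsibilities(x):
    d2 = ((x[:, None, :] - mu[None]) ** 2).sum(-1)
    w = np.exp(-(d2 - d2.min(1, keepdims=True)) / 2)
    return w / w.sum(1, keepdims=True)         # p(y | x)

def guided_score(x, k=1, w=0.0):
    r = responsibilities(x)
    s = (r[:, :, None] * (mu[None] - x[:, None, :])).sum(1)   # grad log p(x)
    grad_logpy = (mu[k] - x) - s                              # grad log p(y=k | x)
    return s + w * grad_logpy

def langevin(w, steps=2000, lr=1e-2, m=1000):
    x = rng.normal(0, 3, (m, 2))
    for _ in range(steps):
        x += lr * guided_score(x, w=w) + np.sqrt(2 * lr) * rng.normal(size=x.shape)
    return x

for w in [0.0, 1.0, 4.0]:
    frac = (langevin(w)[:, 0] > 0).mean()      # fraction landing near mu_1
    print(f"guidance w={w}: fraction near mu_1 = {frac:.2f}")
```

At w = 1 the guided score equals grad log p(x | y) exactly, and larger w concentrates samples more sharply on the selected component.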
arXiv Detail & Related papers (2024-03-03T23:15:48Z) - Towards Characterizing Domain Counterfactuals For Invertible Latent Causal Models [15.817239008727789]
In this work, we analyze a specific type of causal query called domain counterfactuals, which hypothesizes what a sample would have looked like if it had been generated in a different domain.
We show that recovering the latent Structural Causal Model (SCM) is unnecessary for estimating domain counterfactuals.
We also develop a theoretically grounded practical algorithm that simplifies the modeling process to generative model estimation.
arXiv Detail & Related papers (2023-06-20T04:19:06Z) - On the Joint Interaction of Models, Data, and Features [82.60073661644435]
We introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features.
Based on these observations, we propose a conceptual framework for feature learning.
Under this framework, the expected accuracy for a single hypothesis and agreement for a pair of hypotheses can both be derived in closed-form.
arXiv Detail & Related papers (2023-06-07T21:35:26Z) - Diffusion Models are Minimax Optimal Distribution Estimators [49.47503258639454]
We provide the first rigorous analysis on approximation and generalization abilities of diffusion modeling.
We show that when the true density function belongs to the Besov space and the empirical score matching loss is properly minimized, the generated data distribution achieves the nearly minimax optimal estimation rates.
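For context, the benchmark being matched is the classical minimax rate for density estimation over Besov balls (a standard fact from nonparametric theory, stated here informally):

```latex
% Minimax benchmark (standard nonparametric theory; the paper shows diffusion
% models nearly attain it): for densities f in a Besov ball B^s_{p,q}([0,1]^d),
% the total-variation risk satisfies, up to logarithmic factors,
\[
  \inf_{\hat{f}_n}\; \sup_{f \in \mathcal{B}^{s}_{p,q}}\;
  \mathbb{E}\big[\mathrm{TV}(\hat{f}_n, f)\big] \;\asymp\; n^{-s/(2s+d)} .
\]
```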
arXiv Detail & Related papers (2023-03-03T11:31:55Z) - DisDiff: Unsupervised Disentanglement of Diffusion Probabilistic Models [42.58375679841317]
We propose a new task: disentanglement of Diffusion Probabilistic Models (DPMs).
The task is to automatically discover the inherent factors behind the observations and disentangle the gradient fields of a DPM into sub-gradient fields.
We devise an unsupervised approach named DisDiff, achieving disentangled representation learning in the framework of DPMs.
arXiv Detail & Related papers (2023-01-31T15:58:32Z) - Demystifying Disagreement-on-the-Line in High Dimensions [34.103373453782744]
We develop a theoretical foundation for analyzing disagreement in high-dimensional random features regression.
Experiments on CIFAR-10-C, Tiny ImageNet-C, and Camelyon17 are consistent with our theory and support the universality of the theoretical findings.
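A rough synthetic sketch of the setting (random-features regression; the benchmarks named above are the paper's, while everything below is our toy assumption): train two predictors that differ only in their random features and track how their disagreement moves with distribution shift alongside the test error.

```python
# Two random-features classifiers on the same data; disagreement and test error
# are compared across increasing distribution shift.
import numpy as np

rng = np.random.default_rng(0)
d, n, D = 20, 500, 400                          # input dim, samples, feature dim
w_true = rng.normal(size=d)

def sample(n, shift=0.0):
    X = rng.normal(size=(n, d)) + shift          # mean shift models the OOD split
    return X, np.sign(X @ w_true)

def fit_random_features(X, y):
    W = rng.normal(size=(d, D)) / np.sqrt(d)     # fixed random first layer
    Phi = np.maximum(X @ W, 0.0)                 # ReLU random features
    beta = np.linalg.lstsq(Phi, y, rcond=None)[0]
    return lambda Xt: np.sign(np.maximum(Xt @ W, 0.0) @ beta)

Xtr, ytr = sample(n)
f, g = fit_random_features(Xtr, ytr), fit_random_features(Xtr, ytr)

for shift in [0.0, 0.5, 1.0]:
    Xte, yte = sample(2000, shift)
    dis = (f(Xte) != g(Xte)).mean()
    err = (f(Xte) != yte).mean()
    print(f"shift={shift}: disagreement={dis:.3f}  test error={err:.3f}")
# "disagreement-on-the-line": across shifts, disagreement tracks error linearly
```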
arXiv Detail & Related papers (2023-01-31T02:31:18Z)