On Convolutions, Intrinsic Dimension, and Diffusion Models
- URL: http://arxiv.org/abs/2506.20705v1
- Date: Wed, 25 Jun 2025 18:00:00 GMT
- Title: On Convolutions, Intrinsic Dimension, and Diffusion Models
- Authors: Kin Kwan Leung, Rasa Hosseinzadeh, Gabriel Loaiza-Ganem
- Abstract summary: The manifold hypothesis asserts that data of interest in high-dimensional ambient spaces, such as image data, lies on unknown low-dimensional submanifolds. DMs are known to be able to learn distributions with low-dimensional support.
- Score: 9.220922665765153
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The manifold hypothesis asserts that data of interest in high-dimensional ambient spaces, such as image data, lies on unknown low-dimensional submanifolds. Diffusion models (DMs) -- which operate by convolving data with progressively larger amounts of Gaussian noise and then learning to revert this process -- have risen to prominence as the most performant generative models, and are known to be able to learn distributions with low-dimensional support. For a given datum in one of these submanifolds, we should thus intuitively expect DMs to have implicitly learned its corresponding local intrinsic dimension (LID), i.e., the dimension of the submanifold it belongs to. Kamkari et al. (2024b) recently showed that this is indeed the case by linking this LID to the rate of change of the log marginal densities of the DM with respect to the amount of added noise, resulting in an LID estimator known as FLIPD. LID estimators such as FLIPD have a plethora of uses: among others, they quantify the complexity of a given datum and can be used to detect outliers, adversarial examples, and AI-generated text. FLIPD achieves state-of-the-art performance at LID estimation, yet its theoretical underpinnings are incomplete since Kamkari et al. (2024b) only proved its correctness under the highly unrealistic assumption of affine submanifolds. In this work we bridge this gap by formally proving the correctness of FLIPD under realistic assumptions. Additionally, we show that an analogous result holds when Gaussian convolutions are replaced with uniform ones, and discuss the relevance of this result.
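To make the density-slope link concrete: writing $p_\sigma$ for the data distribution convolved with $N(0, \sigma^2 I)$, the relationship behind FLIPD is $\mathrm{LID}(x) = D + \lim_{\sigma \to 0} \partial \log p_\sigma(x) / \partial \log \sigma$, where $D$ is the ambient dimension. Below is a minimal numerical sketch of this identity in the affine setting that Kamkari et al. (2024b) originally analyzed, where the convolved marginal has a closed form; the dimensions, the test point, and all variable names are illustrative choices, not taken from the paper.

```python
# Toy check of the LID / log-density-slope link behind FLIPD, in the
# affine setting: data is a standard Gaussian supported on the first d
# coordinates of R^D, convolved with N(0, sigma^2 I). The convolved
# marginal is Gaussian with per-coordinate variances 1 + sigma^2
# (on-manifold) and sigma^2 (off-manifold), so log p_sigma(x) is exact.
import numpy as np

D, d = 10, 3  # ambient and intrinsic dimension (illustrative)

def log_p(x, sigma):
    """Closed-form log density of the Gaussian-convolved distribution."""
    var = np.concatenate([np.full(d, 1.0 + sigma**2),   # on-manifold
                          np.full(D - d, sigma**2)])    # off-manifold
    return -0.5 * np.sum(x**2 / var + np.log(2 * np.pi * var))

x = np.zeros(D)  # a point on the submanifold (off-manifold coords are 0)
x[:d] = np.random.default_rng(0).normal(size=d)

# Finite-difference slope of log p_sigma(x) with respect to log sigma.
sigma, eps = 1e-3, 1e-4
slope = (log_p(x, sigma * (1 + eps)) - log_p(x, sigma)) / np.log(1 + eps)

print(f"slope = {slope:.2f} (theory: -(D - d) = {-(D - d)})")
print(f"LID estimate = D + slope = {D + slope:.2f} (true LID = {d})")
```

At $\sigma = 10^{-3}$ the slope is within rounding of $-(D - d) = -7$, so the estimate recovers the true LID of 3; the point of the present paper is that this behavior is not an artifact of the affine setting.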
Related papers
- Dimension-Free Convergence of Diffusion Models for Approximate Gaussian Mixtures [18.828955620788566]
Diffusion models are distinguished by their exceptional generative performance. This paper investigates the effectiveness of diffusion models in sampling from complex high-dimensional distributions.
arXiv Detail & Related papers (2025-04-07T17:59:07Z)
- Low-dimensional adaptation of diffusion models: Convergence in total variation [13.218641525691195]
We investigate how diffusion generative models leverage (unknown) low-dimensional structure to accelerate sampling. Our findings provide the first rigorous evidence for the adaptivity of the DDIM-type samplers to unknown low-dimensional structure.
arXiv Detail & Related papers (2025-01-22T16:12:33Z)
- Denoising diffusion probabilistic models are optimally adaptive to unknown low dimensionality [21.10158431913811]
We investigate how the DDPM can achieve sampling speed-ups through automatic exploitation of intrinsic low dimensionality of data.
We prove that the iteration complexity of the DDPM scales nearly linearly with the intrinsic dimension $k$ of the data, which is optimal when using KL divergence to measure distributional discrepancy.
arXiv Detail & Related papers (2024-10-24T14:36:12Z)
- Convergence of Diffusion Models Under the Manifold Hypothesis in High-Dimensions [6.9408143976091745]
Denoising Diffusion Probabilistic Models (DDPM) are powerful state-of-the-art methods used to generate synthetic data from high-dimensional data distributions. We study DDPMs under the manifold hypothesis and prove that they achieve rates independent of the ambient dimension in terms of score learning. In terms of sampling complexity, we obtain rates independent of the ambient dimension w.r.t. the Kullback-Leibler divergence, and $O(\sqrt{D})$ w.r.t. the Wasserstein distance.
arXiv Detail & Related papers (2024-09-27T14:57:18Z)
- A Geometric View of Data Complexity: Efficient Local Intrinsic Dimension Estimation with Diffusion Models [12.636148533844882]
Estimating the local intrinsic dimension (LID) of a low-dimensional submanifold is a longstanding problem.
In this work, we show that the Fokker-Planck equation associated with a diffusion model can provide an LID estimator; a score-based sketch of this idea appears after this list.
Applying FLIPD to synthetic LID estimation benchmarks, we find that DMs implemented as fully-connected networks are highly effective LID estimators.
arXiv Detail & Related papers (2024-06-05T18:00:02Z)
- On Error Propagation of Diffusion Models [77.91480554418048]
We develop a theoretical framework to mathematically formulate error propagation in the architecture of DMs.
We apply the cumulative error as a regularization term to reduce error propagation.
Our proposed regularization reduces error propagation, significantly improves vanilla DMs, and outperforms previous baselines.
arXiv Detail & Related papers (2023-08-09T15:31:17Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff) for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
- Diffusion Models are Minimax Optimal Distribution Estimators [49.47503258639454]
We provide the first rigorous analysis on approximation and generalization abilities of diffusion modeling.
We show that when the true density function belongs to the Besov space and the empirical score matching loss is properly minimized, the generated data distribution achieves the nearly minimax optimal estimation rates.
arXiv Detail & Related papers (2023-03-03T11:31:55Z)
- Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data [68.62134204367668]
This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace.
We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated.
The generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.
arXiv Detail & Related papers (2023-02-14T17:02:35Z)
- Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications.
We propose a novel training pipeline to address the mode collapse issue of GANs.
arXiv Detail & Related papers (2022-08-25T12:33:31Z)
- Diagnosing and Fixing Manifold Overfitting in Deep Generative Models [11.82509693248749]
Likelihood-based, or explicit, deep generative models use neural networks to construct flexible high-dimensional densities.
We show that when observed data lies on a low-dimensional manifold embedded in high-dimensional ambient space, maximum-likelihood training leads to manifold overfitting.
We propose a class of two-step procedures consisting of a dimensionality reduction step followed by maximum-likelihood density estimation.
arXiv Detail & Related papers (2022-04-14T18:00:03Z)
- Flexible Amortized Variational Inference in qBOLD MRI [56.4324135502282]
Oxygen extraction fraction (OEF) and deoxygenated blood volume (DBV) are ambiguously determined from qBOLD data.
Existing inference methods tend to yield very noisy and underestimated OEF maps, while overestimating DBV.
This work describes a novel probabilistic machine learning approach that can infer plausible distributions of OEF and DBV.
arXiv Detail & Related papers (2022-03-11T10:47:16Z)
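As a companion to the finite-difference sketch after the abstract above: for Gaussian convolutions, $p_\sigma = p * N(0, \sigma^2 I)$ satisfies a heat equation, which turns the same log-density slope into a score-based quantity, $\sigma^2 (\mathrm{tr}\, \nabla s(x) + \|s(x)\|^2)$ with $s = \nabla \log p_\sigma$; computing the slope from the score via the Fokker-Planck equation is the route FLIPD takes. The minimal sketch below checks this form in the same closed-form Gaussian setting, where the score and its divergence are analytic; with a learned DM, both would instead come from the network (e.g. with a Hutchinson trace estimator). All names here are illustrative assumptions, not from the papers.

```python
# Score-based form of the LID slope, in the spirit of FLIPD: for Gaussian
# convolutions, d log p_sigma(x) / d log sigma equals
#     sigma^2 * (tr grad s(x) + ||s(x)||^2),   with s = grad log p_sigma.
# Here the convolved distribution is the closed-form Gaussian from the
# earlier sketch, so the score and its divergence are exact.
import numpy as np

D, d, sigma = 10, 3, 1e-3
var = np.concatenate([np.full(d, 1.0 + sigma**2),   # on-manifold variances
                      np.full(D - d, sigma**2)])    # off-manifold variances

x = np.zeros(D)  # a point on the submanifold (off-manifold coords are 0)
x[:d] = np.random.default_rng(0).normal(size=d)

score = -x / var                 # s(x) for the convolved Gaussian
tr_grad_s = -np.sum(1.0 / var)   # divergence (trace of the Jacobian) of s
slope = sigma**2 * (tr_grad_s + np.sum(score**2))

print(f"LID estimate = D + slope = {D + slope:.2f} (true LID = {d})")
```

The off-manifold directions contribute $\sigma^2 \cdot (-(D - d)/\sigma^2) = -(D - d)$ to the slope while the on-manifold contributions vanish as $\sigma \to 0$, so the estimate again lands at $d = 3$.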