A Connection Between Score Matching and Local Intrinsic Dimension
- URL: http://arxiv.org/abs/2510.12975v1
- Date: Tue, 14 Oct 2025 20:35:41 GMT
- Title: A Connection Between Score Matching and Local Intrinsic Dimension
- Authors: Eric Yeats, Aaron Jacobson, Darryl Hannan, Yiran Jia, Timothy Doster, Henry Kvinge, Scott Mahan
- Abstract summary: The local intrinsic dimension (LID) of data is a fundamental quantity in signal processing and learning theory. Recent works have discovered that diffusion models capture the LID of data through the spectra of their score estimates. We show that the LID is a lower bound on the denoising score matching loss, motivating the use of the denoising score matching loss as an LID estimator.
- Score: 6.169320217928575
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The local intrinsic dimension (LID) of data is a fundamental quantity in signal processing and learning theory, but quantifying the LID of high-dimensional, complex data has been a historically challenging task. Recent works have discovered that diffusion models capture the LID of data through the spectra of their score estimates and through the rate of change of their density estimates under various noise perturbations. While these methods can accurately quantify LID, they require either many forward passes of the diffusion model or the use of gradient computation, limiting their applicability in compute- and memory-constrained scenarios. We show that the LID is a lower bound on the denoising score matching loss, motivating the use of the denoising score matching loss as an LID estimator. Moreover, we show that the equivalent implicit score matching loss also approximates LID via the normal dimension and is closely related to a recent LID estimator, FLIPD. Our experiments on a manifold benchmark and with Stable Diffusion 3.5 indicate that the denoising score matching loss is a highly competitive and scalable LID estimator, achieving superior accuracy and memory footprint under increasing problem size and quantization level.
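To build intuition for the claimed connection (this is an illustrative sketch, not the authors' code), consider a toy case where everything is available in closed form: data supported on a d-dimensional linear subspace of R^D, perturbed with isotropic Gaussian noise. The sigma^2-scaled denoising score matching loss of the optimal score then concentrates near the intrinsic dimension d:

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, sigma = 10, 3, 0.1             # ambient dim, intrinsic dim (LID), noise scale
N = 200_000

# Clean data: a standard Gaussian supported on a d-dim linear subspace of R^D.
x = np.zeros((N, D))
x[:, :d] = rng.standard_normal((N, d))

# Perturb with isotropic Gaussian noise of scale sigma.
eps = rng.standard_normal((N, D))
x_tilde = x + sigma * eps

# Closed-form score of the noised density: a Gaussian with per-coordinate
# variance 1 + sigma^2 along the subspace and sigma^2 in normal directions.
var = np.full(D, sigma**2)
var[:d] = 1.0 + sigma**2
score = -x_tilde / var

# sigma^2-scaled denoising score matching loss; concentrates near d as sigma -> 0.
dsm = sigma**2 * np.mean(np.sum((score + eps / sigma) ** 2, axis=1))
print(f"sigma^2 * DSM loss = {dsm:.3f}   (LID d = {d})")
```

The normal directions contribute nothing at the optimum (the score exactly cancels the noise there), so only the d tangent directions remain, which is the sense in which the scaled loss tracks the LID in this linear toy model.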
Related papers
- Information-Theoretic Discrete Diffusion [8.018632880023336]
We present an information-theoretic framework for discrete diffusion models that yields principled estimators of log-likelihood using score-matching losses. The results provide a time-integral decomposition of the log-likelihood of the data in terms of optimal score-based losses. Experiments on synthetic and real-world data confirm the accuracy, variance stability, and utility of our estimators.
arXiv Detail & Related papers (2025-10-28T05:59:05Z)
- Sequential Change Point Detection via Denoising Score Matching [8.22915954499148]
This paper proposes a score-based CUSUM change-point detection procedure in which the score functions of the data distribution are estimated by injecting noise. We validate the practical efficacy of our method through numerical experiments on two synthetic datasets and a real-world earthquake precursor detection task.
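The score-based statistic is that paper's contribution; the CUSUM recursion it plugs into is classical. A minimal sketch of the recursion with a known-model log-likelihood ratio (the means mu0, mu1 and threshold are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stream: the mean shifts from 0 to 1 at t = 300.
data = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(1.0, 1.0, 200)])

# Classic CUSUM with a known pre/post-change model (mu0 -> mu1); the paper
# replaces this known-model log-likelihood ratio with a learned score-based one.
mu0, mu1, var = 0.0, 1.0, 1.0
llr = (data - mu0) ** 2 / (2 * var) - (data - mu1) ** 2 / (2 * var)

s, alarm, threshold = 0.0, None, 10.0
for t, l in enumerate(llr):
    s = max(0.0, s + l)              # CUSUM statistic: reset at zero, accumulate evidence
    if s > threshold:
        alarm = t
        break

print(f"alarm at t = {alarm}   (true change at t = 300)")
```

The threshold trades off detection delay against false-alarm rate; with this drift the alarm fires a few dozen samples after the true change point.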
arXiv Detail & Related papers (2025-01-22T06:04:57Z)
- Data value estimation on private gradients [84.966853523107]
For gradient-based machine learning (ML) methods, the de facto differential privacy (DP) technique is perturbing the gradients with random noise. Data valuation attributes the ML performance to the training data and is widely used in privacy-aware applications that require enforcing DP. We show that the default approach of injecting i.i.d. random noise into the gradients is unsatisfactory, because the estimation uncertainty of the data value paradoxically scales linearly with the estimation budget. We propose to instead inject carefully correlated noise to provably remove the linear scaling of estimation uncertainty w.r.t. the budget.
arXiv Detail & Related papers (2024-12-22T13:15:51Z)
- A Geometric View of Data Complexity: Efficient Local Intrinsic Dimension Estimation with Diffusion Models [12.636148533844882]
Estimating the local intrinsic dimension (LID) of data lying on a low-dimensional submanifold is a longstanding problem.
In this work, we show that the Fokker-Planck equation associated with a diffusion model can provide an LID estimator.
Applying the resulting estimator, FLIPD, to synthetic LID estimation benchmarks, we find that diffusion models (DMs) implemented as fully-connected networks are highly effective LID estimators.
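A toy illustration of the density-rate idea behind Fokker-Planck-based LID estimation (not the FLIPD implementation itself): for data on a d-dimensional subspace, the noised log density scales like -(D - d) log(sigma) for small sigma, so D + sigma * d/dsigma log p_sigma recovers d. Assuming a subspace Gaussian with a closed-form density:

```python
import numpy as np

def log_p_sigma(x, sigma, d):
    """Closed-form log density of a d-dim subspace Gaussian convolved
    with isotropic Gaussian noise of scale sigma."""
    D = len(x)
    var = np.full(D, sigma**2)
    var[:d] = 1.0 + sigma**2
    return -0.5 * np.sum(np.log(2 * np.pi * var) + x**2 / var)

D, d, sigma, h = 10, 3, 0.05, 1e-4
x0 = np.zeros(D)
x0[:d] = 1.0                          # a point on the manifold

# log p_sigma(x) ~ const - (D - d) * log(sigma) for small sigma, so
# LID(x) ~ D + sigma * d/dsigma log p_sigma(x) (central finite difference).
dlogp = (log_p_sigma(x0, sigma + h, d) - log_p_sigma(x0, sigma - h, d)) / (2 * h)
lid = D + sigma * dlogp
print(f"density-rate LID estimate = {lid:.3f}   (true d = {d})")
```

With a diffusion model, the analytic density above would be replaced by the model's learned density estimate, which is where the gradient or Jacobian computations mentioned in the main abstract come in.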
arXiv Detail & Related papers (2024-06-05T18:00:02Z)
- A Data-driven Loss Weighting Scheme across Heterogeneous Tasks for Image Denoising [67.02529586335473]
In variational denoising models, the weight in the data fidelity term plays the role of enhancing the noise-removal capability. In this work, we propose a data-driven loss weighting (DLW) scheme to address these issues. Numerical results verify the remarkable performance of DLW in improving the ability of various variational denoising models to handle different complex noise.
arXiv Detail & Related papers (2022-12-09T03:28:07Z)
- FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation [72.19198763459448]
We learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise.
These perturbed data densities are linked together by the Fokker-Planck equation (FPE), a partial differential equation (PDE) governing the spatial-temporal evolution of a density.
We derive a corresponding equation called the score FPE that characterizes the noise-conditional scores of the perturbed data densities.
arXiv Detail & Related papers (2022-10-09T16:27:25Z)
- Local Intrinsic Dimensionality Signals Adversarial Perturbations [28.328973408891834]
Local intrinsic dimensionality (LID) is a local metric that measures the minimum number of latent variables required to describe each data point.
In this paper, we derive a lower bound and an upper bound for the LID value of a perturbed data point and demonstrate that the bounds, in particular the lower bound, have a positive correlation with the magnitude of the perturbation.
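For context, a standard way to compute pointwise LID values in this literature is the maximum-likelihood estimator of Levina and Bickel, which reads the dimension off the ratios of k-nearest-neighbor distances. A minimal sketch on synthetic subspace data (not the bounds derived in the paper):

```python
import numpy as np

def mle_lid(x, data, k=20):
    """Levina-Bickel maximum-likelihood LID estimate at a query point x,
    computed from its k nearest neighbors in `data`."""
    dists = np.sort(np.linalg.norm(data - x, axis=1))
    r = dists[1:k + 1]                # drop the zero self-distance
    return (k - 1) / np.sum(np.log(r[-1] / r[:-1]))

rng = np.random.default_rng(0)
D, d, N = 10, 3, 5000

# Sample points on a d-dim linear subspace of R^D.
data = np.zeros((N, D))
data[:, :d] = rng.uniform(-1.0, 1.0, (N, d))

# Average the pointwise estimates over a few query points.
est = np.mean([mle_lid(data[i], data, k=50) for i in range(100)])
print(f"MLE LID estimate = {est:.2f}   (true d = {d})")
```

Pointwise estimates like this are noisy, which is one reason the adversarial-detection literature works with bounds and aggregate statistics rather than raw per-point values.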
arXiv Detail & Related papers (2021-09-24T08:29:50Z)
- SignalNet: A Low Resolution Sinusoid Decomposition and Estimation Network [79.04274563889548]
We propose SignalNet, a neural network architecture that detects the number of sinusoids and estimates their parameters from quantized in-phase and quadrature samples.
We introduce a worst-case learning threshold for comparing the results of our network relative to the underlying data distributions.
In simulation, we find that our algorithm is always able to surpass the threshold for three-bit data but often cannot exceed the threshold for one-bit data.
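As a non-learned baseline for intuition (not SignalNet itself), one can quantize the in-phase and quadrature samples with a uniform quantizer and read the frequency off an FFT peak; even one-bit data retains the fundamental of a single sinusoid. All parameters below are illustrative:

```python
import numpy as np

fs, n = 1000.0, 256                  # sample rate (Hz), number of samples
f_true = 123.0                       # sinusoid frequency to recover
t = np.arange(n) / fs
iq = np.exp(2j * np.pi * f_true * t) # noiseless in-phase/quadrature samples

def quantize(x, bits):
    """Uniform mid-rise quantizer applied separately to I and Q."""
    step = 2.0 / (2 ** bits)
    q = lambda v: np.clip(np.floor(v / step) * step + step / 2, -1.0, 1.0)
    return q(x.real) + 1j * q(x.imag)

for bits in (1, 3):
    spec = np.abs(np.fft.fft(quantize(iq, bits)))
    f_hat = np.fft.fftfreq(n, 1 / fs)[np.argmax(spec)]
    print(f"{bits}-bit estimate: {f_hat:.1f} Hz   (true {f_true} Hz)")
```

This baseline is limited to the FFT bin resolution fs/n and degrades with multiple sinusoids and noise, which is the regime where a learned estimator becomes attractive.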
arXiv Detail & Related papers (2021-06-10T04:21:20Z)
- Learning Optical Flow from a Few Matches [67.83633948984954]
We show that the dense correlation volume representation is redundant and that accurate flow estimation can be achieved with only a fraction of its elements.
Experiments show that our method can reduce computational cost and memory use significantly, while maintaining high accuracy.
arXiv Detail & Related papers (2021-04-05T21:44:00Z)
- Unsupervised Domain Adaptation in the Dissimilarity Space for Person Re-identification [11.045405206338486]
We propose a novel Dissimilarity-based Maximum Mean Discrepancy (D-MMD) loss for aligning pair-wise distances.
Empirical results with three challenging benchmark datasets show that the proposed D-MMD loss decreases as the source and target domain distributions become more similar.
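D-MMD is that paper's contribution; the underlying maximum mean discrepancy with an RBF kernel, on which it builds, can be sketched as follows (the Gaussian data and kernel bandwidth are illustrative assumptions):

```python
import numpy as np

def mmd_rbf(x, y, gamma=0.5):
    """Biased estimate of squared maximum mean discrepancy with an RBF kernel."""
    def k(a, b):
        d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, (500, 2))       # "source" samples
tgt_far = rng.normal(2.0, 1.0, (500, 2))   # dissimilar "target"
tgt_near = rng.normal(0.1, 1.0, (500, 2))  # similar "target"

mmd_far, mmd_near = mmd_rbf(src, tgt_far), mmd_rbf(src, tgt_near)
print(f"MMD^2 far = {mmd_far:.3f}, near = {mmd_near:.3f}")
```

The statistic shrinks toward zero as the two distributions align, which mirrors the behavior reported for the D-MMD loss; the dissimilarity-space variant applies this idea to pairwise distances rather than raw features.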
arXiv Detail & Related papers (2020-07-27T22:10:46Z)
- Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond [69.83813153444115]
We consider an efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference.
Debiased machine learning (DML) is a data-splitting approach to estimating high-dimensional nuisances.
We propose localized debiased machine learning (LDML), which avoids this burdensome nuisance estimation step.
arXiv Detail & Related papers (2019-12-30T14:42:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.