LIDL: Local Intrinsic Dimension Estimation Using Approximate Likelihood
- URL: http://arxiv.org/abs/2206.14882v1
- Date: Wed, 29 Jun 2022 19:47:46 GMT
- Title: LIDL: Local Intrinsic Dimension Estimation Using Approximate Likelihood
- Authors: Piotr Tempczyk, Rafał Michaluk, Łukasz Garncarek, Przemysław Spurek, Jacek Tabor, Adam Goliński
- Abstract summary: We propose a novel approach to the problem: Local Intrinsic Dimension estimation using approximate Likelihood (LIDL).
Our method relies on an arbitrary density estimation method as its subroutine and hence tries to sidestep the dimensionality challenge.
We show that LIDL yields competitive results on the standard benchmarks for this problem and that it scales to thousands of dimensions.
- Score: 10.35315334180936
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most of the existing methods for estimating the local intrinsic dimension of
a data distribution do not scale well to high-dimensional data. Many of them
rely on a non-parametric nearest neighbors approach which suffers from the
curse of dimensionality. We attempt to address that challenge by proposing a
novel approach to the problem: Local Intrinsic Dimension estimation using
approximate Likelihood (LIDL). Our method relies on an arbitrary density
estimation method as its subroutine and hence tries to sidestep the
dimensionality challenge by making use of the recent progress in parametric
neural methods for likelihood estimation. We carefully investigate the
empirical properties of the proposed method, compare them with our theoretical
predictions, and show that LIDL yields competitive results on the standard
benchmarks for this problem and that it scales to thousands of dimensions. What
is more, we anticipate this approach to improve further with the continuing
advances in the density estimation literature.
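The abstract does not spell out the estimator, but one natural reading is a multi-scale perturb-and-regress scheme: add Gaussian noise at several scales, fit a density model to each perturbed copy of the data, and read the local dimension off the slope of log-density versus log noise scale. The sketch below illustrates that reading; the noise scales, the toy dataset, and the Gaussian KDE standing in for the paper's parametric neural density estimators are all illustrative assumptions, not the authors' implementation.
```python
# A minimal sketch of a multi-scale likelihood approach to local intrinsic
# dimension (LID) estimation, in the spirit of the abstract above.  The
# concrete estimator, noise scales, and toy data are assumptions for
# illustration; LIDL itself relies on parametric neural density estimators
# (e.g. normalizing flows) rather than the kernel density estimate used here.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Toy data: a 2-dimensional linear subspace embedded in R^5 (true LID = 2).
D, d, n = 5, 2, 5000
basis = rng.normal(size=(d, D))
data = rng.normal(size=(n, d)) @ basis            # shape (n, D)

query = data[0]                                   # point whose LID we estimate
deltas = np.array([0.05, 0.1, 0.2, 0.4])          # assumed noise scales

log_densities = []
for delta in deltas:
    # Perturb the data with isotropic Gaussian noise of scale delta and fit a
    # density estimator to the perturbed sample (KDE as a stand-in).
    noisy = data + delta * rng.normal(size=data.shape)
    kde = gaussian_kde(noisy.T)
    log_densities.append(kde.logpdf(query)[0])

# Heuristically, near a d-dimensional manifold in R^D the perturbed density at
# a point behaves like delta**(d - D), so the slope of log-density vs.
# log(delta) estimates d - D and the LID estimate is D plus that slope.
slope, _ = np.polyfit(np.log(deltas), log_densities, deg=1)
print(f"LID estimate at query point: {D + slope:.2f}")
```
With a flexible neural likelihood model substituted for the KDE, the density estimator, rather than a nearest-neighbor search, carries the burden of dimensionality, which is how the abstract's claim of scaling to thousands of dimensions becomes plausible.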
Related papers
- Data value estimation on private gradients [84.966853523107]
For gradient-based machine learning (ML) methods, the de facto differential privacy technique is perturbing the gradients with random noise.
Data valuation attributes the ML performance to the training data and is widely used in privacy-aware applications that require enforcing DP.
We show that, with the default approach of injecting i.i.d. random noise into the gradients, the estimation uncertainty of the data values paradoxically scales linearly with the estimation budget.
We propose instead to inject carefully correlated noise, which provably removes this linear scaling of estimation uncertainty w.r.t. the budget.
arXiv Detail & Related papers (2024-12-22T13:15:51Z)
- Learning Distances from Data with Normalizing Flows and Score Matching [9.605001452209867]
Density-based distances offer an elegant solution to the problem of metric learning.
We show that existing methods to estimate Fermat distances suffer from poor convergence in both low and high dimensions.
Our work paves the way for practical use of density-based distances, especially in high-dimensional spaces.
arXiv Detail & Related papers (2024-07-12T14:30:41Z)
- A Wiener Process Perspective on Local Intrinsic Dimension Estimation Methods [1.6988007266875604]
Local intrinsic dimension (LID) estimation methods have received a lot of attention in recent years thanks to the progress in deep neural networks and generative modeling.
In this paper, we investigate the recent state-of-the-art parametric LID estimation methods from the perspective of the Wiener process.
arXiv Detail & Related papers (2024-06-24T20:27:13Z)
- A Finite-Horizon Approach to Active Level Set Estimation [0.7366405857677227]
We consider the problem of active learning in the context of spatial sampling for level set estimation (LSE).
We present a finite-horizon search procedure to perform LSE in one dimension while optimally balancing both the final estimation error and the distance traveled for a fixed number of samples.
We show that the resulting optimization problem can be solved in closed form and that the resulting policy generalizes existing approaches to this problem.
arXiv Detail & Related papers (2023-10-18T14:11:41Z)
- Estimating Divergences in High Dimensions [6.172809837529207]
We propose the use of decomposable models for estimating divergences in high dimensional data.
These allow us to factorize the estimated density of the high-dimensional distribution into a product of lower dimensional functions.
We show empirically that estimating the Kullback-Leibler divergence using decomposable models fitted by maximum likelihood outperforms existing methods for divergence estimation.
arXiv Detail & Related papers (2021-12-08T20:37:28Z)
- Manifold Hypothesis in Data Analysis: Double Geometrically-Probabilistic Approach to Manifold Dimension Estimation [92.81218653234669]
We present a new approach to manifold hypothesis checking and underlying manifold dimension estimation.
Our geometrical method is a modification of the well-known box-counting algorithm for Minkowski dimension calculation, adapted to sparse data.
Experiments on real datasets show that the suggested approach, based on the combination of the two methods, is powerful and effective.
arXiv Detail & Related papers (2021-07-08T15:35:54Z)
- Continuous Wasserstein-2 Barycenter Estimation without Minimax Optimization [94.18714844247766]
Wasserstein barycenters provide a geometric notion of the weighted average of probability measures based on optimal transport.
We present a scalable algorithm to compute Wasserstein-2 barycenters given sample access to the input measures.
arXiv Detail & Related papers (2021-02-02T21:01:13Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Estimating Barycenters of Measures in High Dimensions [30.563217903502807]
We propose a scalable and general algorithm for estimating barycenters of measures in high dimensions.
We prove local convergence under mild assumptions on the discrepancy, showing that the approach is well-posed.
Our approach is the first to be used to estimate barycenters in thousands of dimensions.
arXiv Detail & Related papers (2020-07-14T15:24:41Z)
- Variable Skipping for Autoregressive Range Density Estimation [84.60428050170687]
We show a technique, variable skipping, for accelerating range density estimation over deep autoregressive models.
We show that variable skipping provides 10-100$\times$ efficiency improvements when targeting challenging high-quantile error metrics.
arXiv Detail & Related papers (2020-07-10T19:01:40Z)
- $\gamma$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a Robust Divergence Estimator [95.71091446753414]
We propose to use a nearest-neighbor-based $\gamma$-divergence estimator as a data discrepancy measure.
Our method achieves significantly higher robustness than existing discrepancy measures.
arXiv Detail & Related papers (2020-06-13T06:09:27Z)