LIDL: Local Intrinsic Dimension Estimation Using Approximate Likelihood
- URL: http://arxiv.org/abs/2206.14882v1
- Date: Wed, 29 Jun 2022 19:47:46 GMT
- Title: LIDL: Local Intrinsic Dimension Estimation Using Approximate Likelihood
- Authors: Piotr Tempczyk, Rafał Michaluk, Łukasz Garncarek, Przemysław Spurek, Jacek Tabor, Adam Goliński
- Abstract summary: We propose a novel approach to the problem: Local Intrinsic Dimension estimation using approximate Likelihood (LIDL).
Our method relies on an arbitrary density estimation method as its subroutine and hence tries to sidestep the dimensionality challenge.
We show that LIDL yields competitive results on the standard benchmarks for this problem and that it scales to thousands of dimensions.
- Score: 10.35315334180936
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most of the existing methods for estimating the local intrinsic dimension of
a data distribution do not scale well to high-dimensional data. Many of them
rely on a non-parametric nearest neighbors approach which suffers from the
curse of dimensionality. We attempt to address that challenge by proposing a
novel approach to the problem: Local Intrinsic Dimension estimation using
approximate Likelihood (LIDL). Our method relies on an arbitrary density
estimation method as its subroutine and hence tries to sidestep the
dimensionality challenge by making use of the recent progress in parametric
neural methods for likelihood estimation. We carefully investigate the
empirical properties of the proposed method, compare them with our theoretical
predictions, and show that LIDL yields competitive results on the standard
benchmarks for this problem and that it scales to thousands of dimensions. What
is more, we anticipate this approach to improve further with the continuing
advances in the density estimation literature.
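The core computation behind LIDL is a regression: perturb the data with Gaussian noise at several scales $\delta$, estimate the log-likelihood of a query point at each scale, and regress those log-likelihoods against $\log \delta$; the slope estimates $d - D$, so the LID estimate is $D$ plus the slope. Below is a minimal sketch of that regression in which exact Gaussian log-densities stand in for the normalizing-flow estimates the paper would train; the toy data and the noise grid are illustrative choices, not the paper's setup.

    import numpy as np

    # Toy setup: a d-dimensional Gaussian embedded in R^D, so the true LID is d.
    # We evaluate densities at the origin, a point on the manifold.
    D, d = 100, 7
    deltas = np.geomspace(0.01, 0.1, 8)   # noise scales (illustrative grid)

    # Exact log-density of the delta-perturbed distribution at the origin.
    # In LIDL this number would come from a density model (e.g. a normalizing
    # flow) trained on data perturbed with N(0, delta^2 I) noise.
    def log_rho(delta):
        var_on = 1.0 + delta ** 2    # variance along the d manifold directions
        var_off = delta ** 2         # variance along the D - d normal directions
        return -0.5 * (d * np.log(2 * np.pi * var_on)
                       + (D - d) * np.log(2 * np.pi * var_off))

    # log rho_delta(x) ~ (d - D) * log(delta) + const for small delta, so the
    # regression slope estimates d - D and the LID estimate is D + slope.
    slope = np.polyfit(np.log(deltas), [log_rho(s) for s in deltas], 1)[0]
    print(f"estimated LID: {D + slope:.2f} (true: {d})")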
Related papers
- Learning Distances from Data with Normalizing Flows and Score Matching [9.605001452209867]
Density-based distances offer an elegant solution to the problem of metric learning.
We show that existing methods to estimate Fermat distances suffer from poor convergence in both low and high dimensions.
Our work paves the way for practical use of density-based distances, especially in high-dimensional spaces.
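As background, a standard sample-based estimator of Fermat distances computes shortest paths over the point cloud with Euclidean edge lengths raised to a power $p > 1$, which makes routes through dense regions comparatively cheap. A minimal sketch follows; the toy data, the choice of $p$, and the fully connected graph are illustrative, and this is not the cited paper's flow- and score-based estimator:

    import numpy as np
    from scipy.sparse.csgraph import shortest_path

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))                      # toy point cloud
    p = 3.0                                            # power > 1 favors dense regions
    pairwise = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    fermat = shortest_path(pairwise ** p, method="D")  # all-pairs Dijkstra
    print(fermat[0, 1])                                # estimated Fermat distance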
arXiv Detail & Related papers (2024-07-12T14:30:41Z)
- Efficient Nearest Neighbor based Uncertainty Estimation for Natural Language Processing Tasks [26.336947440529713]
$k$-Nearest Neighbor Uncertainty Estimation ($k$NN-UE) is an uncertainty estimation method that uses the distances to an example's nearest neighbors and the label-existence ratio among those neighbors.
Our experiments show that our proposed method outperforms the baselines or recent density-based methods in confidence calibration, selective prediction, and out-of-distribution detection.
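A loose reading of that recipe as code; every scaling choice below (the exponential distance term, the product form, the temperature) is an assumption made for illustration, not the paper's exact estimator:

    import numpy as np

    # Discount a model's base confidence using (i) the mean distance to the k
    # nearest neighbors and (ii) the fraction of those neighbors whose label
    # matches the prediction.
    def knn_adjusted_confidence(base_conf, neighbor_dists, neighbor_labels,
                                predicted_label, temperature=1.0):
        label_ratio = np.mean(neighbor_labels == predicted_label)
        distance_term = np.exp(-np.mean(neighbor_dists) / temperature)
        return base_conf * distance_term * label_ratio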
arXiv Detail & Related papers (2024-07-02T10:33:31Z)
- A Wiener process perspective on local intrinsic dimension estimation methods [1.6988007266875604]
Local intrinsic dimension (LID) estimation methods have received a lot of attention in recent years thanks to the progress in deep neural networks and generative modeling.
In this paper, we investigate the recent state-of-the-art parametric LID estimation methods from the perspective of the Wiener process.
arXiv Detail & Related papers (2024-06-24T20:27:13Z)
- A Finite-Horizon Approach to Active Level Set Estimation [0.7366405857677227]
We consider the problem of active learning in the context of spatial sampling for level set estimation (LSE).
We present a finite-horizon search procedure to perform LSE in one dimension while optimally balancing both the final estimation error and the distance traveled for a fixed number of samples.
We show that the resulting optimization problem can be solved in closed form and that the resulting policy generalizes existing approaches to this problem.
arXiv Detail & Related papers (2023-10-18T14:11:41Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for the bias-constrained estimator (BCE) arises in applications where multiple estimates of the same unknown are averaged for improved performance.
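A loose sketch of how a bias constraint can enter training as a penalty term; the penalty weight and the batch-level bias proxy are assumptions of this sketch, not the paper's exact formulation:

    import numpy as np

    # MSE plus a penalty on the squared empirical bias. In practice the bias
    # term would be computed over estimates that share the same underlying
    # parameter value; a single batch stands in for that here.
    def bias_constrained_loss(estimates, targets, lam=1.0):
        mse = np.mean((estimates - targets) ** 2)
        bias = np.mean(estimates - targets)
        return mse + lam * bias ** 2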
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Manifold Hypothesis in Data Analysis: Double Geometrically-Probabilistic Approach to Manifold Dimension Estimation [92.81218653234669]
We present a new approach to manifold hypothesis checking and underlying manifold dimension estimation.
Our geometrical method adapts the well-known box-counting algorithm for Minkowski dimension calculation to sparse data.
Experiments on real datasets show that the suggested approach, combining the two methods, is powerful and effective.
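For reference, the vanilla box-counting procedure being adapted counts occupied grid cells $N(\varepsilon)$ at several scales and reads the dimension off the slope of $\log N(\varepsilon)$ versus $\log(1/\varepsilon)$; a minimal sketch (the cited paper's sparse-data modification is not captured here):

    import numpy as np

    def box_counting_dimension(X, epsilons):
        counts = []
        for eps in epsilons:
            cells = np.floor(X / eps)                  # grid cell index per point
            counts.append(len(np.unique(cells, axis=0)))
        # Slope of log N(eps) vs log(1/eps) estimates the Minkowski dimension.
        return np.polyfit(np.log(1.0 / np.asarray(epsilons)), np.log(counts), 1)[0]

    # A planar (2D) cloud embedded in 3D should give a dimension close to 2.
    X = np.random.default_rng(0).uniform(size=(5000, 3)) * [1.0, 1.0, 0.0]
    print(box_counting_dimension(X, epsilons=[0.2, 0.1, 0.05, 0.025]))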
arXiv Detail & Related papers (2021-07-08T15:35:54Z)
- Continuous Wasserstein-2 Barycenter Estimation without Minimax Optimization [94.18714844247766]
Wasserstein barycenters provide a geometric notion of the weighted average of probability measures based on optimal transport.
We present a scalable algorithm to compute Wasserstein-2 barycenters given sample access to the input measures.
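Concretely, the object being estimated is the standard Wasserstein-2 barycenter (definition stated here for reference):

    $\bar{\mu} = \arg\min_{\mu} \sum_{i=1}^{n} w_i \, W_2^2(\mu, \mu_i)$

where $\mu_1, \dots, \mu_n$ are the input measures and $w_i \ge 0$ their weights; per the title, the algorithm avoids the minimax formulations used by earlier continuous approaches.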
arXiv Detail & Related papers (2021-02-02T21:01:13Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
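For reference, CNML predicts with the standard normalized-maximum-likelihood form (notation assumed here for illustration):

    $p_{\mathrm{CNML}}(y \mid x) = \dfrac{p(y \mid x, \hat{\theta}(\mathcal{D} \cup \{(x, y)\}))}{\sum_{y'} p(y' \mid x, \hat{\theta}(\mathcal{D} \cup \{(x, y')\}))}$

where $\hat{\theta}(\cdot)$ denotes maximum-likelihood estimation on the training set augmented with the candidate pair; the amortization in ACNML sidesteps the per-query optimization that makes exact CNML expensive.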
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Estimating Barycenters of Measures in High Dimensions [30.563217903502807]
We propose a scalable and general algorithm for estimating barycenters of measures in high dimensions.
We prove local convergence under mild assumptions on the discrepancy, showing that the approach is well-posed.
Our approach is the first to be used to estimate barycenters in thousands of dimensions.
arXiv Detail & Related papers (2020-07-14T15:24:41Z)
- Variable Skipping for Autoregressive Range Density Estimation [84.60428050170687]
We present variable skipping, a technique for accelerating range density estimation over deep autoregressive models.
We show that variable skipping provides 10-100$\times$ efficiency improvements when targeting challenging high-quantile error metrics.
arXiv Detail & Related papers (2020-07-10T19:01:40Z)
- $\gamma$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a Robust Divergence Estimator [95.71091446753414]
We propose to use a nearest-neighbor-based $\gamma$-divergence estimator as a data discrepancy measure.
Our method achieves significantly higher robustness than existing discrepancy measures.
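As background, the $\gamma$-divergence between densities $g$ and $f$ is commonly written in the log form of Fujisawa and Eguchi (stated here for reference; the paper's notation may differ):

    $D_\gamma(g, f) = \frac{1}{\gamma(1+\gamma)} \log \int g^{1+\gamma}\,dx - \frac{1}{\gamma} \log \int g f^{\gamma}\,dx + \frac{1}{1+\gamma} \log \int f^{1+\gamma}\,dx$

which vanishes when $g = f$; the $\gamma$-power weighting is what makes the divergence robust to outliers.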
arXiv Detail & Related papers (2020-06-13T06:09:27Z)