Related papers: eDCF: Estimating Intrinsic Dimension using Local Connectivity

URL: http://arxiv.org/abs/2510.16513v1
Date: Sat, 18 Oct 2025 14:00:39 GMT
Title: eDCF: Estimating Intrinsic Dimension using Local Connectivity
Authors: Dhruv Gupta, Aditya Nagarsekar, Vraj Shah, Sujith Thomas
Abstract summary: This paper introduces a novel, scalable, and parallelizable method called eDCF to robustly estimate intrinsic dimension across varying scales. Our method consistently matches leading estimators, achieving comparable values of mean absolute error (MAE) on synthetic benchmarks with noisy samples. We also showcase our method's ability to accurately detect fractal geometries in decision boundaries, confirming its utility for analyzing realistic, structured data.
Score: 0.34998703934432673
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern datasets often contain high-dimensional features exhibiting complex dependencies. To effectively analyze such data, dimensionality reduction methods rely on estimating the dataset's intrinsic dimension (id) as a measure of its underlying complexity. However, estimating id is challenging due to its dependence on scale: at very fine scales, noise inflates id estimates, while at coarser scales, estimates stabilize to lower, scale-invariant values. This paper introduces a novel, scalable, and parallelizable method called eDCF, which is based on Connectivity Factor (CF), a local connectivity-based metric, to robustly estimate intrinsic dimension across varying scales. Our method consistently matches leading estimators, achieving comparable values of mean absolute error (MAE) on synthetic benchmarks with noisy samples. Moreover, our approach also attains higher exact intrinsic dimension match rates, reaching up to 25.0% compared to 16.7% for MLE and 12.5% for TWO-NN, particularly excelling under medium to high noise levels and large datasets. Further, we showcase our method's ability to accurately detect fractal geometries in decision boundaries, confirming its utility for analyzing realistic, structured data.
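To make the scale and noise effect described in the abstract concrete, here is a minimal sketch of a TWO-NN style estimator, one of the baselines the abstract names. It is not eDCF and not the authors' code. It uses the maximum-likelihood form of TWO-NN (the ratio of second to first nearest-neighbour distance is treated as Pareto distributed with exponent equal to the dimension), and the helper `sample_plane_in_5d`, the noise level, and the sample size are hypothetical choices made only for illustration.

```python
import math
import random


def two_nn_dimension(points):
    """Estimate intrinsic dimension from ratios of 2nd to 1st neighbour distances."""
    n = len(points)
    log_ratios = []
    for i in range(n):
        r1 = r2 = math.inf  # smallest and second-smallest distance seen so far
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if d < r1:
                r1, r2 = d, r1
            elif d < r2:
                r2 = d
        if r1 > 0 and math.isfinite(r2):  # skip exact duplicate points
            log_ratios.append(math.log(r2 / r1))
    # Maximum-likelihood estimate for a Pareto-distributed ratio mu = r2 / r1:
    # d_hat = N / sum(log mu_i)
    return len(log_ratios) / sum(log_ratios)


def sample_plane_in_5d(n, noise=0.0, seed=0):
    """Hypothetical test data: a 2-D plane linearly embedded in 5-D, plus Gaussian noise."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        base = (u, v, u + v, u - v, 0.5 * u)
        pts.append(tuple(x + rng.gauss(0.0, noise) for x in base))
    return pts


# Compare the estimate on clean samples with the estimate on noisy samples.
clean_estimate = two_nn_dimension(sample_plane_in_5d(400, noise=0.0))
noisy_estimate = two_nn_dimension(sample_plane_in_5d(400, noise=0.05))
print("clean:", round(clean_estimate, 2))
print("noisy:", round(noisy_estimate, 2))
```

Because TWO-NN looks only at the two closest neighbours, it probes the finest available scale. Once the noise amplitude is comparable to the nearest-neighbour spacing, the estimate drifts upward from 2 toward the ambient dimension, which is the inflation the abstract refers to.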

Related papers

Geometric Data Valuation via Leverage Scores [0.2538209532048866]
We propose a geometric alternative to Shapley data valuation based on statistical leverage scores. We show that our scores satisfy the dummy, efficiency, and symmetry axioms of Shapley valuation. We also show that training on a leverage-sampled subset produces a model whose parameters and predictive risk are within $O(\varepsilon)$ of the full-data optimum.
arXiv Detail & Related papers (2025-11-03T22:20:50Z)
A Survey of Dimension Estimation Methods [0.0]
It is important to understand the real dimension of the data, hence the complexity of the dataset at hand. This survey reviews a wide range of dimension estimation methods, categorising them by the geometric information they exploit. The paper evaluates the performance of these methods, as well as investigating varying responses to curvature and noise.
arXiv Detail & Related papers (2025-07-18T13:05:42Z)
Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information. We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv Detail & Related papers (2024-05-19T17:49:33Z)
Variable Importance Matching for Causal Inference [73.25504313552516]
We describe a general framework called Model-to-Match that achieves these goals. Model-to-Match uses variable importance measurements to construct a distance metric. We operationalize the Model-to-Match framework with LASSO.
arXiv Detail & Related papers (2023-02-23T00:43:03Z)
Intrinsic Dimensionality Estimation within Tight Localities: A Theoretical and Experimental Analysis [0.0]
We propose a local ID estimation strategy stable even for 'tight' localities consisting of as few as 20 sample points. Our experimental results show that our proposed estimation technique can achieve notably smaller variance, while maintaining comparable levels of bias, at much smaller sample sizes than state-of-the-art estimators.
arXiv Detail & Related papers (2022-09-29T00:00:11Z)
Smooth densities and generative modeling with unsupervised random forests [1.433758865948252]
An important application for density estimators is synthetic data generation. We propose a new method based on unsupervised random forests for estimating smooth densities in arbitrary dimensions without parametric constraints. We prove the consistency of our approach and demonstrate its advantages over existing tree-based density estimators.
arXiv Detail & Related papers (2022-05-19T09:50:25Z)
Semi-Supervised Quantile Estimation: Robust and Efficient Inference in High Dimensional Settings [0.5735035463793009]
We consider quantile estimation in a semi-supervised setting, characterized by two available data sets. We propose a family of semi-supervised estimators for the response quantile(s) based on the two data sets.
arXiv Detail & Related papers (2022-01-25T10:02:23Z)
Density Ratio Estimation via Infinitesimal Classification [85.08255198145304]
We propose DRE-$\infty$, a divide-and-conquer approach to reduce density ratio estimation (DRE) to a series of easier subproblems. Inspired by Monte Carlo methods, we smoothly interpolate between the two distributions via an infinite continuum of intermediate bridge distributions. We show that our approach performs well on downstream tasks such as mutual information estimation and energy-based modeling on complex, high-dimensional datasets.
arXiv Detail & Related papers (2021-11-22T06:26:29Z)
Featurized Density Ratio Estimation [82.40706152910292]
In our work, we propose to leverage an invertible generative model to map the two distributions into a common feature space prior to estimation. This featurization brings the densities closer together in latent space, sidestepping pathological scenarios where the learned density ratios in input space can be arbitrarily inaccurate. At the same time, the invertibility of our feature map guarantees that the ratios computed in feature space are equivalent to those in input space.
arXiv Detail & Related papers (2021-07-05T18:30:26Z)
Meta-Learning for Relative Density-Ratio Estimation [59.75321498170363]
Existing methods for (relative) density-ratio estimation (DRE) require many instances from both densities. We propose a meta-learning method for relative DRE, which estimates the relative density-ratio from a few instances by using knowledge in related datasets. We empirically demonstrate the effectiveness of the proposed method by using three problems: relative DRE, dataset comparison, and outlier detection.
arXiv Detail & Related papers (2021-07-02T02:13:45Z)
Evaluating representations by the complexity of learning low-loss predictors [55.94170724668857]
We consider the problem of evaluating representations of data for use in solving a downstream task. We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest.
arXiv Detail & Related papers (2020-09-15T22:06:58Z)
$\gamma$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a Robust Divergence Estimator [95.71091446753414]
We propose to use a nearest-neighbor-based $\gamma$-divergence estimator as a data discrepancy measure. Our method achieves significantly higher robustness than existing discrepancy measures.
arXiv Detail & Related papers (2020-06-13T06:09:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.