Manifold Dimension Estimation: An Empirical Study
- URL: http://arxiv.org/abs/2509.15517v1
- Date: Fri, 19 Sep 2025 01:48:58 GMT
- Title: Manifold Dimension Estimation: An Empirical Study
- Authors: Zelong Bi, Pierre Lafaye de Micheaux
- Abstract summary: The manifold hypothesis suggests that high-dimensional data often lie on or near a low-dimensional manifold. Estimating the dimension of this manifold is essential for leveraging its structure. This article provides a comprehensive survey for both researchers and practitioners.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The manifold hypothesis suggests that high-dimensional data often lie on or near a low-dimensional manifold. Estimating the dimension of this manifold is essential for leveraging its structure, yet existing work on dimension estimation is fragmented and lacks systematic evaluation. This article provides a comprehensive survey for both researchers and practitioners. We review often-overlooked theoretical foundations and present eight representative estimators. Through controlled experiments, we analyze how individual factors such as noise, curvature, and sample size affect performance. We also compare the estimators on diverse synthetic and real-world datasets, introducing a principled approach to dataset-specific hyperparameter tuning. Our results offer practical guidance and suggest that, for a problem of this generality, simpler methods often perform better.
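As a concrete illustration of the kind of estimator the survey covers, here is a minimal sketch of a nearest-neighbour intrinsic dimension estimator in the spirit of the Two-NN method. The function name `twonn_dimension` and all parameter choices are illustrative assumptions, not code from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def twonn_dimension(X):
    """Two-NN style intrinsic dimension estimate from nearest-neighbour ratios."""
    tree = cKDTree(X)
    # distances to the two nearest neighbours of each point (column 0 is the point itself)
    dists, _ = tree.query(X, k=3)
    r1, r2 = dists[:, 1], dists[:, 2]
    mu = r2 / r1
    # MLE of d under the Two-NN model: d = N / sum(log mu_i)
    return len(X) / np.sum(np.log(mu))

rng = np.random.default_rng(0)
# a 2-D linear manifold embedded in 5-D ambient space
X = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 5))
print(twonn_dimension(X))  # close to 2 for this 2-D manifold
```

The estimator uses only the ratio of the two smallest neighbour distances, which makes it insensitive to the local sampling density; this is one reason such simple local estimators often hold up well, consistent with the paper's conclusion that simpler methods frequently perform better.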
Related papers
- Efficient Covariance Estimation for Sparsified Functional Data [51.69796254617083]
The proposed Random-knots (Random-knots-Spatial) and B-spline (Bspline-Spatial) estimators of the covariance function are computationally efficient. Asymptotic pointwise convergence of the covariance estimators is obtained for sparsified individual trajectories under some regularity conditions.
arXiv Detail & Related papers (2025-11-23T00:50:33Z) - Towards Size-invariant Salient Object Detection: A Generic Evaluation and Optimization Approach [118.75896764188424]
We present a novel perspective to expose the inherent size sensitivity of existing widely used Salient Object Detection metrics. To address this challenge, a generic Size-Invariant Evaluation (SIEva) framework is proposed. We further develop a dedicated optimization framework (SIOpt), which adheres to the size-invariant principle and significantly enhances the detection of salient objects across a broad range of sizes.
arXiv Detail & Related papers (2025-09-19T04:12:14Z) - A Survey of Dimension Estimation Methods [0.0]
It is important to understand the real dimension of the data, and hence the complexity of the dataset at hand. This survey reviews a wide range of dimension estimation methods, categorising them by the geometric information they exploit. The paper evaluates the performance of these methods and investigates their varying responses to curvature and noise.
arXiv Detail & Related papers (2025-07-18T13:05:42Z) - Analyzing Generative Models by Manifold Entropic Metrics [8.477943884416023]
We introduce a novel set of tractable information-theoretic evaluation metrics. We compare various normalizing flow architectures and $\beta$-VAEs on the EMNIST dataset. The most interesting finding of our experiments is a ranking of model architectures and training procedures in terms of their inductive bias to converge to aligned and disentangled representations during training.
arXiv Detail & Related papers (2024-10-25T09:35:00Z) - A Likelihood Based Approach to Distribution Regression Using Conditional Deep Generative Models [6.647819824559201]
We study the large-sample properties of a likelihood-based approach for estimating conditional deep generative models.
Our results lead to the convergence rate of a sieve maximum likelihood estimator for estimating the conditional distribution.
arXiv Detail & Related papers (2024-10-02T20:46:21Z) - Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z) - Conformal inference for regression on Riemannian Manifolds [45.560812800359685]
We investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariate, denoted by $X$, lies in a Euclidean space. We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
arXiv Detail & Related papers (2023-10-12T10:56:25Z) - Challenges to Evaluating the Generalization of Coreference Resolution Models: A Measurement Modeling Perspective [69.50044040291847]
We show how multi-dataset evaluations risk conflating different factors concerning what, precisely, is being measured.
This makes it difficult to draw more generalizable conclusions from these evaluations.
arXiv Detail & Related papers (2023-03-16T05:32:02Z) - Predicting Out-of-Domain Generalization with Neighborhood Invariance [59.05399533508682]
We propose a measure of a classifier's output invariance in a local transformation neighborhood.
Our measure is simple to calculate, does not depend on the test point's true label, and can be applied even in out-of-domain (OOD) settings.
In experiments on benchmarks in image classification, sentiment analysis, and natural language inference, we demonstrate a strong and robust correlation between our measure and actual OOD generalization.
arXiv Detail & Related papers (2022-07-05T14:55:16Z) - A geometric framework for outlier detection in high-dimensional data [0.0]
Outlier or anomaly detection is an important task in data analysis.
We provide a framework that exploits the metric structure of a data set.
We show that exploiting this structure significantly improves the detection of outlying observations in high-dimensional data.
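A distance-based scheme in this spirit can be sketched as follows; this is a hypothetical illustration of metric-structure-based outlier scoring, not the paper's actual framework. Each point is scored by its distance to its $k$-th nearest neighbour, so points far from the bulk of the data receive high scores.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_outlier_scores(X, k=10):
    """Score each point by its distance to its k-th nearest neighbour."""
    tree = cKDTree(X)
    dists, _ = tree.query(X, k=k + 1)  # column 0 is the point itself
    return dists[:, k]

rng = np.random.default_rng(2)
inliers = rng.normal(size=(300, 20))       # bulk of the data in 20-D
outlier = np.full((1, 20), 6.0)            # one planted point far from the bulk
X = np.vstack([inliers, outlier])
scores = knn_outlier_scores(X)
print(int(np.argmax(scores)))  # 300: the planted outlier has the largest score
```

In high-dimensional data the absolute neighbour distances grow, but the planted outlier still stands out because its $k$-NN distance is far larger than that of any inlier, which is the kind of geometric signal such frameworks exploit.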
arXiv Detail & Related papers (2022-07-01T12:07:51Z) - Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for resolving such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z) - High-Dimensional Multi-Task Averaging and Application to Kernel Mean Embedding [0.0]
We propose an improved estimator for the multi-task averaging problem.
We prove theoretically that this approach provides a reduction in mean squared error.
An application of this approach is the estimation of multiple kernel mean embeddings.
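The empirical kernel mean embedding mentioned above, $\hat{\mu}_P = \frac{1}{n}\sum_i k(x_i, \cdot)$, can be sketched via its most common use, comparing two distributions through the maximum mean discrepancy (MMD). This is a generic illustration under an assumed RBF kernel with a hypothetical bandwidth, not the paper's improved multi-task estimator.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix: k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def mmd2(X, Y, gamma=1.0):
    """Squared MMD between the empirical kernel mean embeddings of X and Y."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(1)
X = rng.normal(0, 1, size=(500, 2))
Y = rng.normal(0, 1, size=(500, 2))   # same distribution -> small MMD^2
Z = rng.normal(3, 1, size=(500, 2))   # shifted distribution -> large MMD^2
print(mmd2(X, Y), mmd2(X, Z))
```

The plug-in average over samples is exactly the naive per-task estimator that multi-task averaging methods aim to improve by sharing information across related embedding tasks.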
arXiv Detail & Related papers (2020-11-13T07:31:30Z) - Effective Data-aware Covariance Estimator from Compressed Data [63.16042585506435]
We propose a data-aware weighted sampling based covariance matrix estimator, namely DACE, which provides an unbiased estimate of the covariance matrix.
We conduct extensive experiments on both synthetic and real-world datasets to demonstrate the superior performance of our DACE.
arXiv Detail & Related papers (2020-10-10T10:10:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.