Manifold-adaptive dimension estimation revisited
- URL: http://arxiv.org/abs/2008.03221v2
- Date: Mon, 10 Aug 2020 10:04:22 GMT
- Title: Manifold-adaptive dimension estimation revisited
- Authors: Zsigmond Benk\H{o}, Marcell Stippinger, Roberta Rehus, Attila Bencze,
D\'aniel Fab\'o, Bogl\'arka Hajnal, Lor\'and Er\H{o}ss, Andr\'as Telcs,
Zolt\'an Somogyv\'ari
- Abstract summary: We revisit and improve the manifold-adaptive Farahmand-Szepesv\'ari-Audibert dimension estimator.
We compute the probability density function of local FSA estimates.
We derive the maximum likelihood formula for global intrinsic dimensionality.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data dimensionality informs us about data complexity and sets limits
on the structure of successful signal processing pipelines. In this work we
revisit and improve the manifold-adaptive Farahmand-Szepesv\'ari-Audibert (FSA)
dimension estimator, making it one of the best nearest neighbor-based dimension
estimators available. We compute the probability density function of local FSA
estimates under the assumption that the local manifold density is uniform. Based
on this probability density function, we propose the median of the local
estimates as a basic global measure of intrinsic dimensionality, and we
demonstrate the advantages of this asymptotically unbiased estimator over the
previously proposed statistics: the mode and the mean. Additionally, from the
probability density function we derive the maximum likelihood formula for the
global intrinsic dimensionality under an i.i.d. assumption. We tackle edge and
finite-sample effects with an exponential correction formula calibrated on
hypercube datasets. We compare the performance of the corrected-median-FSA
estimator with kNN estimators: maximum likelihood (ML, Levina-Bickel) and two
implementations of DANCo (R and MATLAB). We show that the corrected-median-FSA
estimator beats the ML estimator and is on an equal footing with DANCo on
standard synthetic benchmarks according to mean percentage error and error rate
metrics. With the median-FSA algorithm, we reveal diverse changes in the neural
dynamics during resting state and during epileptic seizures. We identify brain
areas with lower-dimensional dynamics that are possible causal sources and
candidates for seizure onset zones.
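The median-of-local-estimates idea above can be sketched in a few lines. This is a minimal, uncorrected sketch (no exponential edge/finite-sample correction), assuming the standard FSA local formula d(x) = ln 2 / ln(R_2k(x) / R_k(x)), where R_k(x) is the distance from x to its k-th nearest neighbor; the function names are illustrative, not the authors' implementation.

```python
import numpy as np

def fsa_local_estimates(X, k=5):
    """Local FSA dimension estimates d_i = ln 2 / ln(R_2k / R_k),
    where R_k is the distance to the k-th nearest neighbor."""
    # Brute-force pairwise distances; fine for small datasets.
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)
    r_k = dists[:, k]        # column 0 is the point itself (distance 0)
    r_2k = dists[:, 2 * k]
    return np.log(2.0) / np.log(r_2k / r_k)

def median_fsa(X, k=5):
    """Median of the local estimates as the global intrinsic dimension."""
    return np.median(fsa_local_estimates(X, k))
```

On data sampled uniformly from a low-dimensional set, the median of the local estimates should land near the true intrinsic dimension, with some downward bias from edge effects that the paper's correction formula addresses.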
Related papers
- Bayesian Estimation and Tuning-Free Rank Detection for Probability Mass Function Tensors [17.640500920466984]
This paper presents a novel framework for estimating the joint PMF and automatically inferring its rank from observed data.
We derive a deterministic solution based on variational inference (VI) to approximate the posterior distributions of various model parameters. Additionally, we develop a scalable version of the VI-based approach by leveraging stochastic variational inference (SVI).
Experiments involving both synthetic data and real movie recommendation data illustrate the advantages of our VI and SVI-based methods in terms of estimation accuracy, automatic rank detection, and computational efficiency.
arXiv Detail & Related papers (2024-10-08T20:07:49Z)
- A Specialized Semismooth Newton Method for Kernel-Based Optimal Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples.
We show that our SSN method achieves a global convergence rate of $O(1/\sqrt{k})$, and a local quadratic convergence rate under standard regularity conditions.
arXiv Detail & Related papers (2023-10-21T18:48:45Z)
- Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data [68.62134204367668]
This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace.
We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated.
The generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.
arXiv Detail & Related papers (2023-02-14T17:02:35Z)
- Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results in both the finite-sample and asymptotic regimes.
arXiv Detail & Related papers (2022-10-03T06:09:01Z)
- Beyond EM Algorithm on Over-specified Two-Component Location-Scale Gaussian Mixtures [29.26015093627193]
We develop the Exponential Location Update (ELU) algorithm to efficiently explore the curvature of the negative log-likelihood functions.
We demonstrate that the ELU algorithm converges to the final statistical radius of the models after a logarithmic number of iterations.
arXiv Detail & Related papers (2022-05-23T06:49:55Z)
- Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
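The "traditional estimators" this entry compares against boil down to plain Monte Carlo: sample the model, count the fraction of samples landing in the region. A minimal sketch of that baseline, with an illustrative `sample_fn`/`in_region` interface (not the paper's API):

```python
import numpy as np

def mc_region_probability(sample_fn, in_region, n=100_000, seed=0):
    """Naive Monte Carlo estimate of P(X in region) for a generative
    model we can sample from (e.g., a normalizing flow): draw n samples
    and return the fraction that fall inside the region."""
    rng = np.random.default_rng(seed)
    x = sample_fn(rng, n)          # (n, d) samples from the model
    return np.mean(in_region(x))   # fraction of samples inside
```

The sample efficiency of this estimator degrades as the region probability shrinks, which is the gap the flow-based CDF approximation targets.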
arXiv Detail & Related papers (2022-02-23T06:11:49Z)
- Fast Batch Nuclear-norm Maximization and Minimization for Robust Domain Adaptation [154.2195491708548]
We study the prediction discriminability and diversity by studying the structure of the classification output matrix of a randomly selected data batch.
We propose Batch Nuclear-norm Maximization and Minimization, which performs nuclear-norm on the target output matrix to enhance the target prediction ability.
Experiments show that our method could boost the adaptation accuracy and robustness under three typical domain adaptation scenarios.
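The quantity at the core of this entry is simple to compute: the nuclear norm of a batch's classification output matrix, i.e. the sum of its singular values. A minimal NumPy sketch (the training-loop integration is the paper's contribution and is not reproduced here):

```python
import numpy as np

def batch_nuclear_norm(P):
    """Nuclear norm of a batch prediction matrix P (batch_size x n_classes):
    the sum of its singular values. A confident AND diverse batch (rows near
    distinct one-hot vectors) has a larger nuclear norm than a batch
    collapsed onto a single class."""
    return np.linalg.svd(P, compute_uv=False).sum()
```

Maximizing this on target-domain batches rewards predictions that are both discriminable (confident rows) and diverse (high-rank output matrix), which is the intuition the summary describes.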
arXiv Detail & Related papers (2021-07-13T15:08:32Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Data-driven prediction of multistable systems from sparse measurements [0.0]
We develop a data-driven method, based on semi-supervised classification, to predict the state of multistable systems.
We introduce a sparsity-promoting metric-learning (SPML) optimization, which learns a metric directly from the precomputed data.
We demonstrate the application of this method on two multistable systems.
arXiv Detail & Related papers (2020-10-28T02:23:05Z)
- Optimal Distributed Subsampling for Maximum Quasi-Likelihood Estimators with Massive Data [20.79270369203348]
Existing methods mostly focus on subsampling with replacement due to its high computational efficiency.
We first derive optimal subsampling probabilities in the context of quasi-likelihood estimation.
We develop a distributed subsampling framework, in which statistics are computed simultaneously on smaller partitions of the full data.
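The basic building block behind subsampling with replacement is the inverse-probability-weighted estimator: reweighting each drawn point by 1/(n p_i) keeps the estimate unbiased for any sampling probabilities. A minimal sketch for estimating a full-data mean (the paper's quasi-likelihood and distributed machinery are beyond this fragment):

```python
import numpy as np

def subsample_mean(x, probs, r, seed=0):
    """Estimate the full-data mean of x from a subsample of size r drawn
    with replacement using probabilities `probs`. Each drawn value is
    weighted by 1/(n * p_i), which makes the estimator unbiased."""
    rng = np.random.default_rng(seed)
    n = len(x)
    idx = rng.choice(n, size=r, replace=True, p=probs)
    return np.mean(x[idx] / (n * probs[idx]))
```

The choice of `probs` governs the estimator's variance, which is exactly what "optimal subsampling probabilities" optimize.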
arXiv Detail & Related papers (2020-05-21T02:46:56Z)
- Statistical Inference for Model Parameters in Stochastic Gradient Descent [45.29532403359099]
Stochastic gradient descent (SGD) has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency.
We investigate the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain conditions.
arXiv Detail & Related papers (2016-10-27T07:04:21Z)
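A standard construction behind SGD-based inference is Polyak-Ruppert averaging: run SGD with a slowly decaying step size and report the average of the iterates, which is asymptotically normal around the true parameter. A minimal sketch for the simplest strongly convex case, estimating a population mean (illustrative only, not the paper's exact procedure):

```python
import numpy as np

def averaged_sgd_mean(x, c=1.0, alpha=0.6):
    """Polyak-Ruppert averaged SGD for minimizing E[(theta - X)^2 / 2].
    The stochastic gradient at sample x_t is (theta - x_t); the step size
    c / t^alpha with alpha in (0.5, 1) is the usual schedule for averaging."""
    theta, avg = 0.0, 0.0
    for t, xt in enumerate(x, start=1):
        theta -= (c / t ** alpha) * (theta - xt)  # SGD step
        avg += (theta - avg) / t                  # running average of iterates
    return avg
```

The averaged iterate attains the same asymptotic variance as the sample mean here; inference then proceeds by estimating that asymptotic variance, which is where the conditions on the population loss come in.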
This list is automatically generated from the titles and abstracts of the papers in this site.