Tangent Space and Dimension Estimation with the Wasserstein Distance
- URL: http://arxiv.org/abs/2110.06357v4
- Date: Mon, 25 Sep 2023 10:01:42 GMT
- Title: Tangent Space and Dimension Estimation with the Wasserstein Distance
- Authors: Uzu Lim, Harald Oberhauser, Vidit Nanda
- Abstract summary: Consider a set of points sampled independently near a smooth compact submanifold of Euclidean space.
We provide mathematically rigorous bounds on the number of sample points required to estimate both the dimension and the tangent spaces of that manifold.
- Score: 10.118241139691952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consider a set of points sampled independently near a smooth compact
submanifold of Euclidean space. We provide mathematically rigorous bounds on
the number of sample points required to estimate both the dimension and the
tangent spaces of that manifold with high confidence. The algorithm for this
estimation is Local PCA, a local version of principal component analysis. Our
results accommodate noisy, non-uniform data distributions, with noise that
may vary across the manifold, and allow simultaneous estimation at multiple
points. Crucially, all of the constants appearing in our bound are explicitly
described. The proof uses a matrix concentration inequality to estimate
covariance matrices and a Wasserstein distance bound for quantifying
nonlinearity of the underlying manifold and non-uniformity of the probability
measure.
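The abstract names Local PCA as the estimation algorithm but the listing carries no pseudocode. The following is a minimal sketch of generic Local PCA, not the paper's exact estimator: the neighborhood radius, the spectral-gap rule for picking the dimension, and all numerical values are illustrative assumptions.

```python
import numpy as np

def local_pca(points, query, radius):
    """Estimate intrinsic dimension and tangent space at `query` via PCA
    on the sample points lying within `radius` of it."""
    # Restrict to the local neighborhood of the query point.
    nbrs = points[np.linalg.norm(points - query, axis=1) <= radius]
    # Covariance of the neighborhood (rows are points).
    centered = nbrs - nbrs.mean(axis=0)
    cov = centered.T @ centered / len(nbrs)
    # Eigendecomposition; np.linalg.eigh returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Heuristic: estimate the dimension where the spectrum drops the most.
    gaps = eigvals[:-1] / np.maximum(eigvals[1:], 1e-18)
    dim = int(np.argmax(gaps)) + 1
    # The tangent space estimate is the span of the top-`dim` eigenvectors.
    return dim, eigvecs[:, :dim]

# Toy check: noisy samples from a circle (a 1-manifold) in R^3.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 2000)
X = np.c_[np.cos(t), np.sin(t), np.zeros_like(t)]
X += 0.01 * rng.normal(size=X.shape)
dim, tangent = local_pca(X, X[0], radius=0.2)
print(dim)  # expected: 1
```

The sketch returns both quantities the paper analyzes: a dimension estimate read off the eigenvalue spectrum of the local covariance, and a tangent space estimate spanned by the leading eigenvectors.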
Related papers
- Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.
We introduce a discrete-time sampling algorithm in the general state space $[S]^d$ that utilizes score estimators at predefined time points.
Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z) - Sampling and estimation on manifolds using the Langevin diffusion [45.57801520690309]
Two estimators of linear functionals of $\mu_\phi$ based on the discretized Markov process are considered.
Error bounds are derived for sampling and estimation using a discretization of an intrinsically defined Langevin diffusion; a flat-space analogue is sketched after this list.
arXiv Detail & Related papers (2023-12-22T18:01:11Z) - Intrinsic Bayesian Cramér-Rao Bound with an Application to Covariance Matrix Estimation [49.67011673289242]
This paper presents a new performance bound for estimation problems where the parameter to estimate lies in a smooth manifold.
It induces a geometry for the parameter manifold, as well as an intrinsic notion of the estimation error measure.
arXiv Detail & Related papers (2023-11-08T15:17:13Z) - Fermat Distances: Metric Approximation, Spectral Convergence, and Clustering Algorithms [7.07321040534471]
We prove that Fermat distances converge to their continuum analogues in small neighborhoods with a precise rate that depends on the intrinsic dimensionality of the data.
Our results are then used to prove that discrete graph Laplacians based on discrete, sample-driven Fermat distances converge to corresponding continuum operators.
The perspective afforded by our discrete-to-continuum Fermat distance analysis leads to new clustering algorithms for data; a sketch of sample Fermat distances appears after this list.
arXiv Detail & Related papers (2023-07-07T16:36:00Z) - Mean-Square Analysis of Discretized Itô Diffusions for Heavy-tailed Sampling [17.415391025051434]
We analyze the complexity of sampling from a class of heavy-tailed distributions by discretizing a natural class of Itô diffusions associated with weighted Poincaré inequalities.
Based on a mean-square analysis, we establish the iteration complexity for obtaining a sample whose distribution is $\epsilon$-close to the target distribution in the Wasserstein-2 metric.
arXiv Detail & Related papers (2023-03-01T15:16:03Z) - Nonlinear Sufficient Dimension Reduction for Distribution-on-Distribution Regression [9.086237593805173]
We introduce a new approach to nonlinear sufficient dimension reduction in cases where both the predictor and the response are distributional data.
Our key step is to build universal kernels (cc-universal) on the underlying metric spaces.
arXiv Detail & Related papers (2022-07-11T04:11:36Z) - Kernel distance measures for time series, random fields and other structured data [71.61147615789537]
kdiff is a novel kernel-based measure for estimating distances between instances of structured data.
It accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution.
Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems.
arXiv Detail & Related papers (2021-09-29T22:54:17Z) - Spatially relaxed inference on high-dimensional linear models [48.989769153211995]
We study the properties of ensembled clustered inference algorithms which combine spatially constrained clustering, statistical inference, and ensembling to aggregate several clustered inference solutions.
We show that ensembled clustered inference algorithms control the $\delta$-FWER under standard assumptions for $\delta$ equal to the largest cluster diameter.
arXiv Detail & Related papers (2021-06-04T16:37:19Z) - Depth-based pseudo-metrics between probability distributions [1.1470070927586016]
We propose two new pseudo-metrics between continuous probability measures based on data depth and its associated central regions.
In contrast to the Wasserstein distance, the proposed pseudo-metrics do not suffer from the curse of dimensionality.
The regions-based pseudo-metric appears to be robust w.r.t. both outliers and heavy tails.
arXiv Detail & Related papers (2021-03-23T17:33:18Z) - Minimax Optimal Estimation of KL Divergence for Continuous Distributions [56.29748742084386]
Estimating Kullback-Leibler divergence from independent and identically distributed samples is an important problem in various domains.
One simple and effective estimator is based on the k-nearest-neighbor distances between these samples, as sketched below.
arXiv Detail & Related papers (2020-02-26T16:37:37Z)
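The last entry above refers to the classical k-nearest-neighbor estimator of KL divergence. Here is a minimal sketch of that generic estimator, not necessarily the paper's minimax-optimal variant; the choice k=5 is an arbitrary assumption.

```python
import numpy as np
from scipy.spatial import cKDTree

def kl_knn(x, y, k=5):
    """k-nearest-neighbor estimate of KL(p || q) from samples x ~ p, y ~ q."""
    n, d = x.shape
    m = y.shape[0]
    # rho: distance from each x_i to its k-th nearest neighbor in x \ {x_i}
    # (query k+1 neighbors because the nearest one is x_i itself).
    rho = cKDTree(x).query(x, k=k + 1)[0][:, -1]
    # nu: distance from each x_i to its k-th nearest neighbor in y.
    nu = cKDTree(y).query(x, k=k)[0]
    if k > 1:
        nu = nu[:, -1]
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))

# Example: KL between N(0, I) and N((1, 1), I) in R^2 is ||mu||^2 / 2 = 1.0.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(5000, 2))
y = rng.normal(1.0, 1.0, size=(5000, 2))
print(kl_knn(x, y))  # roughly 1.0
```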
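As noted under the Fermat Distances entry, sample Fermat distances are power-weighted shortest-path distances: an edge between sample points x_i and x_j carries weight ||x_i - x_j||^p with p > 1, so geodesics favor densely sampled regions. The k-nearest-neighbor graph and the values p = 3 and k = 10 below are assumptions for illustration; the paper's precise construction may differ.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path
from scipy.spatial import cKDTree

def fermat_distances(points, p=3.0, k=10):
    """Sample Fermat distances: all-pairs shortest paths in a kNN graph
    whose edge (i, j) has weight ||x_i - x_j||^p."""
    n = len(points)
    dist, idx = cKDTree(points).query(points, k=k + 1)  # column 0 is self
    rows = np.repeat(np.arange(n), k)
    cols = idx[:, 1:].ravel()
    weights = dist[:, 1:].ravel() ** p
    graph = csr_matrix((weights, (rows, cols)), shape=(n, n))
    # Treat the graph as undirected and run all-pairs shortest paths.
    return shortest_path(graph, directed=False)

# Example: Fermat distance matrix for noisy points on a circle.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 300)
X = np.c_[np.cos(t), np.sin(t)] + 0.05 * rng.normal(size=(300, 2))
D = fermat_distances(X)
print(D.shape)  # (300, 300)
```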
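Finally, the flat-space analogue promised under the Langevin entry. The paper's diffusion is intrinsically defined on a manifold; the sketch below shows only the Euclidean version: an Euler-Maruyama discretization of dX = -grad phi(X) dt + sqrt(2) dW targets mu_phi proportional to exp(-phi), and time-averaging f along the chain estimates the integral of f against mu_phi. The potential, step size, and burn-in are illustrative assumptions.

```python
import numpy as np

def langevin_estimate(grad_phi, f, x0, step=1e-2, n_steps=100_000,
                      burn_in=10_000, seed=0):
    """Time-average estimator of the integral of f against mu_phi,
    where mu_phi is proportional to exp(-phi), using an Euler-Maruyama
    discretization of the Langevin diffusion in Euclidean space."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    total, count = 0.0, 0
    for i in range(n_steps):
        # One Euler-Maruyama step: x <- x - h * grad_phi(x) + sqrt(2h) * xi.
        x = x - step * grad_phi(x) + np.sqrt(2.0 * step) * rng.normal(size=x.shape)
        if i >= burn_in:  # discard the transient before averaging
            total += f(x)
            count += 1
    return total / count

# Example: phi(x) = ||x||^2 / 2 gives mu_phi = N(0, I) in R^2, so the
# expectation of ||X||^2 is the dimension, 2.
est = langevin_estimate(grad_phi=lambda x: x,
                        f=lambda x: float(x @ x),
                        x0=np.zeros(2))
print(est)  # roughly 2.0
```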