Euler Characteristic Curves and Profiles: a stable shape invariant for
big data problems
- URL: http://arxiv.org/abs/2212.01666v2
- Date: Fri, 11 Aug 2023 18:06:47 GMT
- Title: Euler Characteristic Curves and Profiles: a stable shape invariant for
big data problems
- Authors: Paweł Dłotko and Davide Gurnari
- Abstract summary: We present efficient algorithms to compute Euler Characteristic based alternatives to persistent homology. Euler Curves and Profiles enjoy a certain type of stability which makes them a robust tool in data analysis.
- Score: 3.0023392750520883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tools of Topological Data Analysis provide stable summaries encapsulating the
shape of the considered data. Persistent homology, the most standard and well
studied data summary, suffers from a number of limitations: its computations are
hard to distribute, it is hard to generalize to multifiltrations, and it is
computationally prohibitive for big data sets. In this paper we study the
concept of Euler Characteristic Curves for one-parameter filtrations and
Euler Characteristic Profiles for multi-parameter filtrations. While they are a
weaker invariant in one dimension, we show that Euler Characteristic based
approaches do not possess some handicaps of persistent homology: we present
efficient algorithms to compute them in a distributed way, their generalization
to multifiltrations, and their practical applicability for big data problems. In
addition, we show that Euler Curves and Profiles enjoy a certain type of
stability, which makes them a robust tool in data analysis. Lastly, to show their
practical applicability, multiple use cases are considered.
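As a concrete illustration of the one-parameter case, the sketch below computes an Euler Characteristic Curve directly from a filtered complex: since the Euler characteristic is additive, each simplex contributes (-1)^dim at its entry time, so the whole curve is a cumulative sum over sorted filtration values. This is a minimal serial sketch, not the authors' distributed algorithm, and the toy filtration is hypothetical.

```python
import numpy as np

def euler_characteristic_curve(simplices, thresholds):
    """Euler Characteristic Curve of a one-parameter filtration.

    simplices: (filtration_value, dimension) pairs, one per simplex.
    thresholds: 1D array of values at which to evaluate the curve.
    chi(t) is the sum of (-1)**dim over simplices that have entered by t,
    so the whole curve is a cumulative sum over sorted entry times.
    """
    values = np.array([f for f, _ in simplices], dtype=float)
    signs = np.array([(-1) ** d for _, d in simplices])
    order = np.argsort(values)
    cumulative = np.cumsum(signs[order])
    # Number of simplices with entry time <= t indexes into the cumsum.
    idx = np.searchsorted(values[order], thresholds, side="right")
    return np.where(idx > 0, cumulative[np.maximum(idx - 1, 0)], 0)

# Hypothetical toy filtration: a triangle that fills in gradually.
simplices = [(0.0, 0), (0.0, 0), (0.0, 0),   # three vertices
             (0.5, 1), (0.7, 1), (0.9, 1),   # three edges
             (1.2, 2)]                       # the filled 2-cell
print(euler_characteristic_curve(simplices, np.array([0.0, 0.5, 1.0, 1.5])))
# -> [3 2 0 1]
```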
Related papers
- Conformal inference for regression on Riemannian Manifolds [49.7719149179179]
We investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides on a manifold, and the covariate, denoted by $X$, lies in Euclidean space.
We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
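A minimal sketch of the split conformal recipe in this setting, with plain Euclidean distance standing in for the manifold's geodesic distance (an assumption for illustration; the circle-valued data and the stand-in regressor are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: responses on the unit circle, a simple manifold in R^2.
def sample(n):
    x = rng.uniform(0, 2 * np.pi, n)
    y = np.c_[np.cos(x), np.sin(x)] + 0.05 * rng.normal(size=(n, 2))
    return x, y

def predict(x):            # stand-in regressor: noiseless map to the circle
    return np.c_[np.cos(x), np.sin(x)]

# Split conformal: calibrate nonconformity scores on held-out data.
x_cal, y_cal = sample(500)
scores = np.linalg.norm(y_cal - predict(x_cal), axis=1)  # residual distances
alpha = 0.1
k = int(np.ceil((1 - alpha) * (len(scores) + 1)))
radius = np.sort(scores)[k - 1]          # conformal quantile of the scores

# The prediction set at a new x is the ball of this radius around predict(x),
# intersected with the manifold.
print(f"radius for 90% marginal coverage: {radius:.3f}")
```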
arXiv Detail & Related papers (2023-10-12T10:56:25Z)
- Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
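A minimal sketch of that fix, assuming a single-index target and a spiked input covariance (both hypothetical): the weight vector is parametrized as w = v/||v||, so gradients are differentiated through the normalization and every update stays tangent to the unit sphere.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 50
spike = np.zeros(d); spike[0] = 1.0          # spiked covariance direction
target = np.ones(d) / np.sqrt(d)             # hypothetical true direction

def sample_x(n):
    z = rng.normal(size=(n, d))
    return z + 4.0 * np.outer(z[:, 0], spike)  # anisotropic inputs

def label(x):                                # single-index target
    return (x @ target) ** 2 - 1.0

# Weight normalization: optimize v but use w = v / ||v|| in the model.
v = rng.normal(size=d)
lr = 1e-3
for _ in range(20000):
    x = sample_x(1)[0]
    w = v / np.linalg.norm(v)
    residual = (x @ w) ** 2 - 1.0 - label(x)
    grad_w = 2.0 * residual * (x @ w) * x    # gradient of squared loss in w
    # Chain rule through the normalization projects out the radial component.
    grad_v = (grad_w - (grad_w @ w) * w) / np.linalg.norm(v)
    v -= lr * grad_v
print("alignment |<w, w*>|:", abs((v / np.linalg.norm(v)) @ target))
```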
arXiv Detail & Related papers (2023-09-07T16:55:50Z)
- Manifold Learning with Sparse Regularised Optimal Transport [0.17205106391379024]
Real-world datasets are subject to noisy observations and sampling, so that distilling information about the underlying manifold is a major challenge.
We propose a method for manifold learning that utilises a symmetric version of optimal transport with a quadratic regularisation.
We prove that the resulting kernel is consistent with a Laplace-type operator in the continuous limit, establish robustness to heteroskedastic noise and exhibit these results in simulations.
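The sketch below illustrates the general construction of an affinity kernel from regularised symmetric self-transport. For simplicity it uses entropic (Sinkhorn) regularisation rather than the paper's quadratic regularisation, and the noisy-circle data, bandwidth, and iteration count are hypothetical.

```python
import numpy as np

def ot_affinity(points, eps=0.05, iters=500):
    """Affinity from symmetric regularised self-transport (entropic version).

    Transports the empirical measure onto itself with an entropy penalty;
    the optimal coupling P = diag(u) K diag(u) is symmetric, has rows
    summing to 1/n, and acts like a diffusion kernel on the manifold.
    """
    n = len(points)
    sq = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / eps)
    u = np.ones(n)
    for _ in range(iters):             # damped symmetric Sinkhorn iteration
        u = np.sqrt(u / (n * (K @ u)))
    return u[:, None] * K * u[None, :]

# Hypothetical noisy circle, embedded via the leading nontrivial eigenvectors.
rng = np.random.default_rng(2)
t = rng.uniform(0, 2 * np.pi, 300)
X = np.c_[np.cos(t), np.sin(t)] + 0.05 * rng.normal(size=(300, 2))
P = ot_affinity(X)
vals, vecs = np.linalg.eigh(len(X) * P)   # rescale rows to sum to 1
embedding = vecs[:, -3:-1]                # 2-D spectral embedding
```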
arXiv Detail & Related papers (2023-07-19T08:05:46Z)
- A Framework for Fast and Stable Representations of Multiparameter
Persistent Homology Decompositions [2.76240219662896]
We introduce a new general representation framework that leverages recent results on decompositions of multiparameter persistent homology.
We establish theoretical stability guarantees under this framework as well as efficient algorithms for practical computation.
We validate our stability results and algorithms with numerical experiments that demonstrate statistical convergence, prediction accuracy, and fast running times on several real data sets.
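A much-simplified sketch of the idea of representing a decomposition: assuming a module that decomposes into rectangle summands (a strong, hypothetical simplification; the paper's framework is far more general), one can rasterise the pointwise rank onto a grid to obtain a vectorisable, image-like summary.

```python
import numpy as np

def rectangle_module_image(rectangles, grid=(50, 50), bounds=(0.0, 1.0)):
    """Rasterise a rectangle-decomposable 2-parameter module.

    rectangles: (x0, x1, y0, y1) birth/death boxes, one per summand.
    Returns the pointwise rank (Hilbert function) on a regular grid,
    a simple image-like vectorisation of the decomposition.
    """
    xs = np.linspace(bounds[0], bounds[1], grid[0])
    ys = np.linspace(bounds[0], bounds[1], grid[1])
    img = np.zeros(grid)
    for x0, x1, y0, y1 in rectangles:
        img += ((xs[:, None] >= x0) & (xs[:, None] < x1)
                & (ys[None, :] >= y0) & (ys[None, :] < y1))
    return img

# Hypothetical summands of a decomposition.
summands = [(0.1, 0.6, 0.2, 0.9), (0.3, 0.8, 0.1, 0.5)]
print(rectangle_module_image(summands).max())  # pointwise rank peaks at 2
```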
arXiv Detail & Related papers (2023-06-19T21:28:53Z)
- Measuring dissimilarity with diffeomorphism invariance [94.02751799024684]
We introduce DID, a pairwise dissimilarity measure applicable to a wide range of data spaces.
We prove that DID enjoys properties which make it relevant for theoretical study and practical use.
arXiv Detail & Related papers (2022-02-11T13:51:30Z)
- Robust learning of data anomalies with analytically-solvable entropic
outlier sparsification [0.0]
Entropic Outlier Sparsification (EOS) is proposed as a robust computational strategy for the detection of data anomalies.
The performance of EOS is compared to a range of commonly-used tools on synthetic problems and on partially-mislabeled supervised classification problems from biomedicine.
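A minimal sketch, assuming the analytically-solvable step is the entropy-regularised minimisation of the weighted loss over the probability simplex, whose closed-form solution is a softmax of the negative losses (the per-instance losses and the flagging threshold below are hypothetical):

```python
import numpy as np

def eos_weights(losses, eps=0.5):
    """Closed-form entropic reweighting of per-instance losses.

    Solves min_w sum_i w_i * l_i + eps * sum_i w_i * log(w_i) over the
    probability simplex; the minimiser is a softmax of -l/eps. Points
    whose weight collapses toward zero are flagged as anomalies.
    """
    a = -np.asarray(losses) / eps
    a -= a.max()                    # numerical stabilisation
    w = np.exp(a)
    return w / w.sum()

# Hypothetical losses: one grossly mislabeled point stands out.
losses = np.array([0.10, 0.20, 0.15, 3.00, 0.12])
w = eos_weights(losses)
print("anomaly candidates:", np.where(w < 1.0 / (2 * len(w)))[0])  # -> [3]
```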
arXiv Detail & Related papers (2021-12-22T10:13:29Z)
- Partial Counterfactual Identification from Observational and
Experimental Data [83.798237968683]
We develop effective Monte Carlo algorithms to approximate the optimal bounds from an arbitrary combination of observational and experimental data.
Our algorithms are validated extensively on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-10-12T02:21:30Z)
- Efficient Multidimensional Functional Data Analysis Using Marginal
Product Basis Systems [2.4554686192257424]
We propose a framework for learning continuous representations from a sample of multidimensional functional data.
We show that the resulting estimation problem can be solved efficiently via tensor decomposition.
We conclude with a real data application in neuroimaging.
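A minimal sketch of the separable-representation idea for a two-dimensional domain: a marginal product expansion F(s, t) ≈ Σ_r u_r(s) v_r(t) is recovered by a truncated SVD of the discretised function, and for higher-dimensional domains a CP tensor decomposition plays the same role. The grid and the test function are hypothetical.

```python
import numpy as np

# Hypothetical sample: one smooth bivariate function on a 40 x 40 grid.
s, t = np.meshgrid(np.linspace(0, 1, 40), np.linspace(0, 1, 40),
                   indexing="ij")
F = np.sin(2 * np.pi * s) * np.cos(np.pi * t) + 0.5 * s * t

# A marginal product representation F(s, t) ~ sum_r u_r(s) v_r(t) follows
# from a truncated SVD of the discretised function (rank R = 2 suffices
# here because F is a sum of two separable terms).
U, sing, Vt = np.linalg.svd(F)
R = 2
approx = (U[:, :R] * sing[:R]) @ Vt[:R, :]
print("rank-2 relative error:",
      np.linalg.norm(F - approx) / np.linalg.norm(F))
```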
arXiv Detail & Related papers (2021-07-30T16:02:15Z)
- Bayesian Quadrature on Riemannian Data Manifolds [79.71142807798284]
Riemannian manifolds provide a principled way to model nonlinear geometric structure inherent in data.
However, the geometric operations they require are typically computationally demanding.
In particular, we focus on Bayesian quadrature (BQ) to numerically compute integrals over normal laws.
We show that by leveraging both prior knowledge and an active exploration scheme, BQ significantly reduces the number of required evaluations.
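A minimal sketch of vanilla Bayesian quadrature in flat Euclidean space, which the paper lifts to Riemannian manifolds: for an RBF kernel and a Gaussian law the kernel mean has a closed form, so the GP posterior mean of the integral is a single linear solve. The lengthscale, integrand, and design points are hypothetical.

```python
import numpy as np

# Bayesian quadrature for I = E[f(X)], X ~ N(0, 1), with an RBF kernel:
# the kernel mean z_i = E[k(X, x_i)] is closed-form, and the posterior
# mean of the integral is z @ K^{-1} f(x).
ell = 0.8                            # kernel lengthscale (assumed)
f = lambda x: np.sin(x) + x ** 2     # hypothetical integrand, E[f] = 1
x = np.linspace(-3, 3, 12)           # only 12 function evaluations

K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * ell ** 2))
z = (np.sqrt(ell ** 2 / (ell ** 2 + 1))
     * np.exp(-x ** 2 / (2 * (ell ** 2 + 1))))
estimate = z @ np.linalg.solve(K + 1e-8 * np.eye(len(x)), f(x))
print(f"BQ estimate: {estimate:.4f} (exact value: 1.0)")
```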
arXiv Detail & Related papers (2021-02-12T17:38:04Z)
- Fitting very flexible models: Linear regression with large numbers of
parameters [0.0]
Linear fitting is used to generalize and denoise data.
We discuss how this basis-function fitting is done, with ordinary least squares and extensions thereof.
It is even possible to take the limit of infinite parameters, at which, if the basis and regularization are chosen correctly, the least-squares fit becomes the mean of a Gaussian process.
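A minimal sketch of such a fit with many more parameters than data points, using the pseudoinverse (min-norm least squares) and a hypothetical smoothness weighting of Fourier features:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
x = np.sort(rng.uniform(-1, 1, n))
y = np.sin(3 * x) + 0.1 * rng.normal(size=n)   # hypothetical noisy signal

# 300 Fourier basis functions (10x the data), downweighted at high
# frequency so the min-norm fit stays smooth.
freqs = np.arange(1, 151)
scale = np.r_[1.0 / freqs, 1.0 / freqs]        # smoothness weighting (assumed)
def features(x):
    return np.c_[np.cos(np.outer(x, np.pi * freqs)),
                 np.sin(np.outer(x, np.pi * freqs))] * scale

# Min-norm ("ridgeless") least squares via the pseudoinverse; as the number
# of features grows, this kind of weighted fit approaches a GP mean.
beta = np.linalg.pinv(features(x)) @ y
x_grid = np.linspace(-1, 1, 5)
print(features(x_grid) @ beta)
```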
arXiv Detail & Related papers (2021-01-15T21:08:34Z)
- Overcoming the curse of dimensionality with Laplacian regularization in
semi-supervised learning [80.20302993614594]
We provide a statistical analysis to overcome drawbacks of Laplacian regularization.
We unveil a large body of spectral filtering methods that exhibit desirable behaviors.
We provide realistic computational guidelines in order to make our method usable with large amounts of data.
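A minimal sketch of one such spectral filtering method: restrict the estimate to the span of the lowest graph Laplacian eigenvectors and least-squares fit on the labeled points. The two-cluster data, affinity bandwidth, and truncation level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical two-cluster data; only a handful of labels are revealed.
X = np.r_[rng.normal(0, 0.3, (100, 2)) + [0, 0],
          rng.normal(0, 0.3, (100, 2)) + [2, 0]]
y = np.r_[np.zeros(100), np.ones(100)]
labeled = np.r_[np.arange(3), 100 + np.arange(3)]   # 3 labels per cluster

# Graph Laplacian from a Gaussian affinity.
W = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 0.5)
L = np.diag(W.sum(1)) - W

# Spectral filtering: keep only the k lowest-frequency eigenvectors,
# then fit their coefficients on the labeled points.
k = 5
_, vecs = np.linalg.eigh(L)
U = vecs[:, :k]
coef, *_ = np.linalg.lstsq(U[labeled], y[labeled], rcond=None)
pred = (U @ coef > 0.5).astype(int)
print("accuracy of propagated labels:", (pred == y).mean())
```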
arXiv Detail & Related papers (2020-09-09T14:28:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.