Computable Stability for Persistence Rank Function Machine Learning
- URL: http://arxiv.org/abs/2307.02904v1
- Date: Thu, 6 Jul 2023 10:34:52 GMT
- Title: Computable Stability for Persistence Rank Function Machine Learning
- Authors: Qiquan Wang, Inés García-Redondo, Pierre Faugère, Anthea Monod,
Gregory Henselman-Petrusek
- Abstract summary: We study the performance of rank functions in functional inferential statistics and machine learning on both simulated and real data.
We find that the use of persistent homology captured by rank functions offers a clear improvement over existing approaches.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Persistent homology barcodes and diagrams are a cornerstone of topological
data analysis. Widely used in many real data settings, they relate variation in
topological information (as measured by cellular homology) with variation in
data; however, they are challenging to use in statistical settings due to their
complex geometric structure. In this paper, we revisit the persistent homology
rank function -- an invariant measure of "shape" that was introduced before
barcodes and persistence diagrams and captures the same information in a form
that is more amenable to data and computation. In particular, since they are
functions, techniques from functional data analysis -- a domain of statistics
adapted for functions -- apply directly to persistent homology when represented
by rank functions. Rank functions, however, have been less popular than
barcodes because they face the challenge that stability -- a property that is
crucial to validate their use in data analysis -- is difficult to guarantee,
mainly due to metric concerns on rank function space. However, rank functions
extend more naturally to the increasingly popular and important case of
multiparameter persistent homology. In this paper, we study the performance of
rank functions in functional inferential statistics and machine learning on
both simulated and real data, and in both single and multiparameter persistent
homology. We find that the use of persistent homology captured by rank
functions offers a clear improvement over existing approaches. We then provide
theoretical justification for our numerical experiments and applications to
data by deriving several stability results for single- and multiparameter
persistence rank functions under various metrics with the underlying aim of
computational feasibility and interpretability.
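For intuition, the one-parameter rank function is directly computable from a barcode: at a point (a, b) with a <= b, it counts the bars born by a that survive past b, i.e. the rank of the induced map H(X_a) -> H(X_b). A minimal sketch (the evaluation rule is standard, but the function names and grid sampling here are illustrative, not the paper's code):

```python
import numpy as np

def rank_function(barcode, a, b):
    """Evaluate the persistence rank function at (a, b) with a <= b.

    `barcode` is a list of (birth, death) pairs; the value is the rank of
    the map H(X_a) -> H(X_b), i.e. the number of bars containing [a, b]
    (conventions for open/closed endpoints vary across the literature).
    """
    assert a <= b
    return sum(1 for birth, death in barcode if birth <= a and b < death)

# Sampling the rank function on a grid gives the discretized functional
# representation on which functional data analysis tools operate.
def rank_function_grid(barcode, grid):
    return np.array([[rank_function(barcode, a, b) if a <= b else 0
                      for b in grid] for a in grid])

barcode = [(0.0, 2.0), (0.5, 1.0), (1.5, np.inf)]
print(rank_function(barcode, 0.6, 0.9))  # 2: the first two bars contain [0.6, 0.9]
```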
Related papers
- A Framework for Fast and Stable Representations of Multiparameter
Persistent Homology Decompositions [2.76240219662896]
We introduce a new general representation framework that leverages recent results on decompositions of multiparameter persistent homology.
We establish theoretical stability guarantees under this framework as well as efficient algorithms for practical computation.
We validate our stability results and algorithms with numerical experiments that demonstrate statistical convergence, prediction accuracy, and fast running times on several real data sets.
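One simple way to picture such a representation (a hedged toy sketch of ours, not the paper's actual framework): given a decomposition of a 2-parameter module into rectangle summands, rasterize the pointwise count of summands on a grid to obtain a fixed-size, vector-shaped summary.

```python
import numpy as np

def rasterize_rectangle_decomposition(rectangles, xs, ys):
    """Count, at each grid point, how many rectangle summands contain it.

    `rectangles` is a list of ((x0, x1), (y0, y1)) summands from a
    decomposition of a 2-parameter persistence module; the output is an
    image usable as a machine learning feature vector.
    """
    img = np.zeros((len(xs), len(ys)))
    for (x0, x1), (y0, y1) in rectangles:
        for i, x in enumerate(xs):
            for j, y in enumerate(ys):
                if x0 <= x < x1 and y0 <= y < y1:
                    img[i, j] += 1
    return img

xs = ys = np.linspace(0.0, 1.0, 8)
print(rasterize_rectangle_decomposition([((0.1, 0.6), (0.2, 0.9))], xs, ys))
```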
arXiv Detail & Related papers (2023-06-19T21:28:53Z) - Basis Function Encoding of Numerical Features in Factorization Machines
for Improved Accuracy [2.3022070933226217]
We provide a systematic and theoretically-justified way to incorporate numerical features into FM variants.
We show that our technique yields a model that learns segmentized functions of the numerical feature spanned by the set of functions of one's choice.
Our technique preserves fast training and inference, and requires only a small modification of the computational graph of an FM model.
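The encoding idea can be sketched as follows (a hedged illustration using piecewise-linear hat functions; the exact basis and FM variant in the paper may differ): each numerical feature value is mapped to the vector of basis function evaluations, and those weights replace the usual hard one-hot bin indicator in the FM input.

```python
import numpy as np

def hat_basis_encoding(x, knots):
    """Encode a scalar feature as weights over piecewise-linear hat functions.

    Unlike hard binning, nearby values receive overlapping, smoothly varying
    weights, so the FM learns a piecewise-linear function of the feature.
    """
    enc = np.zeros(len(knots))
    for k, t in enumerate(knots):
        left = knots[k - 1] if k > 0 else t
        right = knots[k + 1] if k < len(knots) - 1 else t
        if left <= x < t and t != left:
            enc[k] = (x - left) / (t - left)      # rising edge of hat k
        elif t <= x <= right and right != t:
            enc[k] = (right - x) / (right - t)    # falling edge of hat k
        elif x == t:
            enc[k] = 1.0                          # boundary knot exactly hit
    return enc

knots = np.linspace(0.0, 1.0, 5)
print(hat_basis_encoding(0.3, knots))  # weight split between knots 0.25 and 0.5
```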
arXiv Detail & Related papers (2023-05-23T21:10:17Z) - Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined canonicalization functions.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
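A minimal version of the idea for 2D point clouds (our own toy sketch, not the paper's architecture): a small equivariant function predicts a direction from the input, the cloud is rotated so that direction becomes canonical, and any downstream predictor applied to the canonicalized input is rotation invariant by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(1, 8)), rng.normal(size=(8, 1))

def predict_direction(points):
    """Tiny equivariant 'canonicalization network' for 2D point clouds.

    Each point is weighted by a learned function of its rotation-invariant
    norm, so rotating the cloud rotates the predicted direction with it.
    The weights here are random placeholders; in the paper's setting the
    network is trained end to end with the task predictor.
    """
    norms = np.linalg.norm(points, axis=1, keepdims=True)  # invariants
    weights = np.tanh(norms @ w1) @ w2                     # (n, 1)
    return (weights * points).sum(axis=0)                  # equivariant vector

def canonicalize(points):
    """Rotate so the predicted direction lands on the positive x-axis."""
    vx, vy = predict_direction(points)
    theta = np.arctan2(vy, vx)
    c, s = np.cos(-theta), np.sin(-theta)
    return points @ np.array([[c, -s], [s, c]]).T

cloud = rng.normal(size=(10, 2))
a = np.deg2rad(40.0)
Rot = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
print(np.allclose(canonicalize(cloud), canonicalize(cloud @ Rot.T)))  # True
```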
arXiv Detail & Related papers (2022-11-11T21:58:15Z) - Robust Topological Inference in the Presence of Outliers [18.6112824677157]
The distance function to a compact set plays a crucial role in the paradigm of topological data analysis.
Despite its stability to perturbations in the Hausdorff distance, persistent homology is highly sensitive to outliers.
We propose a median-of-means variant of the distance function (MoM Dist) and establish its statistical properties.
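The construction lends itself to a few lines of code (a hedged sketch of the median-of-means idea; parameter names are ours): split the sample into random blocks, compute the distance to each block, and take the pointwise median, which blunts the influence of outliers on the resulting filtration.

```python
import numpy as np

def mom_dist(query, sample, n_blocks=10, rng=None):
    """Median-of-means distance from `query` points to a noisy `sample`.

    The sample is split into `n_blocks` random blocks; for each query point
    we take its distance to each block, then return the median over blocks.
    Outliers can only corrupt the few blocks they fall in.
    """
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(sample))
    blocks = np.array_split(sample[idx], n_blocks)
    dists = np.stack([
        np.min(np.linalg.norm(query[:, None, :] - block[None, :, :], axis=2), axis=1)
        for block in blocks
    ])                                   # shape (n_blocks, n_query)
    return np.median(dists, axis=0)

# Distance to a circle sample with one gross outlier: near the outlier the
# plain distance function dips to 0, while MoM Dist stays large.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
sample = np.vstack([np.c_[np.cos(theta), np.sin(theta)], [[5.0, 5.0]]])
print(mom_dist(np.array([[5.0, 5.0]]), sample))  # ~6: the outlier is outvoted
```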
arXiv Detail & Related papers (2022-06-03T19:45:43Z) - Data-Driven Reachability analysis and Support set Estimation with
Christoffel Functions [8.183446952097528]
We present algorithms for estimating the forward reachable set of a dynamical system.
The produced estimate is the sublevel set of a function called an empirical inverse Christoffel function.
In addition to reachability analysis, the same approach can be applied to general problems of estimating the support of a random variable.
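The estimator is simple to sketch (hedged; the degree, regularization, and feature map choices here are ours): form the empirical moment matrix of a polynomial feature map over the samples, and use the resulting quadratic form as the function whose sublevel sets estimate the support.

```python
import numpy as np

def monomial_features(X, degree):
    """All monomials of total degree <= `degree` in two variables."""
    cols = [X[:, 0] ** i * X[:, 1] ** j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)]
    return np.stack(cols, axis=1)

def inverse_christoffel(X, degree=4, reg=1e-8):
    """Return V(x) = z(x)^T M^{-1} z(x), the empirical inverse Christoffel
    function for feature map z and moment matrix M = E_n[z z^T].  Sublevel
    sets {V <= tau} estimate the support of the sampling distribution."""
    Z = monomial_features(X, degree)
    M = Z.T @ Z / len(X) + reg * np.eye(Z.shape[1])
    M_inv = np.linalg.inv(M)
    return lambda Y: np.einsum('ij,jk,ik->i',
                               monomial_features(Y, degree), M_inv,
                               monomial_features(Y, degree))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                 # samples of the reachable states
V = inverse_christoffel(X)
print(V(np.array([[0.0, 0.0], [6.0, 6.0]])))  # small inside, large far outside
```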
arXiv Detail & Related papers (2021-12-18T20:25:34Z) - Smart Vectorizations for Single and Multiparameter Persistence [8.504400925390296]
We introduce two new topological summaries for single and multiparameter persistence, namely, saw functions and multi-persistence grid functions.
These new topological summaries can be regarded as complexity measures of the evolving subspaces determined by the filtration.
We derive theoretical guarantees on the stability of the new saw and multi-persistence grid functions and illustrate their applicability for graph classification tasks.
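For intuition, a grid-style summary in the one-parameter case can be as simple as counting, at each grid value, the features alive there (a toy sketch in the spirit of such summaries; the paper's saw and grid functions are defined more carefully):

```python
import numpy as np

def grid_function(barcode, grid):
    """Toy grid summary of a barcode: at each grid value t, count the
    features alive at t.  The result is a fixed-length vector that can be
    fed to standard classifiers, e.g. for graph classification."""
    return np.array([sum(1 for birth, death in barcode if birth <= t < death)
                     for t in grid])

barcode = [(0.0, 2.0), (0.5, 1.0), (1.5, 3.0)]
print(grid_function(barcode, np.linspace(0.0, 3.0, 7)))  # [1 2 1 2 1 1 0]
```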
arXiv Detail & Related papers (2021-04-10T15:09:31Z) - Removing Bias in Multi-modal Classifiers: Regularization by Maximizing
Functional Entropies [88.0813215220342]
Some modalities can more easily contribute to the classification results than others.
We develop a method based on the log-Sobolev inequality, which bounds the functional entropy with the functional-Fisher-information.
On the two challenging multi-modal datasets VQA-CPv2 and SocialIQ, we obtain state-of-the-art results while more uniformly exploiting the modalities.
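In sketch form (hedged: the paper's regularizer is derived from the log-Sobolev bound and balances several modalities, while this toy version just estimates one input's functional Fisher information):

```python
import torch

def functional_fisher_information(f, x, eps=1e-8):
    """Empirical functional Fisher information E[ ||grad f(x)||^2 / f(x) ]
    for a positive scalar-valued function f (e.g. a class probability).
    Penalizing it per modality discourages the classifier from leaning on
    one modality alone."""
    x = x.detach().clone().requires_grad_(True)
    fx = f(x)                                       # (batch,) positive values
    grads, = torch.autograd.grad(fx.sum(), x, create_graph=True)
    sq_norm = grads.flatten(1).pow(2).sum(dim=1)
    return (sq_norm / (fx + eps)).mean()

# Hypothetical usage: add one term per modality to the task loss.
# loss = task_loss + lam * (functional_fisher_information(f_img, images)
#                           + functional_fisher_information(f_txt, text_feats))
```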
arXiv Detail & Related papers (2020-10-21T07:40:33Z) - Estimating Structural Target Functions using Machine Learning and
Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models.
This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics.
We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
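A canonical instance of such a target functional is the mean of an outcome that is only partially observed; its influence-function-based (doubly robust, AIPW) estimator takes a few lines (a hedged sketch of the standard estimator, not the paper's full framework):

```python
import numpy as np

def aipw_mean(y, observed, prop_scores, outcome_preds):
    """Augmented IPW estimate of E[Y] when Y is coarsened (seen only where
    `observed` is 1).  Combines an outcome model `outcome_preds` with a
    missingness model `prop_scores`; the estimate is consistent if either
    model is correct, which is the 'doubly robust' property."""
    y_filled = np.where(observed == 1, y, 0.0)
    correction = observed * (y_filled - outcome_preds) / prop_scores
    return np.mean(outcome_preds + correction)

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x + rng.normal(size=2000)
p = 1 / (1 + np.exp(-x))        # observation probability depends on x
r = rng.binomial(1, p)          # informative missingness
print(aipw_mean(y, r, p, outcome_preds=x))  # close to E[Y] = 0
```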
arXiv Detail & Related papers (2020-08-14T16:48:29Z) - Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z) - Tracking Performance of Online Stochastic Learners [57.14673504239551]
Online algorithms are popular in large-scale learning settings due to their ability to compute updates on the fly, without the need to store and process data in large batches.
When a constant step-size is used, these algorithms also have the ability to adapt to drifts in problem parameters, such as data or model properties, and track the optimal solution with reasonable accuracy.
We establish a link between steady-state performance derived under stationarity assumptions and the tracking performance of online learners under random walk models.
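The tracking behavior is easy to reproduce (a hedged toy experiment of ours illustrating the phenomenon, not the paper's model): run constant step-size SGD on a quadratic loss whose minimizer follows a random walk, and the iterate hovers within a steady error band around the moving target.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, drift_std, noise_std, steps = 0.1, 0.01, 0.1, 5000

w_star = np.zeros(2)          # true minimizer, drifting as a random walk
w = np.zeros(2)               # online learner's iterate
errors = []
for _ in range(steps):
    w_star = w_star + drift_std * rng.normal(size=2)      # parameter drift
    grad = (w - w_star) + noise_std * rng.normal(size=2)  # noisy gradient of
                                                          # 0.5 * ||w - w_star||^2
    w = w - mu * grad                                     # constant step size
    errors.append(np.sum((w - w_star) ** 2))

# The steady-state tracking error stays bounded instead of growing with time.
print(np.mean(errors[1000:]))
```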
arXiv Detail & Related papers (2020-04-04T14:16:27Z) - Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
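The described architecture reduces to a short simulation loop (a hedged sketch with our own toy quadratic objectives, not the paper's analysis): each iteration samples a random subset of agents, each performs a local gradient step from the current global model, and the server averages the results.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, mu, rounds, subset_size = 20, 5, 0.1, 500, 5

# Agent k holds a local quadratic loss 0.5 * ||w - targets[k]||^2; the
# aggregate minimizer is the mean of the targets, and data variability
# across agents appears as the spread of these targets.
targets = rng.normal(size=(n_agents, dim))
w = np.zeros(dim)
for _ in range(rounds):
    active = rng.choice(n_agents, size=subset_size, replace=False)
    local = [w - mu * (w - targets[k]) for k in active]  # one local step each
    w = np.mean(local, axis=0)                           # server aggregation

print(np.linalg.norm(w - targets.mean(axis=0)))  # near the aggregate minimizer
```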
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.