Computable Stability for Persistence Rank Function Machine Learning
- URL: http://arxiv.org/abs/2307.02904v1
- Date: Thu, 6 Jul 2023 10:34:52 GMT
- Title: Computable Stability for Persistence Rank Function Machine Learning
- Authors: Qiquan Wang, Inés García-Redondo, Pierre Faugère, Anthea Monod,
Gregory Henselman-Petrusek
- Abstract summary: We study the performance of rank functions in functional inferential statistics and machine learning on both simulated and real data.
We find that the use of persistent homology captured by rank functions offers a clear improvement over existing approaches.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Persistent homology barcodes and diagrams are a cornerstone of topological
data analysis. Widely used in many real data settings, they relate variation in
topological information (as measured by cellular homology) with variation in
data; however, they are challenging to use in statistical settings due to their
complex geometric structure. In this paper, we revisit the persistent homology
rank function -- an invariant measure of "shape" that was introduced before
barcodes and persistence diagrams and captures the same information in a form
that is more amenable to data and computation. In particular, since they are
functions, techniques from functional data analysis -- a domain of statistics
adapted for functions -- apply directly to persistent homology when represented
by rank functions. Rank functions, however, have been less popular than
barcodes because they face the challenge that stability -- a property that is
crucial to validate their use in data analysis -- is difficult to guarantee,
mainly due to metric concerns on rank function space. However, rank functions
extend more naturally to the increasingly popular and important case of
multiparameter persistent homology. In this paper, we study the performance of
rank functions in functional inferential statistics and machine learning on
both simulated and real data, and in both single and multiparameter persistent
homology. We find that the use of persistent homology captured by rank
functions offers a clear improvement over existing approaches. We then provide
theoretical justification for our numerical experiments and applications to
data by deriving several stability results for single- and multiparameter
persistence rank functions under various metrics with the underlying aim of
computational feasibility and interpretability.
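For intuition, the one-parameter rank function is directly computable from a barcode: at a point (a, b) with a <= b, it counts the bars born by a that survive past b, i.e. the rank of the induced map H(X_a) -> H(X_b). A minimal sketch (the evaluation rule is standard, but the function names and grid sampling here are illustrative, not the paper's code):

```python
import numpy as np

def rank_function(barcode, a, b):
    """Evaluate the persistence rank function at (a, b) with a <= b.

    `barcode` is a list of (birth, death) pairs; the value is the rank of
    the map H(X_a) -> H(X_b), i.e. the number of bars containing [a, b]
    (conventions for open/closed endpoints vary across the literature).
    """
    assert a <= b
    return sum(1 for birth, death in barcode if birth <= a and b < death)

# Sampling the rank function on a grid gives the discretized functional
# representation on which functional data analysis tools operate.
def rank_function_grid(barcode, grid):
    return np.array([[rank_function(barcode, a, b) if a <= b else 0
                      for b in grid] for a in grid])

barcode = [(0.0, 2.0), (0.5, 1.0), (1.5, np.inf)]
print(rank_function(barcode, 0.6, 0.9))  # 2: the first two bars contain [0.6, 0.9]
```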
Related papers
- A Framework for Fast and Stable Representations of Multiparameter
Persistent Homology Decompositions [2.76240219662896]
We introduce a new general representation framework that leverages recent results on decompositions of multiparameter persistent homology.
We establish theoretical stability guarantees under this framework as well as efficient algorithms for practical computation.
We validate our stability results and algorithms with numerical experiments that demonstrate statistical convergence, prediction accuracy, and fast running times on several real data sets.
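One simple way to picture such a representation (a hedged toy sketch of ours, not the paper's actual framework): given a decomposition of a 2-parameter module into rectangle summands, rasterize the pointwise count of summands on a grid to obtain a fixed-size, vector-shaped summary.

```python
import numpy as np

def rasterize_rectangle_decomposition(rectangles, xs, ys):
    """Count, at each grid point, how many rectangle summands contain it.

    `rectangles` is a list of ((x0, x1), (y0, y1)) summands from a
    decomposition of a 2-parameter persistence module; the output is an
    image usable as a machine learning feature vector.
    """
    img = np.zeros((len(xs), len(ys)))
    for (x0, x1), (y0, y1) in rectangles:
        for i, x in enumerate(xs):
            for j, y in enumerate(ys):
                if x0 <= x < x1 and y0 <= y < y1:
                    img[i, j] += 1
    return img

xs = ys = np.linspace(0.0, 1.0, 8)
print(rasterize_rectangle_decomposition([((0.1, 0.6), (0.2, 0.9))], xs, ys))
```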
arXiv Detail & Related papers (2023-06-19T21:28:53Z) - Basis Function Encoding of Numerical Features in Factorization Machines
for Improved Accuracy [2.3022070933226217]
We provide a systematic and theoretically-justified way to incorporate numerical features into FM variants.
We show that our technique yields a model that learns segmentized functions of the numerical feature spanned by the set of functions of one's choice.
Our technique preserves fast training and inference, and requires only a small modification of the computational graph of an FM model.
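The encoding idea can be sketched as follows (a hedged illustration using piecewise-linear hat functions; the exact basis and FM variant in the paper may differ): each numerical feature value is mapped to the vector of basis function evaluations, and those weights replace the usual hard one-hot bin indicator in the FM input.

```python
import numpy as np

def hat_basis_encoding(x, knots):
    """Encode a scalar feature as weights over piecewise-linear hat functions.

    Unlike hard binning, nearby values receive overlapping, smoothly varying
    weights, so the FM learns a piecewise-linear function of the feature.
    """
    enc = np.zeros(len(knots))
    for k, t in enumerate(knots):
        left = knots[k - 1] if k > 0 else t
        right = knots[k + 1] if k < len(knots) - 1 else t
        if left <= x < t and t != left:
            enc[k] = (x - left) / (t - left)      # rising edge of hat k
        elif t <= x <= right and right != t:
            enc[k] = (right - x) / (right - t)    # falling edge of hat k
        elif x == t:
            enc[k] = 1.0                          # boundary knot exactly hit
    return enc

knots = np.linspace(0.0, 1.0, 5)
print(hat_basis_encoding(0.3, knots))  # weight split between knots 0.25 and 0.5
```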
arXiv Detail & Related papers (2023-05-23T21:10:17Z) - Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined canonicalization functions.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
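A minimal version of the idea for 2D point clouds (our own toy sketch, not the paper's architecture): a small equivariant function predicts a direction from the input, the cloud is rotated so that direction becomes canonical, and any downstream predictor applied to the canonicalized input is rotation invariant by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(1, 8)), rng.normal(size=(8, 1))

def predict_direction(points):
    """Tiny equivariant 'canonicalization network' for 2D point clouds.

    Each point is weighted by a learned function of its rotation-invariant
    norm, so rotating the cloud rotates the predicted direction with it.
    The weights here are random placeholders; in the paper's setting the
    network is trained end to end with the task predictor.
    """
    norms = np.linalg.norm(points, axis=1, keepdims=True)  # invariants
    weights = np.tanh(norms @ w1) @ w2                     # (n, 1)
    return (weights * points).sum(axis=0)                  # equivariant vector

def canonicalize(points):
    """Rotate so the predicted direction lands on the positive x-axis."""
    vx, vy = predict_direction(points)
    theta = np.arctan2(vy, vx)
    c, s = np.cos(-theta), np.sin(-theta)
    return points @ np.array([[c, -s], [s, c]]).T

cloud = rng.normal(size=(10, 2))
a = np.deg2rad(40.0)
Rot = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
print(np.allclose(canonicalize(cloud), canonicalize(cloud @ Rot.T)))  # True
```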
arXiv Detail & Related papers (2022-11-11T21:58:15Z) - Robust Topological Inference in the Presence of Outliers [18.6112824677157]
The distance function to a compact set plays a crucial role in the paradigm of topological data analysis.
Despite its stability to perturbations in the Hausdorff distance, persistent homology is highly sensitive to outliers.
We propose a median-of-means variant of the distance function (MoM Dist) and establish its statistical properties.
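The construction lends itself to a few lines of code (a hedged sketch of the median-of-means idea; parameter names are ours): split the sample into random blocks, compute the distance to each block, and take the pointwise median, which blunts the influence of outliers on the resulting filtration.

```python
import numpy as np

def mom_dist(query, sample, n_blocks=10, rng=None):
    """Median-of-means distance from `query` points to a noisy `sample`.

    The sample is split into `n_blocks` random blocks; for each query point
    we take its distance to each block, then return the median over blocks.
    Outliers can only corrupt the few blocks they fall in.
    """
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(sample))
    blocks = np.array_split(sample[idx], n_blocks)
    dists = np.stack([
        np.min(np.linalg.norm(query[:, None, :] - block[None, :, :], axis=2), axis=1)
        for block in blocks
    ])                                   # shape (n_blocks, n_query)
    return np.median(dists, axis=0)

# Distance to a circle sample with one gross outlier: near the outlier the
# plain distance function dips to 0, while MoM Dist stays large.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
sample = np.vstack([np.c_[np.cos(theta), np.sin(theta)], [[5.0, 5.0]]])
print(mom_dist(np.array([[5.0, 5.0]]), sample))  # ~6: the outlier is outvoted
```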
arXiv Detail & Related papers (2022-06-03T19:45:43Z) - Data-Driven Reachability analysis and Support set Estimation with
Christoffel Functions [8.183446952097528]
We present algorithms for estimating the forward reachable set of a dynamical system.
The produced estimate is the sublevel set of a function called an empirical inverse Christoffel function.
In addition to reachability analysis, the same approach can be applied to general problems of estimating the support of a random variable.
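The estimator is simple to sketch (hedged; the degree, regularization, and feature map choices here are ours): form the empirical moment matrix of a polynomial feature map over the samples, and use the resulting quadratic form as the function whose sublevel sets estimate the support.

```python
import numpy as np

def monomial_features(X, degree):
    """All monomials of total degree <= `degree` in two variables."""
    cols = [X[:, 0] ** i * X[:, 1] ** j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)]
    return np.stack(cols, axis=1)

def inverse_christoffel(X, degree=4, reg=1e-8):
    """Return V(x) = z(x)^T M^{-1} z(x), the empirical inverse Christoffel
    function for feature map z and moment matrix M = E_n[z z^T].  Sublevel
    sets {V <= tau} estimate the support of the sampling distribution."""
    Z = monomial_features(X, degree)
    M = Z.T @ Z / len(X) + reg * np.eye(Z.shape[1])
    M_inv = np.linalg.inv(M)
    return lambda Y: np.einsum('ij,jk,ik->i',
                               monomial_features(Y, degree), M_inv,
                               monomial_features(Y, degree))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                 # samples of the reachable states
V = inverse_christoffel(X)
print(V(np.array([[0.0, 0.0], [6.0, 6.0]])))  # small inside, large far outside
```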
arXiv Detail & Related papers (2021-12-18T20:25:34Z) - Smart Vectorizations for Single and Multiparameter Persistence [8.504400925390296]
We introduce two new topological summaries for single and multiparameter persistence, namely, saw functions and multi-persistence grid functions.
These new topological summaries can be regarded as complexity measures of the evolving subspaces determined by the filtration.
We derive theoretical guarantees on the stability of the new saw and multi-persistence grid functions and illustrate their applicability for graph classification tasks.
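For intuition, a grid-style summary in the one-parameter case can be as simple as counting, at each grid value, the features alive there (a toy sketch in the spirit of such summaries; the paper's saw and grid functions are defined more carefully):

```python
import numpy as np

def grid_function(barcode, grid):
    """Toy grid summary of a barcode: at each grid value t, count the
    features alive at t.  The result is a fixed-length vector that can be
    fed to standard classifiers, e.g. for graph classification."""
    return np.array([sum(1 for birth, death in barcode if birth <= t < death)
                     for t in grid])

barcode = [(0.0, 2.0), (0.5, 1.0), (1.5, 3.0)]
print(grid_function(barcode, np.linspace(0.0, 3.0, 7)))  # [1 2 1 2 1 1 0]
```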
arXiv Detail & Related papers (2021-04-10T15:09:31Z) - Removing Bias in Multi-modal Classifiers: Regularization by Maximizing
Functional Entropies [88.0813215220342]
Some modalities can more easily contribute to the classification results than others.
We develop a method based on the log-Sobolev inequality, which bounds the functional entropy with the functional-Fisher-information.
On the two challenging multi-modal datasets VQA-CPv2 and SocialIQ, we obtain state-of-the-art results while more uniformly exploiting the modalities.
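In sketch form (hedged: the paper's regularizer is derived from the log-Sobolev bound and balances several modalities, while this toy version just estimates one input's functional Fisher information):

```python
import torch

def functional_fisher_information(f, x, eps=1e-8):
    """Empirical functional Fisher information E[ ||grad f(x)||^2 / f(x) ]
    for a positive scalar-valued function f (e.g. a class probability).
    Penalizing it per modality discourages the classifier from leaning on
    one modality alone."""
    x = x.detach().clone().requires_grad_(True)
    fx = f(x)                                       # (batch,) positive values
    grads, = torch.autograd.grad(fx.sum(), x, create_graph=True)
    sq_norm = grads.flatten(1).pow(2).sum(dim=1)
    return (sq_norm / (fx + eps)).mean()

# Hypothetical usage: add one term per modality to the task loss.
# loss = task_loss + lam * (functional_fisher_information(f_img, images)
#                           + functional_fisher_information(f_txt, text_feats))
```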
arXiv Detail & Related papers (2020-10-21T07:40:33Z) - Estimating Structural Target Functions using Machine Learning and
Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models.
This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics.
We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
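A canonical instance of such a target functional is the mean of an outcome that is only partially observed; its influence-function-based (doubly robust, AIPW) estimator takes a few lines (a hedged sketch of the standard estimator, not the paper's full framework):

```python
import numpy as np

def aipw_mean(y, observed, prop_scores, outcome_preds):
    """Augmented IPW estimate of E[Y] when Y is coarsened (seen only where
    `observed` is 1).  Combines an outcome model `outcome_preds` with a
    missingness model `prop_scores`; the estimate is consistent if either
    model is correct, which is the 'doubly robust' property."""
    y_filled = np.where(observed == 1, y, 0.0)
    correction = observed * (y_filled - outcome_preds) / prop_scores
    return np.mean(outcome_preds + correction)

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x + rng.normal(size=2000)
p = 1 / (1 + np.exp(-x))        # observation probability depends on x
r = rng.binomial(1, p)          # informative missingness
print(aipw_mean(y, r, p, outcome_preds=x))  # close to E[Y] = 0
```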
arXiv Detail & Related papers (2020-08-14T16:48:29Z) - Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z) - Tracking Performance of Online Stochastic Learners [57.14673504239551]
Online algorithms are popular in large-scale learning settings due to their ability to compute updates on the fly, without the need to store and process data in large batches.
When a constant step-size is used, these algorithms also have the ability to adapt to drifts in problem parameters, such as data or model properties, and track the optimal solution with reasonable accuracy.
We establish a link between steady-state performance derived under stationarity assumptions and the tracking performance of online learners under random walk models.
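The tracking behavior is easy to reproduce (a hedged toy experiment of ours illustrating the phenomenon, not the paper's model): run constant step-size SGD on a quadratic loss whose minimizer follows a random walk, and the iterate hovers within a steady error band around the moving target.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, drift_std, noise_std, steps = 0.1, 0.01, 0.1, 5000

w_star = np.zeros(2)          # true minimizer, drifting as a random walk
w = np.zeros(2)               # online learner's iterate
errors = []
for _ in range(steps):
    w_star = w_star + drift_std * rng.normal(size=2)      # parameter drift
    grad = (w - w_star) + noise_std * rng.normal(size=2)  # noisy gradient of
                                                          # 0.5 * ||w - w_star||^2
    w = w - mu * grad                                     # constant step size
    errors.append(np.sum((w - w_star) ** 2))

# The steady-state tracking error stays bounded instead of growing with time.
print(np.mean(errors[1000:]))
```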
arXiv Detail & Related papers (2020-04-04T14:16:27Z) - Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
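The described architecture reduces to a short simulation loop (a hedged sketch with our own toy quadratic objectives, not the paper's analysis): each iteration samples a random subset of agents, each performs a local gradient step from the current global model, and the server averages the results.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, mu, rounds, subset_size = 20, 5, 0.1, 500, 5

# Agent k holds a local quadratic loss 0.5 * ||w - targets[k]||^2; the
# aggregate minimizer is the mean of the targets, and data variability
# across agents appears as the spread of these targets.
targets = rng.normal(size=(n_agents, dim))
w = np.zeros(dim)
for _ in range(rounds):
    active = rng.choice(n_agents, size=subset_size, replace=False)
    local = [w - mu * (w - targets[k]) for k in active]  # one local step each
    w = np.mean(local, axis=0)                           # server aggregation

print(np.linalg.norm(w - targets.mean(axis=0)))  # near the aggregate minimizer
```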
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.