Related papers: Fixed and adaptive landmark sets for finite pseudometric spaces

Fixed and adaptive landmark sets for finite pseudometric spaces

URL: http://arxiv.org/abs/2212.09826v1
Date: Mon, 19 Dec 2022 19:53:33 GMT
Title: Fixed and adaptive landmark sets for finite pseudometric spaces
Authors: Jason Cory Brunson and Yara Skaf
Abstract summary: "Lastfirst" based on ranked distances implies a cover comprising sets of uniform cardinality. We perform benchmark tests and compare its performance to that of maxmin on feature detection and class prediction tasks. We find that lastfirst achieves comparable performance on prediction tasks and outperforms maxmin on homology detection tasks.
Score: 0.9137554315375919
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Topological data analysis (TDA) is an expanding field that leverages principles and tools from algebraic topology to quantify structural features of data sets or transform them into more manageable forms. As its theoretical foundations have been developed, TDA has shown promise in extracting useful information from high-dimensional, noisy, and complex data such as those used in biomedicine. To operate efficiently, these techniques may employ landmark samplers, either random or heuristic. The heuristic maxmin procedure obtains a roughly even distribution of sample points by implicitly constructing a cover comprising sets of uniform radius. However, issues arise with data that vary in density or include points with multiplicities, as are common in biomedicine. We propose an analogous procedure, "lastfirst" based on ranked distances, which implies a cover comprising sets of uniform cardinality. We first rigorously define the procedure and prove that it obtains landmarks with desired properties. We then perform benchmark tests and compare its performance to that of maxmin, on feature detection and class prediction tasks involving simulated and real-world biomedical data. Lastfirst is more general than maxmin in that it can be applied to any data on which arbitrary (and not necessarily symmetric) pairwise distances can be computed. Lastfirst is more computationally costly, but our implementation scales at the same rate as maxmin. We find that lastfirst achieves comparable performance on prediction tasks and outperforms maxmin on homology detection tasks. Where the numerical values of similarity measures are not meaningful, as in many biomedical contexts, lastfirst sampling may also improve interpretability.

Related papers

Bias Detection via Maximum Subgroup Discrepancy [2.236957801565796]
We propose a new notion of distance, the Maximum Subgroup Discrepancy (MSD) In this metric, two distributions are close if, roughly, discrepancies are low for all feature subgroups. We provide a practical algorithm for the evaluation of the distance, based on Mixed-integer optimization (MIO)
arXiv Detail & Related papers (2025-02-04T11:01:03Z)
Symmetry Discovery for Different Data Types [52.2614860099811]
Equivariant neural networks incorporate symmetries into their architecture, achieving higher generalization performance. We propose LieSD, a method for discovering symmetries via trained neural networks which approximate the input-output mappings of the tasks. We validate the performance of LieSD on tasks with symmetries such as the two-body problem, the moment of inertia matrix prediction, and top quark tagging.
arXiv Detail & Related papers (2024-10-13T13:39:39Z)
Automating Data Science Pipelines with Tensor Completion [4.956678070210018]
We model data science pipelines as instances of tensor completion. The goal is to identify all missing entries of the tensor, corresponding to all combinations of variable values. We extensively evaluate existing and proposed methods in a number of datasets.
arXiv Detail & Related papers (2024-10-08T22:34:08Z)
Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs) Our proposed method first trains SOMs on unlabeled data and then a minimal number of available labeled data points are assigned to key best matching units (BMU) Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
Topological Quality of Subsets via Persistence Matching Diagrams [0.196629787330046]
We measure the quality of a subset concerning the dataset it represents using topological data analysis techniques. In particular, this approach enables us to explain why the chosen subset is likely to result in poor performance of a supervised learning model.
arXiv Detail & Related papers (2023-06-04T17:08:41Z)
Efficient Approximate Kernel Based Spike Sequence Classification [56.2938724367661]
Machine learning models, such as SVM, require a definition of distance/similarity between pairs of sequences. Exact methods yield better classification performance, but they pose high computational costs. We propose a series of ways to improve the performance of the approximate kernel in order to enhance its predictive performance.
arXiv Detail & Related papers (2022-09-11T22:44:19Z)
A Fair Experimental Comparison of Neural Network Architectures for Latent Representations of Multi-Omics for Drug Response Prediction [7.690774882108066]
We train and optimize multi-omics integration methods under equal conditions. We devised a novel method, Omics Stacking, that combines the advantages of intermediate and late integration. Experiments were conducted on a public drug response data set with multiple omics data.
arXiv Detail & Related papers (2022-08-31T12:46:08Z)
Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms. The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm. As results show, the proposed strategies perform better than the classification based on observed data and allow to keep a high accuracy even when the missing data ratio increases.
arXiv Detail & Related papers (2021-10-19T14:24:50Z)
Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank. Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets. We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator. We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.