Resampling and averaging coordinates on data
- URL: http://arxiv.org/abs/2408.01379v1
- Date: Fri, 2 Aug 2024 16:37:33 GMT
- Title: Resampling and averaging coordinates on data
- Authors: Andrew J. Blumberg, Mathieu Carriere, Jun Hou Fung, Michael A. Mandell
- Abstract summary: We introduce algorithms for robustly computing intrinsic coordinates on point clouds.
We identify a subset of representative embeddings by clustering the collection of candidate coordinates and using shape descriptors from topological data analysis.
The final output is the embedding obtained as an average of the representative embeddings using generalized Procrustes analysis.
- Score: 1.660242118349614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce algorithms for robustly computing intrinsic coordinates on point clouds. Our approach relies on generating many candidate coordinates by subsampling the data and varying hyperparameters of the embedding algorithm (e.g., manifold learning). We then identify a subset of representative embeddings by clustering the collection of candidate coordinates and using shape descriptors from topological data analysis. The final output is the embedding obtained as an average of the representative embeddings using generalized Procrustes analysis. We validate our algorithm on both synthetic data and experimental measurements from genomics, demonstrating robustness to noise and outliers.
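The final averaging step described in the abstract can be sketched in a few lines of NumPy. The snippet below aligns each candidate embedding to a running mean via orthogonal Procrustes and averages the aligned copies; it is a simplified stand-in for generalized Procrustes analysis (no scaling or translation handling), and the function names are illustrative, not taken from the authors' code.

```python
import numpy as np

def procrustes_align(X, Y):
    """Rotate/reflect Y (n x d) to best match X in the least-squares sense
    (orthogonal Procrustes: minimize ||X - Y Q||_F over orthogonal Q)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    R = U @ Vt
    return Y @ R.T

def procrustes_average(embeddings, n_iter=10):
    """Iteratively align all embeddings to a running mean and re-average,
    a simplified sketch of generalized Procrustes analysis."""
    mean = embeddings[0]
    for _ in range(n_iter):
        aligned = [procrustes_align(mean, E) for E in embeddings]
        mean = np.mean(aligned, axis=0)
    return mean
```

In practice the inputs would be the representative embeddings selected by the clustering/TDA filtering step; here any list of same-shape arrays works.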
Related papers
- Subsampling, aligning, and averaging to find circular coordinates in recurrent time series [3.8214695776749013]
We introduce a new algorithm for finding robust circular coordinates on data that is expected to exhibit recurrence.
We validate our technique on both synthetic data sets and neuronal activity recordings.
arXiv Detail & Related papers (2024-12-24T15:52:51Z)
- Noncommutative Model Selection for Data Clustering and Dimension Reduction Using Relative von Neumann Entropy [0.0]
We propose a pair of data-driven algorithms for unsupervised classification and dimension reduction.
In our experiments, our clustering algorithm outperforms $k$-means clustering on data sets with non-trivial geometry and topology.
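As a point of reference for the entropy quantity in the title, the snippet below computes the von Neumann entropy of a graph from the density matrix rho = L / tr(L) built from its combinatorial Laplacian. This is one common convention, assumed here for illustration; the paper's relative-entropy model selection is more involved.

```python
import numpy as np

def von_neumann_entropy(A):
    """Von Neumann entropy of a graph with adjacency matrix A, using the
    density matrix rho = L / tr(L) (an assumed, common convention)."""
    L = np.diag(A.sum(axis=1)) - A   # combinatorial graph Laplacian
    rho = L / np.trace(L)            # unit-trace, positive semidefinite
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]     # convention: 0 * log 0 = 0
    return float(-np.sum(evals * np.log(evals)))
```

For the complete graph K_n this evaluates to log(n - 1), since the n - 1 nonzero Laplacian eigenvalues are all equal.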
arXiv Detail & Related papers (2024-11-29T18:04:11Z) - Samplet basis pursuit: Multiresolution scattered data approximation with sparsity constraints [0.0]
We consider scattered data approximation in samplet coordinates with $\ell_1$-regularization.
By using the Riesz isometry, we embed samplets into reproducing kernel Hilbert spaces.
We argue that the class of signals that are sparse with respect to the embedded samplet basis is considerably larger than the class of signals that are sparse with respect to the basis of kernel translates.
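The generic $\ell_1$-regularized fitting problem behind basis pursuit can be solved with iterative soft-thresholding (ISTA). The sketch below handles the standard problem min_x 0.5||Ax - b||^2 + lam||x||_1 for an arbitrary dictionary A; it does not construct samplets, which are the paper's actual contribution.

```python
import numpy as np

def ista(A, b, lam, n_iter=500):
    """Iterative soft-thresholding for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x - step * (A.T @ (A @ x - b))                        # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft-threshold
    return x
```

For an orthonormal dictionary (A = I) the solution is just the soft-thresholded data, which makes the shrinkage effect easy to check.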
arXiv Detail & Related papers (2023-06-16T21:20:49Z) - Data Clustering as an Emergent Consensus of Autonomous Agents [0.0]
We present a data segmentation method based on a first-order density-induced consensus protocol.
We provide a mathematically rigorous analysis of the consensus model leading to the stopping criteria of the data segmentation.
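A toy version of a first-order, density-induced consensus protocol is easy to simulate: each point moves toward the mean of its neighbors within a fixed interaction radius, so well-separated groups collapse to distinct consensus points. This is an illustrative sketch, not the paper's specific protocol or stopping criterion.

```python
import numpy as np

def consensus_segmentation(X, radius=1.0, dt=0.1, n_steps=200):
    """Explicit-Euler simulation of first-order consensus dynamics:
    dx_i/dt = (mean of neighbors within `radius`) - x_i.
    Groups separated by more than `radius` converge to distinct points."""
    X = X.astype(float).copy()
    for _ in range(n_steps):
        D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
        W = (D < radius).astype(float)          # local interaction weights
        W /= W.sum(axis=1, keepdims=True)       # row-normalize (self included)
        X += dt * (W @ X - X)
    return X
```

The segmentation is then read off by grouping points whose final positions coincide.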
arXiv Detail & Related papers (2022-04-22T09:11:35Z) - Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
As the results show, the proposed strategies outperform classification based on the observed data alone and maintain high accuracy even as the missing-data ratio increases.
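The first strategy (impute, then estimate the covariance) can be sketched in plain NumPy: fill each incomplete row using the k nearest complete rows, measuring distance on the commonly observed features. This is an illustrative simplification; the paper works with covariance matrices of EEG epochs and also offers an EM alternative.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaNs in each row with the feature means of the k nearest
    complete rows (distance computed on that row's observed features)."""
    X = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]
    for i in range(len(X)):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        d = np.linalg.norm(complete[:, ~miss] - X[i, ~miss], axis=1)
        nn = complete[np.argsort(d)[:k]]      # k nearest complete rows
        X[i, miss] = nn[:, miss].mean(axis=0)
    return X
```

The covariance is then estimated from the completed data, e.g. `np.cov(knn_impute(X).T)`.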
arXiv Detail & Related papers (2021-10-19T14:24:50Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
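For context, the exact leverage scores being approximated are the squared row norms of an orthonormal basis for the column space, computable via a thin QR factorization. The sketch below is the expensive exact baseline (for full-column-rank input), not the paper's cheap randomized estimator.

```python
import numpy as np

def leverage_scores(A):
    """Exact statistical leverage scores of a tall full-column-rank matrix:
    squared row norms of Q from a thin QR factorization of A."""
    Q, _ = np.linalg.qr(A)
    return np.sum(Q ** 2, axis=1)
```

Each score lies in [0, 1] and the scores sum to the rank of A, which gives a quick sanity check for any approximate estimator.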
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes (DPPs) for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPP, uniformly random selection of centers fails both to ensure diversity and to obtain a good coverage of all data facets.
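The kind of diversity DPP sampling promotes can be illustrated with a much simpler deterministic surrogate: greedy farthest-point selection of initial centers. This is not DPP sampling itself, only a sketch of the "spread-out centers" idea.

```python
import numpy as np

def diverse_centers(X, k):
    """Greedy farthest-point selection of k initial centers: each new
    center maximizes its distance to the centers already chosen.
    A deterministic surrogate for the diversity a DPP restart promotes."""
    idx = [0]
    d = np.linalg.norm(X - X[0], axis=1)      # distance to nearest chosen center
    for _ in range(k - 1):
        nxt = int(np.argmax(d))
        idx.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return X[idx]
```

Uniform restarts can repeatedly draw nearby points as centers, whereas a diversity-promoting scheme covers well-separated regions of the data.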
arXiv Detail & Related papers (2021-02-07T23:48:24Z) - Learning High Dimensional Wasserstein Geodesics [55.086626708837635]
We propose a new formulation and learning strategy for computing the Wasserstein geodesic between two probability distributions in high dimensions.
By applying the method of Lagrange multipliers to the dynamic formulation of the optimal transport (OT) problem, we derive a minimax problem whose saddle point is the Wasserstein geodesic.
We then parametrize the functions by deep neural networks and design a sample based bidirectional learning algorithm for training.
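In one dimension the W2 geodesic has a closed form that needs no training, which makes a useful sanity check for learned high-dimensional solvers: McCann's displacement interpolation reduces to linearly interpolating matched order statistics. The sketch below assumes two equal-size 1-D samples.

```python
import numpy as np

def w2_geodesic_1d(xs, ys, t):
    """Displacement interpolation between two equal-size 1-D samples:
    sort both, then linearly interpolate matched order statistics.
    This is the exact W2 geodesic in 1-D (closed form, no training)."""
    xs, ys = np.sort(xs), np.sort(ys)
    return (1 - t) * xs + t * ys
```

At t = 0 and t = 1 the interpolation recovers the (sorted) endpoint samples.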
arXiv Detail & Related papers (2021-02-05T04:25:28Z) - Model Fusion with Kullback--Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
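The simplest closed-form instance of posterior fusion is the Gaussian case: multiplying independent Gaussian posteriors adds precisions and takes a precision-weighted mean. This toy case is shown below for intuition only; the paper's algorithm handles richer mean-field posteriors across heterogeneous datasets.

```python
import numpy as np

def fuse_gaussians(means, variances):
    """Fuse independent 1-D Gaussian posteriors N(m_i, v_i) by multiplying
    densities: the fused precision is the sum of precisions, and the fused
    mean is the precision-weighted average of the means."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    prec = np.sum(1.0 / variances)
    mean = np.sum(means / variances) / prec
    return mean, 1.0 / prec
```

Fusing two equally confident posteriors lands halfway between their means with half the variance, as expected.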
arXiv Detail & Related papers (2020-07-13T03:27:45Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
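The classifier being analyzed can be sketched as follows: project the data with independent random Gaussian matrices, fit a simple linear classifier per projection, and take a majority vote. The member classifier here is nearest-class-mean, a simplification of the linear discriminants analyzed in the paper; all names are illustrative.

```python
import numpy as np

def rp_ensemble_predict(Xtr, ytr, Xte, dim=2, n_members=11, seed=0):
    """Ensemble of classifiers on random Gaussian projections, combined by
    majority vote. Each member classifies by nearest class mean in its
    projected space (binary labels 0/1 assumed)."""
    rng = np.random.default_rng(seed)
    votes = np.zeros((len(Xte), 2), dtype=int)
    for _ in range(n_members):
        P = rng.standard_normal((Xtr.shape[1], dim))   # random projection
        Ztr, Zte = Xtr @ P, Xte @ P
        mus = np.stack([Ztr[ytr == c].mean(axis=0) for c in (0, 1)])
        pred = np.linalg.norm(Zte[:, None, :] - mus[None], axis=-1).argmin(axis=1)
        votes[np.arange(len(Xte)), pred] += 1
    return votes.argmax(axis=1)
```

Tuning `dim` (the projection dimension) is exactly the hyperparameter the paper's misclassification estimator is designed to select without cross-validation.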
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.