Smart Vectorizations for Single and Multiparameter Persistence
- URL: http://arxiv.org/abs/2104.04787v1
- Date: Sat, 10 Apr 2021 15:09:31 GMT
- Title: Smart Vectorizations for Single and Multiparameter Persistence
- Authors: Baris Coskunuzer and Cuneyt Gurcan Akcora and Ignacio Segovia
Dominguez and Zhiwei Zhen and Murat Kantarcioglu and Yulia R. Gel
- Abstract summary: We introduce two new topological summaries for single and multi-parameter persistence, namely, saw functions and multi-persistence grid functions.
These new topological summaries can be regarded as the complexity measures of the evolving subspaces determined by the filtration.
We derive theoretical guarantees on the stability of the new saw and multi-persistence grid functions and illustrate their applicability for graph classification tasks.
- Score: 8.504400925390296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The machinery of topological data analysis is becoming increasingly popular in a
broad range of machine learning tasks, ranging from anomaly detection and
manifold learning to graph classification. Persistent homology is one of the
key approaches here, allowing us to systematically assess the evolution of
various hidden patterns in the data as we vary a scale parameter. The extracted
patterns, or homological features, along with information on how long such
features persist throughout the considered filtration of a scale parameter,
convey a critical insight into salient data characteristics and data
organization.
In this work, we introduce two new and easily interpretable topological
summaries for single and multi-parameter persistence, namely, saw functions and
multi-persistence grid functions, respectively. Compared to the existing
topological summaries which tend to assess the numbers of topological features
and/or their lifespans at a given filtration step, our proposed saw and
multi-persistence grid functions allow us to explicitly account for essential
complementary information such as the numbers of births and deaths at each
filtration step.
These new topological summaries can be regarded as the complexity measures of
the evolving subspaces determined by the filtration and are of particular
utility for applications of persistent homology on graphs. We derive
theoretical guarantees on the stability of the new saw and multi-persistence
grid functions and illustrate their applicability for graph classification
tasks.
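The abstract's point about tracking births and deaths at each filtration step, alongside the usual count of living features, can be illustrated with a minimal sketch. It is not the paper's saw function or grid function definition. The function name `birth_death_counts`, the toy intervals, and the assumption that births and deaths fall exactly on the listed thresholds (as with integer-valued graph filtrations) are all hypothetical choices made for illustration.

```python
from collections import Counter

INF = float("inf")


def birth_death_counts(intervals, thresholds):
    """Tabulate, per threshold, the number of alive features (a Betti number)
    together with how many features are born and how many die there.

    intervals  : list of (birth, death) pairs; death may be INF.
    thresholds : filtration values; births/deaths are assumed to coincide
                 with these values (illustrative assumption).
    """
    births = Counter(b for b, _ in intervals)
    deaths = Counter(d for _, d in intervals if d != INF)
    rows = []
    for t in thresholds:
        # A feature is alive at t if it was born at or before t and has not died yet.
        alive = sum(1 for b, d in intervals if b <= t < d)
        rows.append(
            {"threshold": t, "betti": alive, "births": births[t], "deaths": deaths[t]}
        )
    return rows


# Toy persistence intervals (hypothetical data, not taken from the paper).
intervals = [(0, 2), (0, 3), (1, 3), (2, INF)]
table = birth_death_counts(intervals, thresholds=[0, 1, 2, 3])
for row in table:
    print(row)
```

In this toy input, thresholds 1 and 2 have the same number of alive features, yet only threshold 2 has a death, which is the kind of complementary information a count of alive features alone does not show.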
Related papers
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
- Feature graphs for interpretable unsupervised tree ensembles: centrality, interaction, and application in disease subtyping [0.24578723416255746]
Feature selection assumes a pivotal role in enhancing model interpretability.
The accuracy gained from aggregating decision trees comes at the expense of interpretability.
The study introduces novel methods to construct feature graphs from unsupervised random forests.
arXiv Detail & Related papers (2024-04-27T12:47:37Z)
- Fundamental limits of community detection from multi-view data: multi-layer, dynamic and partially labeled block models [7.778975741303385]
We study community detection in multi-view data in modern network analysis.
We characterize the mutual information between the data and the latent parameters.
We introduce iterative algorithms based on Approximate Message Passing for community detection.
arXiv Detail & Related papers (2024-01-16T07:13:32Z)
- Discrete transforms of quantized persistence diagrams [0.5249805590164902]
We introduce Qupid, a novel and simple method for vectorizing persistence diagrams.
Key features are the choice of log-scaled grids that emphasize information contained near the diagonal in persistence diagrams.
We conduct an in-depth experimental analysis of Qupid, showing that the simplicity of our method results in very low computational costs.
arXiv Detail & Related papers (2023-12-28T16:11:11Z)
- Improving embedding of graphs with missing data by soft manifolds [51.425411400683565]
The reliability of graph embeddings depends on how much the geometry of the continuous space matches the graph structure.
We introduce a new class of manifold, named soft manifold, that can solve this situation.
Using soft manifold for graph embedding, we can provide continuous spaces to pursue any task in data analysis over complex datasets.
arXiv Detail & Related papers (2023-11-29T12:48:33Z)
- Learning Conditional Invariance through Cycle Consistency [60.85059977904014]
We propose a novel approach to identify meaningful and independent factors of variation in a dataset.
Our method involves two separate latent subspaces for the target property and the remaining input information.
We demonstrate on synthetic and molecular data that our approach identifies more meaningful factors which lead to sparser and more interpretable models.
arXiv Detail & Related papers (2021-11-25T17:33:12Z)
- Novel Features for Time Series Analysis: A Complex Networks Approach [62.997667081978825]
Time series data are ubiquitous in several domains, such as climate, economics and health care.
A recent conceptual approach relies on mapping time series to complex networks.
Network analysis can be used to characterize different types of time series.
arXiv Detail & Related papers (2021-10-11T13:46:28Z)
- Can neural networks learn persistent homology features? [1.1816942730023885]
Topological data analysis uses tools from topology to create representations of data.
In our work, we explore the possibility of learning several types of features extracted from persistence diagrams using neural networks.
arXiv Detail & Related papers (2020-11-30T10:58:53Z)
- Capturing Dynamics of Time-Varying Data via Topology [0.5276232626689568]
We introduce a new tool to summarize time-varying metric spaces: a crocker stack.
A time-varying collection of metric spaces as formed by a moving school of fish or flock of birds can contain a vast amount of information.
We demonstrate the utility of crocker stacks for a parameter identification task involving an influential model of biological aggregations.
arXiv Detail & Related papers (2020-10-07T20:07:40Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
- Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be faced with latent variable models.
High-dimensionality and non-linear issues are traditionally handled by kernel methods.
We propose merging both approaches into a single model.
arXiv Detail & Related papers (2020-06-01T14:25:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.