TDAvec: Computing Vector Summaries of Persistence Diagrams for Topological Data Analysis in R and Python
- URL: http://arxiv.org/abs/2411.17340v1
- Date: Tue, 26 Nov 2024 11:34:12 GMT
- Title: TDAvec: Computing Vector Summaries of Persistence Diagrams for Topological Data Analysis in R and Python
- Authors: Aleksei Luchinsky, Umar Islambekov
- Abstract summary: We introduce a new software package designed to streamline the vectorization of persistence diagrams (PDs).
The non-Hilbert nature of the space of PDs poses challenges for their direct use in machine learning applications.
- Abstract: Persistent homology is a widely-used tool in topological data analysis (TDA) for understanding the underlying shape of complex data. By constructing a filtration of simplicial complexes from data points, it captures topological features such as connected components, loops, and voids across multiple scales. These features are encoded in persistence diagrams (PDs), which provide a concise summary of the data's topological structure. However, the non-Hilbert nature of the space of PDs poses challenges for their direct use in machine learning applications. To address this, kernel methods and vectorization techniques have been developed to transform PDs into machine-learning-compatible formats. In this paper, we introduce a new software package designed to streamline the vectorization of PDs, offering an intuitive workflow and advanced functionalities. We demonstrate the necessity of the package through practical examples and provide a detailed discussion on its contributions to applied TDA. Definitions of all vectorization summaries used in the package are included in the appendix.
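The abstract refers to vector summaries computed from persistence diagrams. As a minimal illustration of what such a vectorization looks like, the sketch below evaluates a Betti (persistence) curve on a fixed grid of scale values using plain NumPy; it is not the TDAvec API, and the function name, toy diagram, and grid are assumptions made for the example.

```python
import numpy as np

def betti_curve(pd_pairs, scale_seq):
    """Evaluate the Betti curve of a persistence diagram on a grid of scale values.

    pd_pairs  : array of shape (n, 2) with (birth, death) pairs for one homological dimension.
    scale_seq : 1-D array of increasing scale values at which to evaluate the curve.
    Returns a 1-D vector of length len(scale_seq): the number of features alive at each scale.
    """
    pd_pairs = np.asarray(pd_pairs, dtype=float)
    births, deaths = pd_pairs[:, 0], pd_pairs[:, 1]
    # A feature (b, d) is alive at scale t when b <= t < d.
    alive = (births[None, :] <= scale_seq[:, None]) & (scale_seq[:, None] < deaths[None, :])
    return alive.sum(axis=1).astype(float)

# Toy example: a diagram with three features, evaluated on ten scale values.
diagram = np.array([[0.0, 0.8], [0.2, 0.5], [0.4, 1.0]])
grid = np.linspace(0.0, 1.0, 10)
print(betti_curve(diagram, grid))
```

The resulting fixed-length vector can be fed directly to standard machine-learning models, which is the general workflow that vectorization packages of this kind are meant to streamline.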
Related papers
- IsUMap: Manifold Learning and Data Visualization leveraging Vietoris-Rips filtrations (arXiv, 2024-07-25)
  We present a systematic and detailed construction of a metric representation for locally distorted metric spaces.
  Our approach addresses limitations in existing methods by accommodating non-uniform data distributions and intricate local geometries.
- Discovering symbolic expressions with parallelized tree search (arXiv, 2024-07-05)
  Symbolic regression plays a crucial role in scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data.
  Existing algorithms have faced a critical bottleneck in accuracy and efficiency for over a decade when handling complex problems.
  We introduce a parallelized tree search (PTS) model to efficiently distill generic mathematical expressions from limited data.
- ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models (arXiv, 2024-05-22)
  We show that the space spanned by the combination of dimensions and attributes is insufficiently sampled by existing training schemes of diffusion generative models.
  We present a simple fix to this problem by constructing processes that fully exploit the structures, hence the name ComboStoc.
- Improving embedding of graphs with missing data by soft manifolds (arXiv, 2023-11-29)
  The reliability of graph embeddings depends on how much the geometry of the continuous space matches the graph structure.
  We introduce a new class of manifolds, called soft manifolds, that can address this issue.
  Using soft manifolds for graph embedding, we can provide continuous spaces to pursue any task in data analysis over complex datasets.
- Higher-order topological kernels via quantum computation (arXiv, 2023-07-14)
  Topological data analysis (TDA) has emerged as a powerful tool for extracting meaningful insights from complex data.
  We propose a quantum approach to defining Betti kernels, which is based on constructing Betti curves with increasing order.
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion (arXiv, 2023-01-23)
  We introduce an energy-constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
  We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
  Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
- Learning Implicit Feature Alignment Function for Semantic Segmentation (arXiv, 2022-06-17)
  The Implicit Feature Alignment function (IFA) is inspired by the rapidly expanding topic of implicit neural representations.
  We show that IFA implicitly aligns the feature maps at different levels and is capable of producing segmentation maps at arbitrary resolutions.
  Our method can be combined with various architectures and achieves a state-of-the-art accuracy trade-off on common benchmarks.
- A computationally efficient framework for vector representation of persistence diagrams (arXiv, 2021-09-16)
  We propose a framework to convert a persistence diagram (PD) into a vector in $\mathbb{R}^n$, called a vectorized persistence block (VPB); a simplified sketch of this construction appears after this list.
  Our representation possesses many of the desired properties of vector-based summaries, such as stability with respect to input noise, low computational cost, and flexibility.
- Random Persistence Diagram Generation (arXiv, 2021-04-15)
  Topological data analysis (TDA) studies the shape patterns of data.
  Persistent homology (PH) is a widely used method in TDA that summarizes homological features of data at multiple scales and stores them in persistence diagrams (PDs).
  We propose random persistence diagram generation (RPDG), a method that generates a sequence of random PDs from the ones produced by the data.
- The Interconnectivity Vector: A Finite-Dimensional Vector Representation of Persistent Homology (arXiv, 2020-11-23)
  Persistent homology (PH) is a useful tool to study the underlying structure of a data set.
  Persistence diagrams (PDs) are a concise summary of the information found by studying the PH of a data set.
  We propose a new finite-dimensional vector representation of a PD, called the interconnectivity vector, adapted from the Bag-of-Words (BoW) model.
- A Short Review on Data Modelling for Vector Fields (arXiv, 2020-09-01)
  Machine learning methods have proven highly successful in dealing with a wide variety of data analysis and analytics tasks.
  The recent success of end-to-end modelling schemes using deep neural networks allows the extension to more sophisticated and structured practical data.
  This review article is dedicated to recent computational tools for vector fields, including vector data representations, predictive models of spatial data, and applications in computer vision, signal processing, and empirical sciences.