Persistent Homology for High-dimensional Data Based on Spectral Methods
- URL: http://arxiv.org/abs/2311.03087v3
- Date: Thu, 31 Oct 2024 11:01:08 GMT
- Title: Persistent Homology for High-dimensional Data Based on Spectral Methods
- Authors: Sebastian Damrich, Philipp Berens, Dmitry Kobak,
- Abstract summary: We show that persistent homology becomes very sensitive to noise and fails to detect the correct topology.
We find that spectral distances on the k-nearest-neighbor graph of the data, such as diffusion distance and effective resistance, allow to detect the correct topology even in the presence of high-dimensional noise.
- Score: 16.58218530585593
- License:
- Abstract: Persistent homology is a popular computational tool for analyzing the topology of point clouds, such as the presence of loops or voids. However, many real-world datasets with low intrinsic dimensionality reside in an ambient space of much higher dimensionality. We show that in this case traditional persistent homology becomes very sensitive to noise and fails to detect the correct topology. The same holds true for existing refinements of persistent homology. As a remedy, we find that spectral distances on the k-nearest-neighbor graph of the data, such as diffusion distance and effective resistance, allow to detect the correct topology even in the presence of high-dimensional noise. Moreover, we derive a novel closed-form formula for effective resistance, and describe its relation to diffusion distances. Finally, we apply these methods to high-dimensional single-cell RNA-sequencing data and show that spectral distances allow robust detection of cell cycle loops.
Related papers
- Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.
We introduce a discrete-time sampling algorithm in the general state space $[S]d$ that utilizes score estimators at predefined time points.
Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z) - Non-isotropic Persistent Homology: Leveraging the Metric Dependency of
PH [5.70896453969985]
We show that information on the point cloud is lost when restricting persistent homology to a single distance function.
We numerically show that non-isotropic persistent homology can extract information on orientation, orientational variance, and scaling of randomly generated point clouds.
arXiv Detail & Related papers (2023-10-25T08:03:17Z) - A Heat Diffusion Perspective on Geodesic Preserving Dimensionality
Reduction [66.21060114843202]
We propose a more general heat kernel based manifold embedding method that we call heat geodesic embeddings.
Results show that our method outperforms existing state of the art in preserving ground truth manifold distances.
We also showcase our method on single cell RNA-sequencing datasets with both continuum and cluster structure.
arXiv Detail & Related papers (2023-05-30T13:58:50Z) - Decomposed Diffusion Sampler for Accelerating Large-Scale Inverse
Problems [64.29491112653905]
We propose a novel and efficient diffusion sampling strategy that synergistically combines the diffusion sampling and Krylov subspace methods.
Specifically, we prove that if tangent space at a denoised sample by Tweedie's formula forms a Krylov subspace, then the CG with the denoised data ensures the data consistency update to remain in the tangent space.
Our proposed method achieves more than 80 times faster inference time than the previous state-of-the-art method.
arXiv Detail & Related papers (2023-03-10T07:42:49Z) - Learning Topology-Preserving Data Representations [9.710409273484464]
We propose a method for learning topology-preserving data representations (dimensionality reduction)
The core of the method is the minimization of the Representation Topology Divergence (RTD) between original high-dimensional data and low-dimensional representation in latent space.
The proposed method better preserves the global structure and topology of the data manifold than state-of-the-art competitors as measured by linear correlation, triplet distance ranking accuracy, and Wasserstein distance between persistence barcodes.
arXiv Detail & Related papers (2023-01-31T22:55:04Z) - Intrinsic dimension estimation for discrete metrics [65.5438227932088]
In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces.
We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting.
This suggests that evolutive pressure acts on a low-dimensional manifold despite the high-dimensionality of sequences' space.
arXiv Detail & Related papers (2022-07-20T06:38:36Z) - Tight basis cycle representatives for persistent homology of large data
sets [0.0]
Persistent homology (PH) is a popular tool for topological data analysis that has found applications across diverse areas of research.
Although powerful in theory, PH suffers from high computation cost that precludes its application to large data sets.
We provide a strategy and algorithms to compute tight representative boundaries around nontrivial robust features in large data sets.
arXiv Detail & Related papers (2022-06-06T22:00:42Z) - Time-inhomogeneous diffusion geometry and topology [69.55228523791897]
Diffusion condensation is a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data.
We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives.
Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.
arXiv Detail & Related papers (2022-03-28T16:06:17Z) - Improving Metric Dimensionality Reduction with Distributed Topology [68.8204255655161]
DIPOLE is a dimensionality-reduction post-processing step that corrects an initial embedding by minimizing a loss functional with both a local, metric term and a global, topological term.
We observe that DIPOLE outperforms popular methods like UMAP, t-SNE, and Isomap on a number of popular datasets.
arXiv Detail & Related papers (2021-06-14T17:19:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.