Dory: Overcoming Barriers to Computing Persistent Homology
- URL: http://arxiv.org/abs/2103.05608v2
- Date: Thu, 11 Mar 2021 17:23:50 GMT
- Title: Dory: Overcoming Barriers to Computing Persistent Homology
- Authors: Manu Aggarwal and Vipul Periwal
- Abstract summary: We present Dory, an efficient and scalable algorithm that can compute the persistent homology of large data sets.
As an application, we compute the PH of the human genome at high resolution as revealed by a genome-wide Hi-C data set.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Persistent homology (PH) is an approach to topological data analysis (TDA)
that computes multi-scale topologically invariant properties of
high-dimensional data that are robust to noise. While PH has revealed useful
patterns across various applications, computational requirements have limited
applications to small data sets of a few thousand points. We present Dory, an
efficient and scalable algorithm that can compute the persistent homology of
large data sets. Dory uses significantly less memory than published algorithms
and also provides significant reductions in the computation time compared to
most algorithms. It scales to process data sets with millions of points. As an
application, we compute the PH of the human genome at high resolution as
revealed by a genome-wide Hi-C data set. Results show that the topology of the
human genome changes significantly upon treatment with auxin, a molecule that
degrades cohesin, corroborating the hypothesis that cohesin plays a crucial
role in loop formation in DNA.
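The abstract describes PH only in general terms. The sketch below illustrates the simplest case under stated assumptions: 0-dimensional persistence (connected components) of a Vietoris-Rips filtration on a small point cloud, computed with union-find over edges sorted by length. It is a standard textbook construction, not Dory's algorithm, and the function name `rips_h0_persistence` is hypothetical. It enumerates all n(n-1)/2 pairwise distances, which shows why memory and time grow quickly with the number of points.

```python
import math
from itertools import combinations


def rips_h0_persistence(points):
    """Return (birth, death) pairs for H0 of the Vietoris-Rips filtration.

    Hypothetical helper for illustration only; this is not Dory's method.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the component root, halving paths as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every pair of points is an edge that enters the filtration at its length
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge: one H0 class (born at scale 0) dies here
            parent[ri] = rj
            pairs.append((0.0, length))

    # The last surviving component never dies
    if n:
        pairs.append((0.0, math.inf))
    return pairs
```

For example, the points (0, 0), (1, 0) and (5, 0) give the pairs (0, 1), (0, 4) and (0, inf): two components die when the edges of length 1 and 4 enter, and one component persists at every scale.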
Related papers
- A Scalable k-Medoids Clustering via Whale Optimization Algorithm
We introduce WOA-kMedoids, a novel unsupervised clustering method that incorporates the Whale Optimization Algorithm (WOA).
By optimizing centroid selection, WOA-kMedoids reduces computational complexity of the k-medoids algorithm from quadratic to near-linear with respect to the number of observations.
We evaluated the performance of WOA-kMedoids on 25 diverse time series datasets from the UCR archive.
(2024-08-30T03:43:37Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
(2024-01-12T22:51:48Z)
- PLPCA: Persistent Laplacian Enhanced-PCA for Microarray Data Analysis
We propose Persistent Laplacian-enhanced Principal Component Analysis (PLPCA).
PLPCA amalgamates the advantages of earlier regularized PCA methods with persistent spectral graph theory.
In contrast to graph Laplacians, persistent Laplacians enable multiscale analysis through filtration and incorporate higher-order simplicial complexes.
(2023-06-09T22:48:14Z)
- Linearized Wasserstein dimensionality reduction with approximation guarantees
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
(2023-02-14T22:12:16Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
(2022-12-06T12:42:11Z)
- Inference of Regulatory Networks Through Temporally Sparse Data
A major goal in genomics is to properly capture the complex dynamical behaviors of gene regulatory networks (GRNs).
This paper develops a scalable and efficient topology inference for GRNs using Bayesian optimization and kernel-based methods.
(2022-07-21T22:48:12Z)
- Tight basis cycle representatives for persistent homology of large data sets
Persistent homology (PH) is a popular tool for topological data analysis that has found applications across diverse areas of research.
Although powerful in theory, PH suffers from high computation cost that precludes its application to large data sets.
We provide a strategy and algorithms to compute tight representative boundaries around nontrivial robust features in large data sets.
(2022-06-06T22:00:42Z)
- Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations
Generative Adversarial Imputation Nets (GAIN) and GAN-based techniques have attracted attention as unsupervised machine learning methods.
We name our proposed method Convolutional Generative Adversarial Imputation Nets (Conv-GAIN).
(2021-11-03T03:50:48Z)
- Multidimensional Scaling for Gene Sequence Data with Autoencoders
We present an autoencoder-based dimensionality reduction model that can easily scale to datasets containing millions of gene sequences.
The proposed model is evaluated against DAMDS on a real-world fungi gene sequence dataset.
(2021-04-19T02:14:17Z)
- Deep Representational Similarity Learning for analyzing neural signatures in task-based fMRI dataset
This paper develops Deep Representational Similarity Learning (DRSL), a deep extension of Representational Similarity Analysis (RSA).
DRSL is appropriate for analyzing similarities between various cognitive tasks in fMRI datasets with a large number of subjects.
(2020-09-28T18:30:14Z)
- A Robust Functional EM Algorithm for Incomplete Panel Count Data
We propose a functional EM algorithm to estimate the counting process mean function under a missing completely at random (MCAR) assumption.
The proposed algorithm wraps several popular panel count inference methods, seamlessly deals with incomplete counts and is robust to misspecification of the Poisson process assumption.
We illustrate the utility of the proposed algorithm through numerical experiments and an analysis of smoking cessation data.
(2020-03-02T20:04:38Z)