Related papers: K-Nearest-Neighbors Induced Topological PCA for scRNA Sequence Data Analysis

URL: http://arxiv.org/abs/2310.14521v1
Date: Mon, 23 Oct 2023 03:07:50 GMT
Title: K-Nearest-Neighbors Induced Topological PCA for scRNA Sequence Data Analysis
Authors: Sean Cottrell, Yuta Hozumi, Guo-Wei Wei
Abstract summary: We propose a topological Principal Components Analysis (tPCA) method by the combination of persistent Laplacian (PL) technique and L$_{2,1}$ norm regularization. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse scRNA-seq datasets.
Score: 0.3683202928838613
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is a challenge due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing downstream analysis. Traditional PCA, a main workhorse in dimensionality reduction, lacks the ability to capture geometrical structure information embedded in the data, and previous graph Laplacian regularizations are limited by the analysis of only a single scale. We propose a topological Principal Components Analysis (tPCA) method by the combination of persistent Laplacian (PL) technique and L$_{2,1}$ norm regularization to address multiscale and multiclass heterogeneity issues in data. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique which addresses the many limitations of the traditional persistent homology. Rather than inducing filtration via the varying of a distance threshold, we introduce kNN-tPCA, where filtrations are achieved by varying the number of neighbors in a kNN network at each step, and find that this framework has significant implications for hyper-parameter tuning. We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse benchmark scRNA-seq datasets, and showcase that our methods outperform other unsupervised PCA enhancements from the literature, as well as popular Uniform Manifold Approximation and Projection (UMAP), t-Distributed Stochastic Neighbor Embedding (tSNE), and Projection Non-Negative Matrix Factorization (NMF) by significant margins.
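Illustrative sketch (not the authors' implementation): the abstract describes a filtration obtained by increasing the number of neighbors in a kNN network rather than a distance threshold. The NumPy code below builds such a nested family of kNN graphs and combines their graph Laplacians in a weighted sum. The function names, the choice of `ks`, the weights, and the use of ordinary graph Laplacians as a stand-in for the persistent Laplacian are all assumptions made for illustration only.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric 0/1 adjacency of the k-nearest-neighbor graph on the rows of X."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    # Column indices of the k closest points for every row.
    nbrs = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    # Keep an edge if either endpoint lists the other as a neighbor.
    return np.maximum(A, A.T)

def graph_laplacian(A):
    """Unnormalized graph Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A

def knn_filtration_laplacian(X, ks, weights):
    """Weighted sum of graph Laplacians over kNN graphs with increasing k.

    Assumption: this sum is a simplified stand-in for a kNN-induced
    persistent Laplacian; the weights are hypothetical hyper-parameters.
    """
    return sum(w * graph_laplacian(knn_adjacency(X, k))
               for k, w in zip(ks, weights))
```

Because each kNN graph contains the one built with fewer neighbors, the graphs are nested, which is the property a filtration needs. For example, `knn_filtration_laplacian(X, ks=[2, 4, 6], weights=[1.0, 1.0, 1.0])` on a cells-by-features matrix `X` returns a symmetric positive semidefinite matrix that could be used as a regularizer in a PCA objective.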

Related papers

ReDiSC: A Reparameterized Masked Diffusion Model for Scalable Node Classification with Structured Predictions [64.17845687013434]
We propose ReDiSC, a structured diffusion model for structured node classification. We show that ReDiSC achieves superior or highly competitive performance compared to state-of-the-art GNN, label propagation, and diffusion-based baselines. Notably, ReDiSC scales effectively to large-scale datasets on which previous structured diffusion methods fail due to computational constraints.
arXiv Detail & Related papers (2025-07-19T04:46:53Z)
Solve sparse PCA problem by employing Hamiltonian system and leapfrog method [0.0]
We propose a novel sparse PCA algorithm that imposes sparsity through a smooth L1 penalty. Experimental evaluations on a face recognition dataset, using both k-nearest neighbor and kernel ridge regressions, demonstrate that the proposed sparse PCA methods consistently achieve higher classification accuracy than conventional PCA.
arXiv Detail & Related papers (2025-03-30T06:39:11Z)
Graph Canonical Correlation Analysis [2.588462392029118]
Canonical correlation analysis (CCA) is a widely used technique for estimating associations between two sets of variables. Recent advancements in CCA methods have expanded their application to decipher the interactions of multiomics datasets. We propose the graph Canonical Correlation Analysis (gCCA) approach, which calculates canonical correlations based on the graph structure of the cross-correlation matrix.
arXiv Detail & Related papers (2025-02-03T19:41:06Z)
Multi-Source and Test-Time Domain Adaptation on Multivariate Signals using Spatio-Temporal Monge Alignment [59.75420353684495]
Machine learning applications on signals such as computer vision or biomedical data often face challenges due to the variability that exists across hardware devices or session recordings. In this work, we propose Spatio-Temporal Monge Alignment (STMA) to mitigate these variabilities. We show that STMA leads to significant and consistent performance gains between datasets acquired with very different settings.
arXiv Detail & Related papers (2024-07-19T13:33:38Z)
Injecting Hierarchical Biological Priors into Graph Neural Networks for Flow Cytometry Prediction [1.7709249262395883]
This work explores injecting hierarchical prior knowledge into graph neural networks (GNNs) for single-cell multi-class classification of cellular data. We propose our hierarchical plug-in method to be applied to several GNN models, namely, FCHC-GNN, and effectively designed to capture neighborhood information crucial for single-cell FC domain.
arXiv Detail & Related papers (2024-05-28T18:24:16Z)
Domain Adaptive and Fine-grained Anomaly Detection for Single-cell Sequencing Data and Beyond [4.4136780724044735]
We present ACSleuth, a novel, reconstruction deviation-guided generative framework that integrates the detection, domain adaptation, and fine-grained annotating of anomalous cells into a methodologically cohesive workflow. This analysis informs us to develop a novel and superior maximum mean discrepancy-based anomaly scorer in ACSleuth.
arXiv Detail & Related papers (2024-04-26T14:48:24Z)
Weakly supervised covariance matrices alignment through Stiefel matrices estimation for MEG applications [64.20396555814513]
This paper introduces a novel domain adaptation technique for time series data, called Mixing model Stiefel Adaptation (MSA). We exploit abundant unlabeled data in the target domain to ensure effective prediction by establishing pairwise correspondence with equivalent signal variances between domains. MSA outperforms recent methods in brain-age regression with task variations using magnetoencephalography (MEG) signals from the Cam-CAN dataset.
arXiv Detail & Related papers (2024-01-24T19:04:49Z)
Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs). Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMUs). Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
Analyzing scRNA-seq data by CCP-assisted UMAP and t-SNE [0.0]
Correlated clustering and projection (CCP) was introduced as an effective method for preprocessing scRNA-seq data. CCP is a data-domain approach that does not require matrix diagonalization. By using eight publicly available datasets, we have found that CCP significantly improves UMAP and t-SNE visualization.
arXiv Detail & Related papers (2023-06-23T19:15:43Z)
PLPCA: Persistent Laplacian Enhanced-PCA for Microarray Data Analysis [5.992724190105578]
We propose Persistent Laplacian-enhanced Principal Component Analysis (PLPCA). PLPCA amalgamates the advantages of earlier regularized PCA methods with persistent spectral graph theory. In contrast to graph Laplacians, persistent Laplacians enable multiscale analysis through filtration and incorporate higher-order simplicial complexes.
arXiv Detail & Related papers (2023-06-09T22:48:14Z)
Batch Normalization in Cytometry Data by kNN-Graph Preservation [0.0]
Batch effects in high-dimensional Cytometry by Time-of-Flight (CyTOF) data pose a challenge for comparative analysis. Traditional batch normalization methods may fail to preserve the complex topological structures inherent in cellular populations. We present a residual neural network-based method for point set registration specifically tailored to address batch normalization in CyTOF data.
arXiv Detail & Related papers (2023-03-31T18:06:26Z)
coVariance Neural Networks [119.45320143101381]
Graph neural networks (GNN) are an effective framework that exploit inter-relationships within graph-structured data for learning. We propose a GNN architecture, called coVariance neural network (VNN), that operates on sample covariance matrices as graphs. We show that VNN performance is indeed more stable than PCA-based statistical approaches.
arXiv Detail & Related papers (2022-05-31T15:04:43Z)
Approximate kNN Classification for Biomedical Data [1.1852406625172218]
Single-cell RNA-seq (scRNA-seq) is an emerging DNA sequencing technology with promising capabilities but significant computational challenges. We propose the utilization of approximate nearest neighbor search algorithms for the task of kNN classification in scRNA-seq data.
arXiv Detail & Related papers (2020-12-03T18:30:43Z)
A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques. We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nyström method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees. Our approach leads to significantly better bounds for datasets with known rates of singular value decay. We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
arXiv Detail & Related papers (2020-02-21T00:43:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.