Related papers: Random Matrix Theory-guided sparse PCA for single-cell RNA-seq data

URL: http://arxiv.org/abs/2509.15429v1
Date: Thu, 18 Sep 2025 21:08:38 GMT
Title: Random Matrix Theory-guided sparse PCA for single-cell RNA-seq data
Authors: Victor Chardès
Abstract summary: Single-cell RNA-seq provides detailed molecular snapshots of individual cells. Most studies still rely on principal component analysis (PCA) for dimensionality reduction. We improve upon PCA with a Random Matrix Theory (RMT)-based approach that guides the inference of sparse principal components.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Single-cell RNA-seq provides detailed molecular snapshots of individual cells but is notoriously noisy. Variability stems from biological differences, PCR amplification bias, limited sequencing depth, and low capture efficiency, making it challenging to adapt computational pipelines to heterogeneous datasets or evolving technologies. As a result, most studies still rely on principal component analysis (PCA) for dimensionality reduction, valued for its interpretability and robustness. Here, we improve upon PCA with a Random Matrix Theory (RMT)-based approach that guides the inference of sparse principal components using existing sparse PCA algorithms. We first introduce a novel biwhitening method, inspired by the Sinkhorn-Knopp algorithm, that simultaneously stabilizes variance across genes and cells. This enables the use of an RMT-based criterion to automatically select the sparsity level, rendering sparse PCA nearly parameter-free. Our mathematically grounded approach retains the interpretability of PCA while enabling robust, hands-off inference of sparse principal components. Across seven single-cell RNA-seq technologies and four sparse PCA algorithms, we show that this method systematically improves the reconstruction of the principal subspace and consistently outperforms PCA-, autoencoder-, and diffusion-based methods in cell-type classification tasks.
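The abstract mentions two ingredients that a short sketch can illustrate: Sinkhorn-Knopp scaling and an RMT noise threshold. The code below is only an illustration under stated assumptions, not the paper's biwhitening method or its sparsity criterion. It shows the classical Sinkhorn-Knopp iteration (alternating row and column rescaling of a strictly positive matrix) and the Marchenko-Pastur upper edge for a pure-noise sample covariance. The function names `sinkhorn_knopp` and `marchenko_pastur_upper_edge` are hypothetical and chosen for this example.

```python
import math


def sinkhorn_knopp(matrix, n_iter=200, tol=1e-9):
    """Classical Sinkhorn-Knopp scaling of a strictly positive matrix.

    Alternately rescales rows and columns so that every row sums to 1 and
    every column sums to n_rows / n_cols. Illustration only: this is not
    the biwhitening procedure of the paper.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_scale = [1.0] * n_rows
    col_scale = [1.0] * n_cols
    col_target = n_rows / n_cols
    for _ in range(n_iter):
        # Row step: choose row factors so each scaled row sums to 1.
        for i in range(n_rows):
            s = sum(matrix[i][j] * col_scale[j] for j in range(n_cols))
            row_scale[i] = 1.0 / s
        # Column step: choose column factors so each scaled column sums
        # to col_target, tracking how far the columns were from it.
        max_err = 0.0
        for j in range(n_cols):
            s = sum(row_scale[i] * matrix[i][j] for i in range(n_rows))
            max_err = max(max_err, abs(s * col_scale[j] - col_target))
            col_scale[j] = col_target / s
        if max_err < tol:
            break
    scaled = [
        [row_scale[i] * matrix[i][j] * col_scale[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return scaled, row_scale, col_scale


def marchenko_pastur_upper_edge(n_samples, n_features, variance=1.0):
    """Upper edge of the Marchenko-Pastur law for the sample covariance
    (1 / n_samples) * X^T X of i.i.d. noise with the given variance."""
    return variance * (1.0 + math.sqrt(n_features / n_samples)) ** 2
```

In a generic RMT-style workflow, eigenvalues of the sample covariance that lie above `marchenko_pastur_upper_edge(n_samples, n_features)` would be treated as candidate signal, provided the noise variance has first been stabilized to roughly 1. How the paper turns such a criterion into a choice of sparsity level is not specified in the abstract and is not reproduced here.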

Related papers

Non-linear PCA via Evolution Strategies: a Novel Objective Function [0.0]
We propose a robust non-linear framework that unifies the interpretability of PCA with the flexibility of neural networks. Our method parametrizes variable transformations via neural networks, optimized using Evolution Strategies (ES) to handle the non-differentiability of eigendecomposition. We demonstrate that our method significantly outperforms both linear and kPCA in explained variance across synthetic and real-world datasets.
arXiv Detail & Related papers (2026-02-03T19:34:37Z)
Escaping Local Optima in the Waddington Landscape: A Multi-Stage TRPO-PPO Approach for Single-Cell Perturbation Analysis [0.0]
We introduce a multi-stage reinforcement learning algorithm for single-cell perturbation policy modeling. We first update an elusive natural perturbation vector and a conjugate KLPO trust solver to provide a safe first step for policy modeling.
arXiv Detail & Related papers (2025-10-14T22:20:56Z)
Solve sparse PCA problem by employing Hamiltonian system and leapfrog method [0.0]
We propose a novel sparse PCA algorithm that imposes sparsity through a smooth L1 penalty. Experimental evaluations on a face recognition dataset, using both k-nearest neighbor and kernel ridge regressions, demonstrate that the proposed sparse PCA methods consistently achieve higher classification accuracy than conventional PCA.
arXiv Detail & Related papers (2025-03-30T06:39:11Z)
K-Nearest-Neighbors Induced Topological PCA for scRNA Sequence Data Analysis [0.3683202928838613]
We propose a topological Principal Components Analysis (tPCA) method by the combination of persistent Laplacian (PL) technique and L$_{2,1}$ norm regularization. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse scRNA-seq datasets.
arXiv Detail & Related papers (2023-10-23T03:07:50Z)
Improved Privacy-Preserving PCA Using Optimized Homomorphic Matrix Multiplication [0.0]
Principal Component Analysis (PCA) is a pivotal technique widely utilized in the realms of machine learning and data analysis. In recent years, there have been endeavors to utilize homomorphic encryption in privacy-preserving PCA algorithms for the secure cloud computing scenario. We propose a novel approach to privacy-preserving PCA that addresses these limitations, resulting in superior efficiency, accuracy, and scalability compared to previous approaches.
arXiv Detail & Related papers (2023-05-27T02:51:20Z)
Local manifold learning and its link to domain-based physics knowledge [53.15471241298841]
In many reacting flow systems, the thermo-chemical state-space is assumed to evolve close to a low-dimensional manifold (LDM). We show that PCA applied in local clusters of data (local PCA) is capable of detecting the intrinsic parameterization of the thermo-chemical state-space.
arXiv Detail & Related papers (2022-07-01T09:06:25Z)
Capturing the Denoising Effect of PCA via Compression Ratio [3.967854215226183]
Principal component analysis (PCA) is one of the most fundamental tools in machine learning. In this paper, we propose a novel metric called compression ratio to capture the effect of PCA on high-dimensional noisy data. Building on this new metric, we design a straightforward algorithm that could be used to detect outliers.
arXiv Detail & Related papers (2022-04-22T18:43:47Z)
Lung Cancer Lesion Detection in Histopathology Images Using Graph-Based Sparse PCA Network [93.22587316229954]
We propose a graph-based sparse principal component analysis (GS-PCA) network for automated detection of cancerous lesions on histological lung slides stained by hematoxylin and eosin (H&E). We evaluate the performance of the proposed algorithm on H&E slides obtained from an SVM K-rasG12D lung cancer mouse model using precision/recall rates, F-score, Tanimoto coefficient, and area under the curve (AUC) of the receiver operator characteristic (ROC).
arXiv Detail & Related papers (2021-10-27T19:28:36Z)
AgFlow: Fast Model Selection of Penalized PCA via Implicit Regularization Effects of Gradient Flow [64.81110234990888]
Principal component analysis (PCA) has been widely used as an effective technique for feature extraction and dimension reduction. In the High Dimension Low Sample Size (HDLSS) setting, one may prefer modified principal components, with penalized loadings. We propose Approximated Gradient Flow (AgFlow) as a fast model selection method for penalized PCA.
arXiv Detail & Related papers (2021-10-07T08:57:46Z)
Turning Channel Noise into an Accelerator for Over-the-Air Principal Component Analysis [65.31074639627226]
Principal component analysis (PCA) is a technique for extracting the linear structure of a dataset. We propose the deployment of PCA over a multi-access channel based on the algorithm of gradient descent. Over-the-air aggregation is adopted to reduce the multi-access latency, giving the name over-the-air PCA.
arXiv Detail & Related papers (2021-04-20T16:28:33Z)
Investigating the Scalability and Biological Plausibility of the Activation Relaxation Algorithm [62.997667081978825]
The Activation Relaxation (AR) algorithm provides a simple and robust approach for approximating the backpropagation of error algorithm. We show that the algorithm can be further simplified and made more biologically plausible by introducing a learnable set of backwards weights. We also investigate whether another biologically implausible assumption of the original AR algorithm, the frozen feedforward pass, can be relaxed without damaging performance.
arXiv Detail & Related papers (2020-10-13T08:02:38Z)
Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics. Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis (SPCA). We present thresholding as a provably accurate, polynomial-time approximation algorithm for the SPCA problem.
arXiv Detail & Related papers (2020-06-23T04:25:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.