Analyzing scRNA-seq data by CCP-assisted UMAP and t-SNE
- URL: http://arxiv.org/abs/2306.13750v1
- Date: Fri, 23 Jun 2023 19:15:43 GMT
- Title: Analyzing scRNA-seq data by CCP-assisted UMAP and t-SNE
- Authors: Yuta Hozumi, Gu-Wei Wei
- Abstract summary: Correlated clustering and projection (CCP) was introduced as an effective method for preprocessing scRNA-seq data.
CCP is a data-domain approach that does not require matrix diagonalization.
By using eight publicly available datasets, we have found that CCP significantly improves UMAP and t-SNE visualization.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity
in cells, which has given us insights into cell-cell communication, cell
differentiation, and differential gene expression. However, analyzing scRNA-seq
data is a challenge due to sparsity and the large number of genes involved.
Therefore, dimensionality reduction and feature selection are important for
removing spurious signals and enhancing downstream analysis. Correlated
clustering and projection (CCP) was recently introduced as an effective method
for preprocessing scRNA-seq data. CCP utilizes gene-gene correlations to
partition the genes and, based on the partition, employs cell-cell interactions
to obtain super-genes. Because CCP is a data-domain approach that does not
require matrix diagonalization, it can be used in many downstream machine
learning tasks. In this work, we utilize CCP as an initialization tool for
uniform manifold approximation and projection (UMAP) and t-distributed
stochastic neighbor embedding (t-SNE). By using eight publicly available
datasets, we have found that CCP significantly improves UMAP and t-SNE
visualization and dramatically improves their accuracy.
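The two CCP steps summarized above (partition genes using gene-gene correlations, then collapse each gene group into a super-gene using cell-cell interactions) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The k-means partitioning of correlation profiles, the Gaussian cell-cell kernel, and the kernel scale set to the mean pairwise distance are all assumptions, and `partition_genes` and `super_genes` are hypothetical names.

```python
import numpy as np


def partition_genes(X, n_clusters, n_iter=50, seed=0):
    """Assign each gene (column of X) to one of n_clusters groups.

    Assumption: plain k-means on the rows of the gene-gene Pearson
    correlation matrix, so genes with similar correlation profiles group together.
    """
    corr = np.nan_to_num(np.corrcoef(X, rowvar=False))  # genes x genes; constant genes give NaN -> 0
    rng = np.random.default_rng(seed)
    centers = corr[rng.choice(corr.shape[0], size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # squared Euclidean distance from every gene profile to every center
        d = ((corr[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            members = corr[labels == k]
            if len(members):  # leave an emptied center where it was
                centers[k] = members.mean(axis=0)
    return labels


def super_genes(X, labels):
    """Collapse each gene group into one super-gene value per cell.

    Assumption: cell i's value is sum_{j != i} exp(-(d_ij / eta)^2), where d_ij is
    the distance between cells i and j using only that group's genes and eta is
    the mean off-diagonal distance. No matrix diagonalization is involved.
    """
    n_cells = X.shape[0]
    off_diag = ~np.eye(n_cells, dtype=bool)
    columns = []
    for k in np.unique(labels):
        sub = X[:, labels == k]                                   # cells x genes in group k
        dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
        eta = dist[off_diag].mean()
        if eta == 0:                                              # all cells identical in this group
            eta = 1.0
        weights = np.exp(-(dist / eta) ** 2)                      # Gaussian cell-cell kernel
        weights[~off_diag] = 0.0                                  # drop self-interaction
        columns.append(weights.sum(axis=1))
    return np.column_stack(columns)                               # cells x super-genes


# Toy data: 30 cells x 12 genes of log-transformed Poisson counts.
rng = np.random.default_rng(1)
X = np.log1p(rng.poisson(1.0, size=(30, 12)).astype(float))
labels = partition_genes(X, n_clusters=3)
Z = super_genes(X, labels)  # shape (30, number of non-empty gene groups)
```

The cells × super-genes matrix `Z` is the kind of compact, diagonalization-free representation that could then be handed to UMAP (`umap.UMAP().fit_transform` in umap-learn) or t-SNE (`sklearn.manifold.TSNE().fit_transform`); the abstract does not spell out exactly how CCP is coupled to these methods. The all-pairs distance tensor makes this sketch O(cells² × genes) in memory, so it only suits toy data.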
Related papers
- Nearest Neighbor CCP-Based Molecular Sequence Analysis [4.199844472131922]
Correlated Clustering and Projection (CCP) has been proposed as an effective method for biological sequencing data.
We present a Nearest Neighbor Correlated Clustering and Projection (CCP-NN)-based technique for efficiently preprocessing molecular sequence data.
Our findings show that CCP-NN considerably improves classification task accuracy as well as significantly outperforms CCP in terms of computational runtime.
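One plausible reading of "nearest neighbor" CCP is that each cell's super-gene value sums the kernel over only its k nearest neighbours instead of over all cells. The sketch below illustrates that idea under stated assumptions: the Gaussian kernel, the default scale `eta` (mean neighbour distance), and the name `knn_super_gene` are hypothetical, and the summary does not give the paper's exact formulation.

```python
import numpy as np


def knn_super_gene(sub, k=5, eta=None):
    """Super-gene for one gene group, summing the kernel over k nearest neighbours only.

    sub: array of shape (cells, genes in the group), with at least 2 cells.
    Hypothetical helper; kernel choice and default eta are assumptions.
    """
    n = sub.shape[0]
    k = min(k, n - 1)
    dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                         # a cell is not its own neighbour
    nn = np.argpartition(dist, k - 1, axis=1)[:, :k]       # k smallest distances per row (unordered)
    nn_dist = np.take_along_axis(dist, nn, axis=1)
    if eta is None:
        eta = nn_dist.mean() or 1.0                        # fall back to 1 if all neighbours coincide
    return np.exp(-(nn_dist / eta) ** 2).sum(axis=1)


rng = np.random.default_rng(0)
sub = rng.normal(size=(20, 4))   # toy: 20 cells x 4 genes in one group
v = knn_super_gene(sub, k=5)
```

For clarity this sketch still builds the full distance matrix, so it does not reproduce the runtime advantage the summary reports; that would require a neighbour index such as a k-d tree. With `k = n - 1` it reduces to the all-cells kernel sum.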
Date: 2024-09-07T22:06:00Z
- scASDC: Attention Enhanced Structural Deep Clustering for Single-cell RNA-seq Data [5.234149080137045]
High sparsity and complex noise patterns inherent in scRNA-seq data present significant challenges for traditional clustering methods.
We propose a deep clustering method, Attention-Enhanced Structural Deep Embedding Graph Clustering (scASDC).
scASDC integrates multiple advanced modules to improve clustering accuracy and robustness.
Date: 2024-08-09T09:10:36Z
- scCDCG: Efficient Deep Structural Clustering for single-cell RNA-seq via Deep Cut-informed Graph Embedding [12.996418312603284]
scCDCG (single-cell RNA-seq Clustering via Deep Cut-informed Graph) is a novel framework designed for efficient and accurate clustering of scRNA-seq data.
scCDCG comprises three main components: (i) A graph embedding module utilizing deep cut-informed techniques, which effectively captures intercellular high-order structural information.
(ii) A self-supervised learning module guided by optimal transport, tailored to accommodate the unique complexities of scRNA-seq data.
Date: 2024-04-09T09:46:17Z
- Cell Graph Transformer for Nuclei Classification [78.47566396839628]
We develop a cell graph transformer (CGT) that treats nodes and edges as input tokens to enable learnable adjacency and information exchange among all nodes.
Poor features can lead to noisy self-attention scores and inferior convergence.
We propose a novel topology-aware pretraining method that leverages a graph convolutional network (GCN) to learn a feature extractor.
Date: 2024-02-20T12:01:30Z
- scRNA-seq Data Clustering by Cluster-aware Iterative Contrastive Learning [29.199004624757233]
Single-cell RNA sequencing (scRNA-seq) enables researchers to analyze gene expression at the single-cell level.
One important task in scRNA-seq data analysis is unsupervised clustering.
We propose Cluster-aware Iterative Contrastive Learning (CICL) for scRNA-seq data clustering.
Date: 2023-12-27T14:50:59Z
- scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data [62.87454293046843]
Graph neural networks (GNNs) have been widely used for automatic cell type classification.
scBiGNN comprises two GNN modules to identify cell types.
scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.
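The summary does not specify scBiGNN's two GNN modules, so the following is only a generic graph-convolution (GCN) layer with the symmetric normalization of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W), applied to a graph whose nodes are cells. The function name `gcn_layer` and the input shapes are illustrative assumptions.

```python
import numpy as np


def gcn_layer(A, H, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A: (cells, cells) symmetric adjacency of a cell graph
    H: (cells, in_features) node features
    W: (in_features, out_features) weight matrix
    """
    A_hat = A + np.eye(A.shape[0])                   # self-loops keep each cell's own features
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))    # degrees are >= 1 thanks to self-loops
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)           # ReLU


# Toy cell graph: a path 0 - 1 - 2, one-hot features, all-ones weights.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
out = gcn_layer(A, np.eye(3), np.ones((3, 2)))
```

Each layer mixes a cell's features with those of its graph neighbours, weighted by degree; stacking layers and ending with a softmax over cell types would give a basic node classifier.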
Date: 2023-12-16T03:54:26Z
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During clustering, we integrate both sets of information and reconstruct the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
Date: 2023-11-28T09:14:55Z
- K-Nearest-Neighbors Induced Topological PCA for scRNA Sequence Data Analysis [0.3683202928838613]
We propose a topological Principal Components Analysis (tPCA) method that combines the persistent Laplacian (PL) technique with $L_{2,1}$-norm regularization.
We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method.
We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse scRNA-seq datasets.
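The persistent Laplacian generalizes the ordinary graph Laplacian across a filtration. As a hedged illustration of only the kNN ingredient, the sketch below builds the combinatorial Laplacian L = D - A of an unweighted, symmetrised k-nearest-neighbour cell graph. It does not implement the persistent Laplacian, the regularization term, or tPCA itself; the name `knn_graph_laplacian` and the "link if either cell is in the other's neighbour list" rule are assumptions.

```python
import numpy as np


def knn_graph_laplacian(X, k):
    """Combinatorial Laplacian L = D - A of an unweighted, symmetrised kNN graph.

    X: (cells, features). Cells i and j are linked if either is among the
    other's k nearest neighbours (Euclidean distance).
    """
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                       # no self-loops
    nn = np.argpartition(dist, k - 1, axis=1)[:, :k]     # k nearest neighbours of each cell
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nn.ravel()] = 1.0      # directed kNN edges i -> j
    A = np.maximum(A, A.T)                               # symmetrise
    return np.diag(A.sum(axis=1)) - A


rng = np.random.default_rng(0)
L = knn_graph_laplacian(rng.normal(size=(15, 3)), k=4)  # toy: 15 cells x 3 features
```

L is symmetric and positive semidefinite with zero row sums, so its smallest eigenvalue is 0; the number of zero eigenvalues equals the number of connected components of the kNN graph.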
Date: 2023-10-23T03:07:50Z
- scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain [46.39828178736219]
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain.
scHyena is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator.
This enables us to process full-length scRNA-seq data without losing any information from the raw data.
Date: 2023-10-04T10:30:08Z
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found RNA-seq features to be highly redundant and informative, even with subsets larger than 128 features.
Date: 2020-04-30T20:42:17Z
- Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nyström method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees.
Our approach leads to significantly better bounds for datasets with known rates of singular value decay.
We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
Date: 2020-02-21T00:43:06Z
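As a hedged illustration of the Nyström method named in the entry above (not of the paper's improved bounds or its column-selection strategy), the sketch below approximates an RBF kernel matrix from a subset of its columns as K ≈ C W⁺ Cᵀ, where C holds the selected columns and W is their intersection block. Uniform random column choice and the names `rbf_kernel` and `nystrom` are assumptions.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)


def nystrom(X, cols, gamma):
    """Nystrom approximation K ~= C W^+ C^T built from the kernel columns in `cols`."""
    C = rbf_kernel(X, X[cols], gamma)      # n x m: selected columns of K
    W = C[cols]                            # m x m: intersection block K[cols][:, cols]
    return C @ np.linalg.pinv(W) @ C.T


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))                       # toy data: 40 points in 3-D
K = rbf_kernel(X, X, gamma=0.5)
cols = rng.choice(40, size=5, replace=False)       # uniform column subset (assumption)
K_approx = nystrom(X, cols, gamma=0.5)
rel_err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
```

Varying `gamma` changes how quickly the kernel's singular values decay, which is the RBF parameter the summary refers to; the residual K - C W⁺ Cᵀ is always positive semidefinite, so the approximation never overestimates the kernel along any direction.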