scRNA-seq Data Clustering by Cluster-aware Iterative Contrastive
Learning
- URL: http://arxiv.org/abs/2312.16600v1
- Date: Wed, 27 Dec 2023 14:50:59 GMT
- Title: scRNA-seq Data Clustering by Cluster-aware Iterative Contrastive
Learning
- Authors: Weikang Jiang, Jinxian Wang, Jihong Guan and Shuigeng Zhou
- Abstract summary: Single-cell RNA sequencing (scRNA-seq) enables researchers to analyze gene expression at the single-cell level.
One important task in scRNA-seq data analysis is unsupervised clustering.
We propose Cluster-aware Iterative Contrastive Learning (CICL) for scRNA-seq data clustering.
- Score: 29.199004624757233
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-cell RNA sequencing (scRNA-seq) enables researchers to analyze gene
expression at the single-cell level. One important task in scRNA-seq data analysis
is unsupervised clustering, which helps identify distinct cell types and lays the
foundation for other downstream analysis tasks. In this paper, we
propose a novel method called Cluster-aware Iterative Contrastive Learning
(CICL in short) for scRNA-seq data clustering, which utilizes an iterative
representation learning and clustering framework to progressively learn the
clustering structure of scRNA-seq data with a cluster-aware contrastive loss.
CICL consists of a Transformer encoder, a clustering head, a projection head
and a contrastive loss module. First, CICL extracts the feature vectors of the
original and augmented data by the Transformer encoder. Then, it computes the
clustering centroids by K-means and employs the Student's t-distribution to assign
pseudo-labels to all cells in the clustering head. The projection head uses a
Multi-Layer Perceptron (MLP) to obtain projections of the augmented data. Finally,
both the pseudo-labels and the projections are used in the contrastive loss to guide
model training. This process is repeated iteratively, so the clustering result improves
progressively. Extensive experiments on 25 real-world scRNA-seq datasets show that
CICL outperforms the SOTA methods. Concretely, CICL surpasses existing methods by
14% to 280% and by 5% to 133% on average in terms of the performance metrics ARI
and NMI, respectively.
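No code accompanies this summary, so below is a minimal PyTorch sketch of one iteration of the pipeline described above, given as a set of assumptions rather than the authors' implementation: the stand-in Transformer encoder, the Gaussian-noise augmentation, the projection dimension, the temperature, and the exact form of the cluster-aware contrastive loss (the two views of a cell attract, and cells sharing a pseudo-label are treated as extra positives) are all illustrative guesses.

```python
# Illustrative sketch only (not the authors' code): one CICL-style iteration.
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans

n_cells, n_genes, d_model, n_clusters, tau = 512, 2000, 128, 10, 0.5  # assumed sizes

class Encoder(nn.Module):
    """Stand-in for the Transformer encoder: embed genes, then self-attention."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(n_genes, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
    def forward(self, x):                        # x: (n_cells, n_genes)
        h = self.embed(x).unsqueeze(1)           # each cell as a length-1 "sequence"
        return self.transformer(h).squeeze(1)    # (n_cells, d_model)

encoder = Encoder()
proj_head = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                          nn.Linear(d_model, 64))     # projection head (MLP)

x = torch.rand(n_cells, n_genes)                 # normalized expression (placeholder data)
x_aug = x + 0.1 * torch.randn_like(x)            # assumed augmentation: Gaussian noise
z, z_aug = encoder(x), encoder(x_aug)            # feature vectors of both views

# Clustering head: K-means centroids, then Student's t soft assignment -> pseudo-labels.
centroids = torch.tensor(
    KMeans(n_clusters, n_init=10).fit(z.detach().numpy()).cluster_centers_,
    dtype=z.dtype)
q = 1.0 / (1.0 + torch.cdist(z, centroids) ** 2)  # t-distribution kernel, 1 d.o.f.
q = q / q.sum(dim=1, keepdim=True)                # soft cluster assignments
pseudo = q.argmax(dim=1)                          # hard pseudo-labels per cell

# Assumed cluster-aware contrastive loss: a cell's two views attract, and cells
# sharing a pseudo-label are treated as additional positives.
p1 = F.normalize(proj_head(z), dim=1)
p2 = F.normalize(proj_head(z_aug), dim=1)
sim = p1 @ p2.t() / tau                           # pairwise similarities across views
pos_mask = (pseudo[:, None] == pseudo[None, :]).float()
log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
loss = (-(pos_mask * log_prob).sum(dim=1) / pos_mask.sum(dim=1)).mean()
loss.backward()                                   # an optimizer step would follow
print(float(loss))
```

In a full training loop, the K-means step and the pseudo-label assignment would be refreshed periodically so that representation learning and clustering reinforce each other, which is the iterative behavior the abstract describes.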
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- scASDC: Attention Enhanced Structural Deep Clustering for Single-cell RNA-seq Data [5.234149080137045]
High sparsity and complex noise patterns inherent in scRNA-seq data present significant challenges for traditional clustering methods.
We propose a deep clustering method, Attention-Enhanced Structural Deep Embedding Graph Clustering (scASDC)
scASDC integrates multiple advanced modules to improve clustering accuracy and robustness.
arXiv Detail & Related papers (2024-08-09T09:10:36Z)
- scCDCG: Efficient Deep Structural Clustering for single-cell RNA-seq via Deep Cut-informed Graph Embedding [12.996418312603284]
scCDCG (single-cell RNA-seq Clustering via Deep Cut-informed Graph) is a novel framework designed for efficient and accurate clustering of scRNA-seq data.
scCDCG comprises three main components: (i) A graph embedding module utilizing deep cut-informed techniques, which effectively captures intercellular high-order structural information.
(ii) A self-supervised learning module guided by optimal transport, tailored to accommodate the unique complexities of scRNA-seq data.
arXiv Detail & Related papers (2024-04-09T09:46:17Z)
- scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data [62.87454293046843]
Graph neural networks (GNNs) have been widely used for automatic cell type classification.
scBiGNN comprises two GNN modules to identify cell types.
scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.
arXiv Detail & Related papers (2023-12-16T03:54:26Z)
- Single-cell Multi-view Clustering via Community Detection with Unknown Number of Clusters [64.31109141089598]
We introduce scUNC, an innovative multi-view clustering approach tailored for single-cell data.
scUNC seamlessly integrates information from different views without the need for a predefined number of clusters.
We conducted a comprehensive evaluation of scUNC using three distinct single-cell datasets.
arXiv Detail & Related papers (2023-11-28T08:34:58Z)
- Analyzing scRNA-seq data by CCP-assisted UMAP and t-SNE [0.0]
Correlated clustering and projection (CCP) was introduced as an effective method for preprocessing scRNA-seq data.
CCP is a data-domain approach that does not require matrix diagonalization.
By using eight publicly available datasets, we have found that CCP significantly improves UMAP and t-SNE visualization.
arXiv Detail & Related papers (2023-06-23T19:15:43Z)
- Confident Clustering via PCA Compression Ratio and Its Application to Single-cell RNA-seq Analysis [4.511561231517167]
We develop a confident clustering method to diminish the influence of boundary datapoints.
We validate our algorithm on single-cell RNA-seq data.
Unlike traditional clustering methods in single-cell analysis, the confident clustering shows high stability under different choices of parameters.
arXiv Detail & Related papers (2022-05-19T20:46:49Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Contrastive Clustering [57.71729650297379]
We propose Contrastive Clustering (CC), which explicitly performs instance- and cluster-level contrastive learning (a toy sketch of the cluster-level idea appears after this list).
In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, which is an up to 19% (39%) performance improvement compared with the best baseline.
arXiv Detail & Related papers (2020-09-21T08:54:40Z)
- LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)
- Review of Single-cell RNA-seq Data Clustering for Cell Type Identification and Characterization [12.655970720359297]
Unsupervised learning has become the central component to identify and characterize novel cell types and gene expression patterns.
We review the existing single-cell RNA-seq data clustering methods with critical insights into the related advantages and limitations.
We conduct performance comparison experiments to evaluate several popular single-cell RNA-seq clustering approaches on two single-cell transcriptomic datasets.
arXiv Detail & Related papers (2020-01-03T22:48:10Z)
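Several entries above, notably Contrastive Clustering and You Never Cluster Alone, revolve around contrasting cluster assignments in addition to instances. As a rough illustration of that shared idea only (not any paper's actual implementation), the toy sketch below contrasts the columns of the soft-assignment matrices of two augmented views; the shapes, temperature, and function name are assumptions.

```python
# Toy sketch of cluster-level contrastive learning: each column of a soft-assignment
# matrix is treated as a "cluster representation", and cluster i in view A is pulled
# toward cluster i in view B while being pushed away from the other clusters.
import torch
import torch.nn.functional as F

def cluster_contrastive_loss(q_a, q_b, tau=1.0):
    """q_a, q_b: (n_samples, n_clusters) softmax cluster assignments of two views."""
    ca = F.normalize(q_a.t(), dim=1)      # (n_clusters, n_samples): one row per cluster
    cb = F.normalize(q_b.t(), dim=1)
    sim = ca @ cb.t() / tau               # cluster-to-cluster similarities across views
    target = torch.arange(sim.size(0))    # matching clusters are the positives
    return 0.5 * (F.cross_entropy(sim, target) + F.cross_entropy(sim.t(), target))

# Usage with random assignments for 256 samples and 10 clusters.
logits_a, logits_b = torch.randn(256, 10), torch.randn(256, 10)
loss = cluster_contrastive_loss(logits_a.softmax(dim=1), logits_b.softmax(dim=1))
print(float(loss))
```

An instance-level term of the same form, applied to per-sample feature projections instead of per-cluster assignment columns, is what such methods typically combine with this cluster-level term.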