scCluBench: Comprehensive Benchmarking of Clustering Algorithms for Single-Cell RNA Sequencing
- URL: http://arxiv.org/abs/2512.02471v1
- Date: Tue, 02 Dec 2025 07:04:38 GMT
- Authors: Ping Xu, Zaitian Wang, Zhirui Wang, Pengjiang Li, Jiajia Wang, Ran Zhang, Pengfei Wang, Yuanchun Zhou
- Abstract summary: scCluBench is a comprehensive benchmark of clustering algorithms for scRNA-seq data. First, scCluBench provides 36 scRNA-seq datasets collected from diverse public sources.
- Score: 24.35296206096082
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cell clustering is crucial for uncovering cellular heterogeneity in single-cell RNA sequencing (scRNA-seq) data by identifying cell types and marker genes. Despite its importance, benchmarks for scRNA-seq clustering methods remain fragmented, often lacking standardized protocols and failing to incorporate recent advances in artificial intelligence. To fill these gaps, we present scCluBench, a comprehensive benchmark of clustering algorithms for scRNA-seq data. First, scCluBench provides 36 scRNA-seq datasets collected from diverse public sources, covering multiple tissues, which are uniformly processed and standardized to ensure consistency for systematic evaluation and downstream analyses. To evaluate performance, we collect and reproduce a range of scRNA-seq clustering methods, including traditional, deep learning-based, graph-based, and biological foundation models. We comprehensively evaluate each method both quantitatively and qualitatively, using core performance metrics as well as visualization analyses. Furthermore, we construct representative downstream biological tasks, such as marker gene identification and cell type annotation, to further assess the practical utility. scCluBench then investigates the performance differences and applicability boundaries of various clustering models across diverse analytical tasks, systematically assessing their robustness and scalability in real-world scenarios. Overall, scCluBench offers a standardized and user-friendly benchmark for scRNA-seq clustering, with curated datasets, unified evaluation protocols, and transparent analyses, facilitating informed method selection and providing valuable insights into model generalizability and application scope.
Related papers
- Instance-Aware Robust Consistency Regularization for Semi-Supervised Nuclei Instance Segmentation [53.94176748542936]
We propose an Instance-Aware Robust Consistency Regularization Network (IRCR-Net) for accurate instance-level nuclei segmentation. We incorporate morphological prior knowledge of nuclei in pathological images and utilize these priors to assess the quality of pseudo-labels generated from unlabeled data.
arXiv Detail & Related papers (2025-10-10T12:32:32Z) - scUnified: An AI-Ready Standardized Resource for Single-Cell RNA Sequencing Analysis [23.973638982075016]
We present scUnified, an AI-ready standardized resource for single-cell RNA sequencing data. scUnified consolidates 13 high-quality datasets spanning two species and nine tissue types.
arXiv Detail & Related papers (2025-09-30T07:23:01Z) - JojoSCL: Shrinkage Contrastive Learning for single-cell RNA sequence Clustering [0.44116499009420784]
Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular processes by enabling gene expression analysis at the individual cell level. However, the high dimensionality and sparsity of scRNA-seq data continue to challenge existing clustering models. We introduce JojoSCL, a novel self-supervised contrastive learning framework for scRNA-seq clustering.
arXiv Detail & Related papers (2025-05-31T05:59:56Z) - scASDC: Attention Enhanced Structural Deep Clustering for Single-cell RNA-seq Data [5.234149080137045]
High sparsity and complex noise patterns inherent in scRNA-seq data present significant challenges for traditional clustering methods.
We propose a deep clustering method, Attention-Enhanced Structural Deep Embedding Graph Clustering (scASDC).
scASDC integrates multiple advanced modules to improve clustering accuracy and robustness.
arXiv Detail & Related papers (2024-08-09T09:10:36Z) - BEACON: Benchmark for Comprehensive RNA Tasks and Language Models [60.02663015002029]
We introduce the first comprehensive RNA benchmark BEACON (BEnchmArk for COmprehensive RNA Task and Language Models). First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications. Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models. Third, we investigate the vital RNA language model components
arXiv Detail & Related papers (2024-06-14T19:39:19Z) - UniCell: Universal Cell Nucleus Classification via Prompt Learning [76.11864242047074]
We propose a universal cell nucleus classification framework (UniCell).
It employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains.
In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads for adapting various datasets.
arXiv Detail & Related papers (2024-02-20T11:50:27Z) - scRNA-seq Data Clustering by Cluster-aware Iterative Contrastive Learning [29.199004624757233]
Single-cell RNA sequencing (scRNA-seq) enables researchers to analyze gene expression at single-cell level.
One important task in scRNA-seq data analysis is unsupervised clustering.
We propose Cluster-aware Iterative Contrastive Learning (CICL) for scRNA-seq data clustering.
arXiv Detail & Related papers (2023-12-27T14:50:59Z) - Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z) - Single-cell Multi-view Clustering via Community Detection with Unknown Number of Clusters [64.31109141089598]
We introduce scUNC, an innovative multi-view clustering approach tailored for single-cell data.
scUNC seamlessly integrates information from different views without the need for a predefined number of clusters.
We conducted a comprehensive evaluation of scUNC using three distinct single-cell datasets.
arXiv Detail & Related papers (2023-11-28T08:34:58Z) - Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z) - Review of Single-cell RNA-seq Data Clustering for Cell Type Identification and Characterization [12.655970720359297]
Unsupervised learning has become the central component to identify and characterize novel cell types and gene expression patterns.
We review the existing single-cell RNA-seq data clustering methods with critical insights into the related advantages and limitations.
We conduct performance comparison experiments to evaluate several popular single-cell RNA-seq clustering approaches on two single-cell transcriptomic datasets.
arXiv Detail & Related papers (2020-01-03T22:48:10Z)