A systematic evaluation of methods for cell phenotype classification using single-cell RNA sequencing data
- URL: http://arxiv.org/abs/2110.00681v1
- Date: Fri, 1 Oct 2021 23:24:15 GMT
- Title: A systematic evaluation of methods for cell phenotype classification using single-cell RNA sequencing data
- Authors: Xiaowen Cao, Li Xing, Elham Majd, Hua He, Junhua Gu, Xuekui Zhang
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: Single-cell RNA sequencing (scRNA-seq) yields valuable insights
about gene expression and gives critical information about the cellular
composition of complex tissues. In scRNA-seq analysis, cell subtypes are often
annotated manually, which is time-consuming and irreproducible. Garnett is a
cell-type annotation tool based on the
elastic net method. Besides cell-type annotation, supervised machine learning
methods can also be applied to predict other cell phenotypes from genomic data.
Despite the popularity of such applications, no existing study has
systematically investigated the performance of these supervised algorithms on
scRNA-seq data sets of various sizes.
Methods and Results: This study evaluates 13 popular supervised machine
learning algorithms to classify cell phenotypes, using published real and
simulated data sets with diverse numbers of cells. The benchmark contained two parts.
In the first part, we used real data sets to assess the popular supervised
algorithms' computing speed and cell phenotype classification performance. The
classification performance was evaluated using AUC statistics, F1-score,
precision, recall, and false-positive rate. In the second part, we evaluated
gene selection performance using published simulated data sets with a known
list of truly informative genes.
Conclusion: The study outcomes showed that ElasticNet with interactions
performed best in small and medium data sets. NB (naive Bayes) was another appropriate method
for medium data sets. In large data sets, XGB (XGBoost) performed very well. Ensemble
algorithms were not significantly superior to individual machine learning
methods. Adding interactions to ElasticNet can help, and the improvement was
significant in small data sets.
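Garnett and the ElasticNet classifiers in this benchmark both rest on elastic-net regularization. For a binary phenotype y_i in {0, 1} and expression vector x_i over n cells, the penalized logistic objective is commonly written as below. This uses the glmnet-style parameterization; the exact formulation and tuning used in the paper are not given in this abstract.

The mixing weight alpha interpolates between the lasso (alpha = 1) and ridge (alpha = 0) penalties, and lambda >= 0 sets the overall strength. "ElasticNet with interactions" presumably applies the same penalty after augmenting x_i with products of gene pairs, but that reading is an assumption.

```latex
$$
\min_{\beta_0,\,\boldsymbol{\beta}}\;
-\frac{1}{n}\sum_{i=1}^{n}\Big[\,y_i\,(\beta_0+\mathbf{x}_i^{\top}\boldsymbol{\beta})
-\log\!\big(1+e^{\beta_0+\mathbf{x}_i^{\top}\boldsymbol{\beta}}\big)\Big]
+\lambda\Big[\alpha\,\lVert\boldsymbol{\beta}\rVert_1+\frac{1-\alpha}{2}\,\lVert\boldsymbol{\beta}\rVert_2^2\Big]
$$
```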
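The abstract names the classification metrics (AUC, F1-score, precision, recall, false-positive rate) and a gene-selection benchmark against a known gene list, but does not define them. Below is a minimal pure-Python sketch of how these quantities are usually computed, under a few assumptions:

- One phenotype is treated as the positive class (label 1). This is one-vs-rest; the abstract does not say how multi-class results were averaged.
- AUC is computed with the pairwise-ranking (Mann-Whitney) identity from continuous scores such as predicted probabilities.
- Gene selection is scored as precision/recall of a method's selected genes against the ground-truth list.
- All function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating label 1 as the positive phenotype."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1:
            fp += 1
        elif t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(num: float, den: float) -> float:
    # Convention (an assumption): report 0.0 when the denominator is zero.
    return num / den if den else 0.0


def classification_metrics(y_true, y_pred) -> Dict[str, float]:
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)              # sensitivity / true-positive rate
    f1 = _ratio(2 * precision * recall, precision + recall)
    fpr = _ratio(fp, fp + tn)                 # false-positive rate
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


def roc_auc(y_true, scores) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative cell")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gene_selection_metrics(selected: Iterable[Hashable],
                           true_genes: Iterable[Hashable]) -> Dict[str, float]:
    """Precision/recall of a method's selected genes against a known ground-truth list."""
    sel, truth = set(selected), set(true_genes)
    hits = len(sel & truth)
    return {"precision": _ratio(hits, len(sel)), "recall": _ratio(hits, len(truth))}


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
    print(classification_metrics(y_true, y_pred))  # precision = recall = f1 = 2/3, fpr = 1/3
    print(roc_auc(y_true, scores))                 # 8/9
    print(gene_selection_metrics(["G1", "G2", "G5"], ["G1", "G2", "G3", "G4"]))  # 2/3, 0.5
```

AUC is the only metric here that uses scores rather than hard labels. That makes it threshold-free, whereas precision, recall, F1 and FPR all depend on the cut-off used to turn probabilities into predicted phenotypes.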
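The conclusion that adding interactions helps ElasticNet can be illustrated with a toy case. This is a minimal pure-Python sketch, assuming "ElasticNet with interactions" means elastic-net-penalized logistic regression on the original genes plus all pairwise products of genes. The abstract does not describe the authors' implementation, solver, or how lambda and alpha were tuned.

The solver below is proximal gradient descent, chosen only for brevity (glmnet, for example, uses coordinate descent). The names `add_pairwise_interactions`, `fit_elastic_net_logistic`, `predict_proba` and `accuracy` are hypothetical. The toy phenotype is XOR-like: cells are positive when exactly one of two genes is on. No main-effects-only linear model can separate this pattern, but the product term makes it separable.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def add_pairwise_interactions(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Append x_j * x_k for every gene pair j < k (hypothetical helper)."""
    return [list(row) + [row[j] * row[k] for j, k in combinations(range(len(row)), 2)]
            for row in X]


def sigmoid(z: float) -> float:
    # Split on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z: float, t: float) -> float:
    # Proximal operator of t * |w|: shrinks toward zero and zeroes small weights.
    return z - t if z > t else z + t if z < -t else 0.0


def fit_elastic_net_logistic(X, y, lam=0.01, alpha=0.5, lr=0.1,
                             n_iter=3000) -> Tuple[List[float], float]:
    """Proximal gradient descent on
    mean logistic loss + lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).
    The intercept b is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, target in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - target
            grad_b += r / n
            for j in range(p):
                grad_w[j] += r * row[j] / n
        b -= lr * grad_b
        for j in range(p):
            # Gradient step on the smooth part (loss + ridge), then L1 shrinkage.
            z = w[j] - lr * (grad_w[j] + lam * (1.0 - alpha) * w[j])
            w[j] = soft_threshold(z, lr * lam * alpha)
    return w, b


def predict_proba(w, b, X) -> List[float]:
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]


def accuracy(probs, y) -> float:
    # Predict the positive phenotype when the probability is at least 0.5.
    return sum((p >= 0.5) == (t == 1) for p, t in zip(probs, y)) / len(y)


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # two binary "genes"
    y = [0, 1, 1, 0]                                       # XOR-type phenotype
    w_lin, b_lin = fit_elastic_net_logistic(X, y)
    Xi = add_pairwise_interactions(X)                      # columns: g1, g2, g1*g2
    w_int, b_int = fit_elastic_net_logistic(Xi, y)
    print("main effects only:", accuracy(predict_proba(w_lin, b_lin, X), y))
    print("with interactions:", accuracy(predict_proba(w_int, b_int, Xi), y))
```

With p genes, the pairwise expansion adds p(p-1)/2 columns. That is one plausible reason the L1 part of the penalty matters: it can zero out most interaction terms. It also suggests why the benefit might be easiest to see in small data sets, though the abstract does not give the authors' explanation.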