Cell Type Identification from Single-Cell Transcriptomic Data via
Semi-supervised Learning
- URL: http://arxiv.org/abs/2005.03994v1
- Date: Wed, 6 May 2020 19:15:43 GMT
- Title: Cell Type Identification from Single-Cell Transcriptomic Data via
Semi-supervised Learning
- Authors: Xishuang Dong, Shanta Chowdhury, Uboho Victor, Xiangfang Li, Lijun
Qian
- Abstract summary: Cell type identification from single-cell transcriptomic data is a common goal of single-cell RNA sequencing (scRNAseq) data analysis.
We propose a semi-supervised learning model that uses unlabeled scRNAseq cells together with a limited amount of labeled scRNAseq cells for cell identification.
The proposed model achieves encouraging performance by learning from a very limited amount of labeled scRNAseq cells.
- Score: 2.4271601178529063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cell type identification from single-cell transcriptomic data is a common
goal of single-cell RNA sequencing (scRNAseq) data analysis. Neural networks
have been employed to identify cell types from scRNAseq data with high
performance. However, building such identification models requires a large amount of
individual cells with accurate and unbiased type annotations.
Unfortunately, labeling the scRNAseq data is cumbersome and time-consuming as
it involves manual inspection of marker genes. To overcome this challenge, we
propose a semi-supervised learning model that uses unlabeled scRNAseq cells and a
limited amount of labeled scRNAseq cells for cell identification.
Firstly, we transform the scRNAseq cells into "gene sentences", inspired by
similarities between the natural language system and the gene system. Genes in
these sentences are then represented as gene embeddings to reduce data sparsity.
With these embeddings, we implement a semi-supervised learning model based on
recurrent convolutional neural networks (RCNN), which includes a shared
network, a supervised network and an unsupervised network. The proposed model
is evaluated on macosko2015, a large scale single-cell transcriptomic dataset
with ground truth of individual cell types. It is observed that the proposed
model achieves encouraging performance by learning on a very limited
amount of labeled scRNAseq cells together with a large number of unlabeled
scRNAseq cells.
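
As a rough illustration of the "gene sentence" idea described above, the sketch below ranks a cell's genes by expression and maps the resulting gene "words" to embedding vectors. The exact construction, the gene names, and the embedding table are assumptions made for illustration; the paper's actual transformation and embedding training may differ.

```python
# Hedged sketch of the gene-sentence transformation: one plausible reading is
# that a cell's expression profile becomes a sequence of gene names ordered by
# descending expression, with zero-expression genes dropped.

def cell_to_gene_sentence(expression, gene_names, top_k=50):
    """Order genes by expression and keep the top_k expressed genes as 'words'."""
    ranked = sorted(range(len(expression)),
                    key=lambda i: expression[i], reverse=True)
    return [gene_names[i] for i in ranked if expression[i] > 0][:top_k]

def embed_sentence(sentence, embeddings, dim=100):
    """Map each gene 'word' to its embedding vector; unknown genes get zeros."""
    return [embeddings.get(g, [0.0] * dim) for g in sentence]

# Toy usage with hypothetical gene names and a hypothetical embedding table.
expr = [0.0, 5.2, 1.1, 3.7]
genes = ["GAD1", "RHO", "PDE6B", "VSX2"]
sentence = cell_to_gene_sentence(expr, genes)
# → ["RHO", "VSX2", "PDE6B"]  (zero-expression genes dropped)
vectors = embed_sentence(sentence, {"RHO": [1.0, 0.0, 0.0]}, dim=3)
```

In the paper's pipeline, these embedded sequences would then feed the shared RCNN, whose outputs branch into the supervised network (trained on labeled cells) and the unsupervised network (trained on the unlabeled cells).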
Related papers
- MMIL: A novel algorithm for disease associated cell type discovery [58.044870442206914]
Single-cell datasets often lack individual cell labels, making it challenging to identify cells associated with disease.
We introduce Mixture Modeling for Multiple Instance Learning (MMIL), an expectation-maximization method that enables the training and calibration of cell-level classifiers.
arXiv Detail & Related papers (2024-06-12T15:22:56Z)
- UniCell: Universal Cell Nucleus Classification via Prompt Learning [76.11864242047074]
We propose a universal cell nucleus classification framework (UniCell).
It employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains.
In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads for adapting various datasets.
arXiv Detail & Related papers (2024-02-20T11:50:27Z)
- scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data [62.87454293046843]
Graph neural networks (GNNs) have been widely used for automatic cell type classification.
scBiGNN comprises two GNN modules to identify cell types.
scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.
arXiv Detail & Related papers (2023-12-16T03:54:26Z)
- scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain [46.39828178736219]
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain.
scHyena is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator.
This enables us to process full-length scRNA-seq data without losing any information from the raw data.
arXiv Detail & Related papers (2023-10-04T10:30:08Z)
- N-ACT: An Interpretable Deep Learning Model for Automatic Cell Type and Salient Gene Identification [0.0]
A major limitation in most scRNAseq analysis pipelines is the reliance on manual annotations to determine cell identities.
N-ACT is a first-of-its-kind interpretable deep neural network for automatic cell type identification (ACTI), utilizing neural attention to detect salient genes for use in cell-type identification.
arXiv Detail & Related papers (2022-05-08T18:13:28Z)
- CloudPred: Predicting Patient Phenotypes From Single-cell RNA-seq [6.669618903574761]
Single-cell RNA sequencing (scRNA-seq) has the potential to provide powerful, high-resolution signatures to inform disease prognosis and precision medicine.
This paper develops an interpretable machine learning algorithm, CloudPred, to predict individuals' disease phenotypes from their scRNA-seq data.
arXiv Detail & Related papers (2021-10-13T22:41:30Z)
- Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types [75.65676405302105]
We propose a simple yet effective approach for pre-training genome data in a multi-modal and self-supervised manner, which we call GeneBERT.
We pre-train our model on the ATAC-seq dataset with 17 million genome sequences.
arXiv Detail & Related papers (2021-10-11T12:48:44Z)
- A systematic evaluation of methods for cell phenotype classification using single-cell RNA sequencing data [7.62849213621469]
This study evaluates 13 popular supervised machine learning algorithms to classify cell phenotypes.
The study outcomes showed that ElasticNet with interactions performed best in small and medium data sets.
arXiv Detail & Related papers (2021-10-01T23:24:15Z)
- Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
arXiv Detail & Related papers (2021-02-09T08:19:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.