Active feature selection discovers minimal gene-sets for classifying cell-types and disease states in single-cell mRNA-seq data
- URL: http://arxiv.org/abs/2106.08317v1
- Date: Tue, 15 Jun 2021 17:49:26 GMT
- Authors: Xiaoqiao Chen, Sisi Chen, Matt Thomson
- Abstract summary: Sequencing costs currently prohibit the application of single-cell mRNA-seq to many biological and clinical tasks of interest.
We introduce an active learning framework that constructs compressed gene sets that enable high-accuracy classification of cell-types and physiological states.
The discovery of compact but highly informative gene sets might enable drastic reductions in sequencing requirements for applications of single-cell mRNA-seq.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequencing costs currently prohibit the application of single-cell mRNA-seq
to many biological and clinical tasks of interest. Here, we introduce an
active learning framework that constructs compressed gene sets that enable high-accuracy
classification of cell-types and physiological states while analyzing
a minimal number of gene transcripts. Our active feature selection procedure
constructs gene sets through an iterative cell-type classification task in which
misclassified cells are examined at each round to identify maximally
informative genes through an 'active' support vector machine (SVM) classifier.
Our active SVM procedure automatically identifies gene sets that enable
$>90\%$ cell-type classification accuracy in the Tabula Muris mouse tissue
survey as well as a $\sim 40$-gene set that enables classification of multiple
myeloma patient samples with $>95\%$ accuracy. Broadly, the discovery of
compact but highly informative gene sets might enable drastic reductions in
sequencing requirements for applications of single-cell mRNA-seq.
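The iterative loop described in the abstract (classify, examine the misclassified cells, add an informative gene, repeat) can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: the binary (+1/-1) labels, the subgradient-descent training of a soft-margin linear SVM, and the selection rule (add the gene whose inclusion corrects the most currently misclassified cells) are simplifying assumptions, and every function name below is hypothetical.

```python
# Hypothetical sketch of greedy "active" gene selection with a linear SVM.
# Assumptions (not from the paper): binary labels (+1/-1), a soft-margin linear
# SVM trained by plain subgradient descent, and a selection rule that picks the
# gene whose addition fixes the most currently misclassified cells.


def train_linear_svm(X, y, features, epochs=100, lr=0.1, lam=0.01):
    """Soft-margin linear SVM (hinge loss + L2 penalty) on the columns in `features`."""
    w = [0.0] * len(features)
    b = 0.0
    for _ in range(epochs):
        for x_full, yi in zip(X, y):
            xi = [x_full[f] for f in features]
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Hinge loss is active: step along the subgradient of loss + penalty.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Only the L2 penalty contributes to the subgradient.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x_full, features):
    """Sign of the SVM decision function, using only the selected genes."""
    score = sum(wj * x_full[f] for wj, f in zip(w, features)) + b
    return 1 if score > 0 else -1


def misclassified_cells(X, y, features):
    """Indices of cells the current gene set gets wrong (all cells if no genes yet)."""
    if not features:
        return list(range(len(X)))
    w, b = train_linear_svm(X, y, features)
    return [i for i, (x, yi) in enumerate(zip(X, y)) if predict(w, b, x, features) != yi]


def active_gene_selection(X, y, max_genes):
    """Greedily grow a gene set, examining only misclassified cells each round."""
    selected = []
    n_genes = len(X[0])
    while len(selected) < max_genes:
        wrong = misclassified_cells(X, y, selected)
        if not wrong:
            break  # every cell is already classified correctly
        best_gene, best_fixed = None, -1
        for g in range(n_genes):
            if g in selected:
                continue
            trial = selected + [g]
            w, b = train_linear_svm(X, y, trial)
            # Score candidate g by how many misclassified cells it now gets right.
            fixed = sum(1 for i in wrong if predict(w, b, X[i], trial) == y[i])
            if fixed > best_fixed:
                best_gene, best_fixed = g, fixed
        if best_gene is None:
            break  # no genes left to add
        selected.append(best_gene)
    return selected


# Toy data: 8 cells x 4 genes; only gene 1 separates the two classes perfectly.
X = [
    [1.0, 1.0, 0.5, 1.0],
    [-1.0, 1.0, 0.5, 1.0],
    [1.0, 1.0, -0.5, 1.0],
    [-1.0, 1.0, -0.5, -1.0],
    [1.0, -1.0, 0.5, -1.0],
    [-1.0, -1.0, 0.5, -1.0],
    [1.0, -1.0, -0.5, -1.0],
    [-1.0, -1.0, -0.5, 1.0],
]
y = [1, 1, 1, 1, -1, -1, -1, -1]
print(active_gene_selection(X, y, max_genes=3))  # → [1]
```

On this toy data only gene 1 can classify every cell, so it is added in the first round and the loop stops once no cells remain misclassified. Retraining one SVM per candidate gene scales poorly; a version for thousands of genes and cells would need a cheaper per-gene score, for example one derived from the trained SVM itself.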
Related papers
- An Evolutional Neural Network Framework for Classification of Microarray Data
This research applies a hybrid of a genetic algorithm and a neural network to the problem of selecting subsets of informative genes.
Experimental results show that the proposed method achieves high accuracy with a minimal number of selected genes compared with other machine learning algorithms.
(2024-11-20T13:48:40Z)
- eDOC: Explainable Decoding Out-of-domain Cell Types with Evidential Learning
Single-cell RNA-seq (scRNA-seq) technology is a powerful tool for unraveling the complexity of biological systems.
Cell type annotation (CTA) is one of the essential and fundamental tasks in scRNA-seq data analysis.
We develop a new method, eDOC, to address challenges in this task.
(2024-10-30T20:15:36Z)
- MMIL: A novel algorithm for disease associated cell type discovery
Single-cell datasets often lack individual cell labels, making it challenging to identify cells associated with disease.
We introduce Mixture Modeling for Multiple Instance Learning (MMIL), an expectation-maximization method that enables the training and calibration of cell-level classifiers.
(2024-06-12T15:22:56Z)
- VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
(2024-05-13T20:15:03Z)
- Cell reprogramming design by transfer learning of functional transcriptional networks
We develop a transfer learning approach to control cell behavior that is pre-trained on transcriptomic data associated with human cell fates.
We show that the number of gene perturbations required to steer from one fate to another increases with decreasing developmental relatedness.
(2024-03-07T19:00:02Z)
- scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data
Graph neural networks (GNNs) have been widely used for automatic cell type classification.
scBiGNN comprises two GNN modules to identify cell types.
scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.
(2023-12-16T03:54:26Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During clustering, we integrate both sets of information and reconstruct the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
(2023-11-28T09:14:55Z)
- scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain.
scHyena is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator.
This enables us to process full-length scRNA-seq data without losing any information from the raw data.
(2023-10-04T10:30:08Z)
- Fuzzy Gene Selection and Cancer Classification Based on Deep Learning Model
We developed a new fuzzy gene selection technique (FGS) to identify informative genes to facilitate cancer classification.
With our FGS-enhanced method, the cancer classification model achieved 96.5%, 96.2%, 96%, and 95.9% for accuracy, precision, recall, and F1-score, respectively.
Across the six datasets examined, the proposed model demonstrates its capacity to classify cancer effectively.
(2023-05-04T21:52:57Z)
- Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types
We propose a simple yet effective approach for pre-training genome data in a multi-modal and self-supervised manner, which we call GeneBERT.
We pre-train our model on the ATAC-seq dataset with 17 million genome sequences.
(2021-10-11T12:48:44Z)