Predicting Molecular Phenotypes with Single Cell RNA Sequencing Data: an Assessment of Unsupervised Machine Learning Models
- URL: http://arxiv.org/abs/2108.05039v1
- Date: Wed, 11 Aug 2021 05:30:37 GMT
- Title: Predicting Molecular Phenotypes with Single Cell RNA Sequencing Data: an Assessment of Unsupervised Machine Learning Models
- Authors: Anastasia Dunca, Frederick R. Adler
- Abstract summary: This study evaluates unsupervised machine learning for classifying treatment-resistant phenotypes in heterogeneous tumors.
scRNAseq quantifies mRNA in individual cells and characterizes cell phenotypes.
Clusters generated by this pipeline can be used to understand cancer cell behavior and malignant growth.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: According to the National Cancer Institute, there were 9.5 million
cancer-related deaths in 2018. A key challenge in improving treatment is
resistance in genetically unstable cells. The purpose of this study is to
evaluate unsupervised machine learning for classifying treatment-resistant
phenotypes in heterogeneous tumors by analyzing single cell RNA sequencing
(scRNAseq) data with a pipeline and evaluation metrics. scRNAseq quantifies mRNA
in individual cells and characterizes cell phenotypes. One scRNAseq dataset was
analyzed, containing tumor and non-tumor cells labeled by molecular subtype and
patient identification. The pipeline consisted of data filtering, dimensionality
reduction with Principal Component Analysis (PCA), projection with Uniform
Manifold Approximation and Projection (UMAP), clustering with nine approaches
(Ward, BIRCH, Gaussian Mixture Model, DBSCAN, Spectral, Affinity Propagation,
Agglomerative Clustering, Mean Shift, and K-Means), and evaluation. Seven models
separated tumor from non-tumor cells and distinguished molecular subtypes, while
six models distinguished patient identification (13 patient IDs were present in
the dataset). K-Means, Ward, and BIRCH often ranked highest, with ~80% accuracy
on the tumor versus non-tumor task and ~60% on molecular subtype and patient ID.
An optimized classification pipeline using K-Means, Ward, and BIRCH was judged
the most effective for further analysis. In clinical research, where there is
currently no standard protocol for scRNAseq analysis, clusters generated by this
pipeline can be used to understand cancer cell behavior and malignant growth,
directly affecting the success of treatment.
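The abstract names the pipeline stages but not their settings. Below is a minimal sketch, assuming NumPy and scikit-learn, of three of those stages (filtering, PCA, and clustering with the three top-ranked models: K-Means, Ward, and BIRCH) on synthetic count data. The filtering rule (keep genes detected in at least `min_cells` cells, then library-size normalize and log-transform), every parameter value, the helper names `preprocess`, `purity`, and `run_pipeline`, and the synthetic data are illustrative assumptions, not the authors' choices. "Accuracy" is computed here as cluster purity (each cluster is credited with its majority true label), which is one plausible way to score clusters against known labels; the paper's exact metric may differ. The UMAP projection is omitted to avoid depending on the separate umap-learn package; it could be inserted between PCA and clustering with `umap.UMAP(n_components=2).fit_transform(pcs)`.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans
from sklearn.decomposition import PCA


def preprocess(counts, min_cells=3, target_sum=1e4):
    """Hypothetical filtering step: drop rarely detected genes, then
    library-size normalize each cell and log-transform with log1p."""
    detected = (counts > 0).sum(axis=0) >= min_cells  # genes seen in >= min_cells cells
    kept = counts[:, detected].astype(float)
    lib_size = kept.sum(axis=1, keepdims=True)
    lib_size[lib_size == 0] = 1.0  # guard against cells with no counts
    return np.log1p(kept / lib_size * target_sum)


def purity(true_labels, cluster_labels):
    """Majority-vote accuracy: each cluster is credited with its most common true label."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    correct = 0
    for c in np.unique(cluster_labels):
        _, label_counts = np.unique(true_labels[cluster_labels == c], return_counts=True)
        correct += label_counts.max()
    return correct / len(true_labels)


def run_pipeline(counts, labels, n_clusters, n_pcs=10, seed=0):
    """Filter -> PCA -> cluster with K-Means, Ward, BIRCH -> score each against labels."""
    pcs = PCA(n_components=n_pcs, random_state=seed).fit_transform(preprocess(counts))
    models = {
        "K-Means": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "Ward": AgglomerativeClustering(n_clusters=n_clusters, linkage="ward"),
        "BIRCH": Birch(n_clusters=n_clusters),
    }
    return {name: purity(labels, m.fit_predict(pcs)) for name, m in models.items()}


# Synthetic stand-in for a tumor vs non-tumor dataset (not the study's data):
# hypothetical "tumor" cells over-express the first 40 of 200 genes.
rng = np.random.default_rng(0)
n_per_group, n_genes = 100, 200
base_rate = rng.gamma(2.0, 1.0, size=n_genes)
tumor_rate = base_rate.copy()
tumor_rate[:40] *= 5
counts = np.vstack([
    rng.poisson(base_rate, size=(n_per_group, n_genes)),
    rng.poisson(tumor_rate, size=(n_per_group, n_genes)),
])
labels = np.array(["non-tumor"] * n_per_group + ["tumor"] * n_per_group)

scores = run_pipeline(counts, labels, n_clusters=2)
print(scores)
```

Purity rises trivially as the number of clusters grows (one cluster per cell scores 100%), so it is only meaningful when `n_clusters` is fixed at or near the number of true classes, as it is here.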
Related papers
- Pan-cancer gene set discovery via scRNA-seq for optimal deep learning based downstream tasks [6.869831177092736]
We analyzed scRNA-seq data from 181 tumor biopsies across 13 cancer types.
High-dimensional weighted gene co-expression network analysis (hdWGCNA) was performed to identify relevant gene sets.
Oncogenes from OncoKB were evaluated with deep learning models, including multilayer perceptrons (MLPs) and graph neural networks (GNNs).
arXiv Detail & Related papers (2024-08-13T23:24:36Z)
- MMIL: A novel algorithm for disease associated cell type discovery [58.044870442206914]
Single-cell datasets often lack individual cell labels, making it challenging to identify cells associated with disease.
We introduce Mixture Modeling for Multiple Instance Learning (MMIL), an expectation-maximization method that enables the training and calibration of cell-level classifiers.
arXiv Detail & Related papers (2024-06-12T15:22:56Z)
- Self-Normalizing Foundation Model for Enhanced Multi-Omics Data Analysis in Oncology [0.0]
SeNMo is a foundation model that has been trained on multi-omics data across 33 cancer types.
We trained SeNMo to predict overall patient survival using pan-cancer multi-omics data spanning 33 cancer sites.
SeNMo was validated on two independent cohorts: Moffitt Cancer Center and CPTAC lung squamous cell carcinoma.
arXiv Detail & Related papers (2024-05-13T22:45:44Z)
- FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking [1.6712896227173808]
FlowCyt is the first comprehensive benchmark for multi-class single-cell classification in flow cytometry data.
The dataset comprises bone marrow samples from 30 patients, with each cell characterized by twelve markers.
arXiv Detail & Related papers (2024-02-28T15:01:59Z)
- hist2RNA: An efficient deep learning architecture to predict gene expression from breast cancer histopathology images [11.822321981275232]
Deep learning algorithms can effectively extract morphological patterns in digital histopathology images to predict molecular phenotypes quickly and cost-effectively.
We propose a new, computationally efficient approach called hist2RNA inspired by bulk RNA-sequencing techniques to predict the expression of 138 genes.
arXiv Detail & Related papers (2023-04-10T10:54:32Z)
- Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset.
We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of the cirrhosis.
This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z)
- Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review [77.34726150561087]
Cancer is the second leading cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
arXiv Detail & Related papers (2023-01-28T15:03:03Z)
- Lung Cancer Lesion Detection in Histopathology Images Using Graph-Based Sparse PCA Network [93.22587316229954]
We propose a graph-based sparse principal component analysis (GS-PCA) network for automated detection of cancerous lesions on histological lung slides stained with hematoxylin and eosin (H&E).
We evaluate the performance of the proposed algorithm on H&E slides obtained from an SVM K-rasG12D lung cancer mouse model using precision/recall rates, F-score, Tanimoto coefficient, and area under the curve (AUC) of the receiver operating characteristic (ROC).
arXiv Detail & Related papers (2021-10-27T19:28:36Z)
- Multi-scale Deep Learning Architecture for Nucleus Detection in Renal Cell Carcinoma Microscopy Image [7.437224586066945]
Clear cell renal cell carcinoma (ccRCC) is one of the most common forms of intratumoral heterogeneity in the study of renal cancer.
In this paper, we introduce a deep learning-based detection model for cell classification on IHC-stained histology images.
Our model maps the multi-scale pyramid features and saliency information from local bounded regions and predicts the bounding box coordinates through regression.
arXiv Detail & Related papers (2021-04-28T03:36:02Z)
- Cancer Gene Profiling through Unsupervised Discovery [49.28556294619424]
We introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers.
Our method is based on the LP-Stability algorithm, a high dimensional center-based unsupervised clustering algorithm.
Our signature reports promising results on distinguishing immune inflammatory and immune desert tumors.
arXiv Detail & Related papers (2021-02-11T09:04:45Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found RNA-seq features to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)