Assessing Concordance between RNA-Seq and NanoString Technologies in Ebola-Infected Nonhuman Primates Using Machine Learning
- URL: http://arxiv.org/abs/2410.23433v1
- Date: Wed, 30 Oct 2024 20:21:20 GMT
- Title: Assessing Concordance between RNA-Seq and NanoString Technologies in Ebola-Infected Nonhuman Primates Using Machine Learning
- Authors: Mostafa Rezapour, Aarthi Narayanan, Wyatt H. Mowery, Metin Nafi Gurcan,
- Abstract summary: We compare RNA sequencing (RNA-Seq) and NanoString technologies for gene expression analysis in non-human primates infected with Ebola virus (EBOV)
A machine learning approach, using the Supervised Magnitude-Altitude Scoring (SMAS) method trained on NanoString data, identified OAS1 as a key marker for distinguishing RT-qPCR positive from negative samples.
OAS1 also achieved 100% accuracy in differentiating infected from uninfected samples using logistic regression, demonstrating its robustness across platforms.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study evaluates the concordance between RNA sequencing (RNA-Seq) and NanoString technologies for gene expression analysis in non-human primates (NHPs) infected with Ebola virus (EBOV). We performed a detailed comparison of both platforms, demonstrating a strong correlation between them, with Spearman coefficients for 56 out of 62 samples ranging from 0.78 to 0.88, with a mean of 0.83 and a median of 0.85. Bland-Altman analysis further confirmed high consistency, with most measurements falling within 95% confidence limits. A machine learning approach, using the Supervised Magnitude-Altitude Scoring (SMAS) method trained on NanoString data, identified OAS1 as a key marker for distinguishing RT-qPCR positive from negative samples. Remarkably, when applied to RNA-Seq data, OAS1 also achieved 100% accuracy in differentiating infected from uninfected samples using logistic regression, demonstrating its robustness across platforms. Further differential expression analysis identified 12 common genes including ISG15, OAS1, IFI44, IFI27, IFIT2, IFIT3, IFI44L, MX1, MX2, OAS2, RSAD2, and OASL which demonstrated the highest levels of statistical significance and biological relevance across both platforms. Gene Ontology (GO) analysis confirmed that these genes are directly involved in key immune and viral infection pathways, reinforcing their importance in EBOV infection. In addition, RNA-Seq uniquely identified genes such as CASP5, USP18, and DDX60, which play key roles in immune regulation and antiviral defense. This finding highlights the broader detection capabilities of RNA-Seq and underscores the complementary strengths of both platforms in providing a comprehensive and accurate assessment of gene expression changes during Ebola virus infection.
Related papers
- GBDTSVM: Combined Support Vector Machine and Gradient Boosting Decision Tree Framework for efficient snoRNA-disease association prediction [0.0]
This paper proposes a model called 'GBDTSVM', representing a novel and efficient machine learning approach for predicting snoRNA-disease associations.<n>'GBDTSVM' effectively extracts integrated snoRNA-disease feature representations utilizing GBDT and SVM.<n> Experimental evaluation of the GBDTSVM model demonstrated superior performance compared to state-of-the-art methods in the field.
arXiv Detail & Related papers (2025-05-10T06:46:29Z) - scMamba: A Pre-Trained Model for Single-Nucleus RNA Sequencing Analysis in Neurodegenerative Disorders [43.24785083027205]
scMamba is a pre-trained model designed to improve the quality and utility of snRNA-seq analysis.
Inspired by the recent Mamba model, scMamba introduces a novel architecture that incorporates a linear adapter layer, gene embeddings, and bidirectional Mamba blocks.
We demonstrate that scMamba outperforms benchmark methods in various downstream tasks, including cell type annotation, doublet detection, imputation, and the identification of differentially expressed genes.
arXiv Detail & Related papers (2025-02-12T11:48:22Z) - Graph Structure Learning for Tumor Microenvironment with Cell Type Annotation from non-spatial scRNA-seq data [6.432270457083369]
We present a novel graph neural network (GNN) model that enhances cell type prediction and cell interaction analysis.
The proposed scGSL model demonstrated robust performance, achieving an average accuracy of 84.83%, precision of 86.23%, recall of 81.51%, and an F1 score of 80.92% across all datasets.
arXiv Detail & Related papers (2025-02-04T18:28:25Z) - SGUQ: Staged Graph Convolution Neural Network for Alzheimer's Disease Diagnosis using Multi-Omics Data [7.090283934070421]
Alzheimer's disease (AD) is a chronic neurodegenerative disorder and the leading cause of dementia.
Conventional approaches typically require the completion of all omics data at the outset to achieve optimal AD diagnosis.
We propose a novel staged graph convolutional network with uncertainty quantification (SGUQ)
arXiv Detail & Related papers (2024-10-14T19:51:32Z) - Pan-cancer gene set discovery via scRNA-seq for optimal deep learning based downstream tasks [6.869831177092736]
We analyzed scRNA-seq data from 181 tumor biopsies across 13 cancer types.
High-dimensional weighted gene co-expression network analysis (hdWGCNA) was performed to identify relevant gene sets.
Oncogenes from OncoKB evaluated with deep learning models, including multilayer perceptrons (MLPs) and graph neural networks (GNNs)
arXiv Detail & Related papers (2024-08-13T23:24:36Z) - Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z) - Machine Learning-Based Analysis of Ebola Virus' Impact on Gene
Expression in Nonhuman Primates [3.842863644161241]
This study introduces the Supervised Magnitude-Altitude Scoring (SMAS) methodology, a machine learning-based approach, for analyzing gene expression data obtained from nonhuman primates (NHPs) infected with Ebola virus (EBOV)
We utilize a comprehensive dataset of NanoString gene expression profiles from Ebola-infected NHPs, deploying the SMAS system for nuanced host-pathogen interaction analysis.
Key finding of our research is the identification of IFI6 and IFI27 as critical biomarkers, demonstrating exceptional predictive performance with 100% accuracy and Area Under the Curve (AUC) metrics in classifying various stages of
arXiv Detail & Related papers (2024-01-16T18:31:23Z) - scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis
in Brain [46.39828178736219]
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain.
scHyena is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator.
This enables us to process full-length scRNA-seq data without losing any information from the raw data.
arXiv Detail & Related papers (2023-10-04T10:30:08Z) - HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide
Resolution [76.97231739317259]
We present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at the single nucleotide-level.
On fine-tuned benchmarks from the Nucleotide Transformer, HyenaDNA reaches state-of-the-art (SotA) on 12 of 18 datasets using a model with orders of magnitude less parameters and pretraining data.
arXiv Detail & Related papers (2023-06-27T20:46:34Z) - hist2RNA: An efficient deep learning architecture to predict gene
expression from breast cancer histopathology images [11.822321981275232]
Deep learning algorithms can effectively extract morphological patterns in digital histopathology images to predict molecular phenotypes quickly and cost-effectively.
We propose a new, computationally efficient approach called hist2RNA inspired by bulk RNA-sequencing techniques to predict the expression of 138 genes.
arXiv Detail & Related papers (2023-04-10T10:54:32Z) - Machine Learning Methods for Cancer Classification Using Gene Expression
Data: A Review [77.34726150561087]
Cancer is the second major cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
arXiv Detail & Related papers (2023-01-28T15:03:03Z) - Using Deep Learning Sequence Models to Identify SARS-CoV-2 Divergence [1.9573380763700707]
SARS-CoV-2 is an upper respiratory system RNA virus that has caused over 3 million deaths and infecting over 150 million worldwide as of May 2021.
We propose a neural network model that leverages recurrent and convolutional units to take in amino acid sequences of spike proteins and classify corresponding clades.
arXiv Detail & Related papers (2021-11-12T07:52:11Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - CovidDeep: SARS-CoV-2/COVID-19 Test Based on Wearable Medical Sensors
and Efficient Neural Networks [51.589769497681175]
The novel coronavirus (SARS-CoV-2) has led to a pandemic.
The current testing regime based on Reverse Transcription-Polymerase Chain Reaction for SARS-CoV-2 has been unable to keep up with testing demands.
We propose a framework called CovidDeep that combines efficient DNNs with commercially available WMSs for pervasive testing of the virus.
arXiv Detail & Related papers (2020-07-20T21:47:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.