Isoform Function Prediction Using a Deep Neural Network
- URL: http://arxiv.org/abs/2208.03325v3
- Date: Tue, 25 Apr 2023 17:04:58 GMT
- Title: Isoform Function Prediction Using a Deep Neural Network
- Authors: Sara Ghazanfari, Ali Rasteh, Seyed Abolfazl Motahari, Mahdieh Soleymani Baghshah
- Abstract summary: Studies have shown that more than 95% of human multi-exon genes have undergone alternative splicing.
Alternative splicing plays a significant role in human health and disease.
This project uses all available data and valuable information such as isoform sequences, expression profiles, and gene ontology graphs.
- Score: 9.507435239304591
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Isoforms are mRNAs produced from the same gene site in the phenomenon called
Alternative Splicing. Studies have shown that more than 95% of human multi-exon
genes have undergone alternative splicing. Although there are few changes in
mRNA sequence, they may have a systematic effect on cell function and
regulation. It is widely reported that isoforms of a gene have distinct or even
contrasting functions. Most studies have shown that alternative splicing plays
a significant role in human health and disease. Despite the wide range of gene
function studies, there is little information about isoforms' functionalities.
Recently, some computational methods based on Multiple Instance Learning have
been proposed to predict isoform function using gene function and gene
expression profile. However, their performance is not desirable due to the lack
of labeled training data. In addition, probabilistic models such as Conditional
Random Field (CRF) have been used to model the relation between isoforms. This
project uses all the data and valuable information such as isoform sequences,
expression profiles, and gene ontology graphs and proposes a comprehensive
model based on Deep Neural Networks. The UniProt Gene Ontology (GO) database is
used as a standard reference for gene functions. The NCBI RefSeq database is
used for extracting gene and isoform sequences, and the NCBI SRA database is
used for expression profile data. Metrics such as Receiver Operating
Characteristic Area Under the Curve (ROC AUC) and Precision-Recall Area Under
the Curve (PR AUC) are used to measure the prediction accuracy.
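The two evaluation metrics named in the abstract, together with the Multiple Instance Learning idea of deriving a gene-level score from isoform-level scores, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the max-over-isoforms aggregation, the choice to evaluate at the gene level, and the toy scores are all assumptions made for illustration. PR AUC is approximated here by average precision, which is one common estimator.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    # Count correctly ordered (positive, negative) pairs; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pr_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """PR AUC estimated as average precision (mean precision at each positive hit)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("PR AUC needs at least one positive label")
    # Rank examples from highest to lowest score (tied scores keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


def gene_scores_from_isoforms(isoform_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed MIL-style aggregation: a gene's score is the maximum over its isoforms."""
    return {gene: max(scores) for gene, scores in isoform_scores.items()}


if __name__ == "__main__":
    # Hypothetical isoform-level scores for a single GO term, grouped by gene.
    isoform_scores = {
        "geneA": [0.9, 0.2],
        "geneB": [0.4],
        "geneC": [0.7, 0.1, 0.3],
        "geneD": [0.3, 0.2],
    }
    # Hypothetical gene-level labels (1 = gene is annotated with the GO term).
    gene_labels = {"geneA": 1, "geneB": 0, "geneC": 1, "geneD": 0}

    gene_scores = gene_scores_from_isoforms(isoform_scores)
    genes = sorted(gene_labels)
    y = [gene_labels[g] for g in genes]
    s = [gene_scores[g] for g in genes]
    print("ROC AUC:", roc_auc(y, s))
    print("PR AUC (average precision):", pr_auc(y, s))
```

The pairwise ROC AUC computation is quadratic in the number of examples, which is acceptable for a sketch but would normally be replaced by a rank-based or library implementation on real data.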
Related papers
- Generating Multi-Modal and Multi-Attribute Single-Cell Counts with CFGen [76.02070962797794]
We present Cell Flow for Generation, a flow-based conditional generative model for multi-modal single-cell counts.
Our results suggest improved recovery of crucial biological data characteristics while accounting for novel generative tasks.
arXiv Detail & Related papers (2024-07-16T14:05:03Z)
- VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z)
- Genetic heterogeneity analysis using genetic algorithm and network science [2.6166087473624318]
Genome-wide association studies (GWAS) can identify disease susceptible genetic variables.
Genetic variables intertwined with genetic effects often exhibit lower effect-size.
This paper introduces a novel feature selection mechanism for GWAS, named Feature Co-selection Network (FCSNet).
arXiv Detail & Related papers (2023-08-12T01:28:26Z)
- Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review [77.34726150561087]
Cancer is the second major cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
arXiv Detail & Related papers (2023-01-28T15:03:03Z)
- Optirank: classification for RNA-Seq data with optimal ranking reference genes [0.0]
We propose a logistic regression model, optirank, which learns simultaneously the parameters of the model and the genes to use as a reference set in the ranking.
We also consider real classification tasks, which present different kinds of distribution shifts between train and test data.
arXiv Detail & Related papers (2023-01-11T10:49:06Z)
- Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z)
- Neural network facilitated ab initio derivation of linear formula: A case study on formulating the relationship between DNA motifs and gene expression [8.794181445664243]
We propose a framework for ab initio derivation of sequence motifs and linear formula using a new approach based on the interpretable neural network model.
We showed that this linear model could predict gene expression levels using promoter sequences with a performance comparable to deep neural network models.
arXiv Detail & Related papers (2022-08-19T22:29:30Z)
- SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model [3.0643865202019698]
We propose a new solution named SemanticCAP to identify accessible regions of the genome.
It introduces a gene language model which models the context of gene sequences, thus being able to provide an effective representation of gene sequences.
Compared with other systems under public benchmarks, our model proved to have better performance.
arXiv Detail & Related papers (2022-04-05T11:47:58Z)
- rfPhen2Gen: A machine learning based association study of brain imaging phenotypes to genotypes [71.1144397510333]
We trained machine learning models to predict SNPs using 56 brain imaging QTs.
SNPs within the known Alzheimer disease (AD) risk gene APOE had the lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- A Semi-Supervised Generative Adversarial Network for Prediction of Genetic Disease Outcomes [0.0]
We introduce genetic Generative Adversarial Networks (gGAN) to create large synthetic genetic data sets.
Our goal is to determine the propensity of a new individual to develop the severe form of the illness from their genetic profile alone.
The proposed model is self-aware and capable of determining whether a new genetic profile has enough compatibility with the data on which the network was trained.
arXiv Detail & Related papers (2020-07-02T15:35:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.