Predicting Hydroxyl Mediated Nucleophilic Degradation and Molecular
Stability of RNA Sequences through the Application of Deep Learning Methods
- URL: http://arxiv.org/abs/2011.05136v3
- Date: Sun, 26 Sep 2021 16:42:24 GMT
- Title: Predicting Hydroxyl Mediated Nucleophilic Degradation and Molecular
Stability of RNA Sequences through the Application of Deep Learning Methods
- Authors: Ankit Singhal
- Abstract summary: This paper proposes and evaluates three deep learning models as methods to predict the reactivity and risk of degradation of mRNA sequences.
The Stanford Open Vaccine dataset of 6034 mRNA sequences was used in this study.
Results suggest these models can be applied to understand and predict the chemical stability of mRNA in the near future.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthesis and efficient implementation of mRNA strands have been shown to have
wide utility, especially recently in the development of COVID vaccines.
However, the intrinsic chemical stability of mRNA poses a challenge due to the
presence of 2'-hydroxyl groups in ribose sugars. The -OH group in the backbone
structure enables a base-catalyzed nucleophilic attack by the deprotonated
hydroxyl on the adjacent phosphorus and consequent self-hydrolysis of the
phosphodiester bond. As expected for in-line hydrolytic cleavage reactions, the
chemical stability of mRNA strands is highly dependent on external
environmental factors, e.g. pH, temperature, oxidizers, etc. Predicting this
chemical instability using a computational model will reduce the number of
sequences synthesized and tested through identifying the most promising
candidates, aiding the development of mRNA related therapies. This paper
proposes and evaluates three deep learning models (Long Short Term Memory,
Gated Recurrent Unit, and Graph Convolutional Networks) as methods to predict
the reactivity and risk of degradation of mRNA sequences. The Stanford Open
Vaccine dataset of 6034 mRNA sequences was used in this study. The training set
consisted of 3029 of these sequences (length of 107 nucleotide bases) while the
testing dataset consisted of 3005 sequences (length of 130 nucleotide bases),
in structured (Lowest Entropy Base Pair Probability Matrix) and unstructured
(Nodes and Edges) forms. The stability of mRNA strands was accurately
predicted, with the Graph Convolutional Network being the best predictor of
reactivity ($RMSE = 0.249$) while the Gated Recurrent Unit Network was the best
at predicting risks of degradation ($RMSE = 0.266$). Combining all target
variables, the GRU performed the best with 76% accuracy. Results suggest these
models can be applied to understand and predict the chemical stability of mRNA
in the near future.
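The abstract mentions a graph ("Nodes and Edges") representation of each sequence and reports results as RMSE. The following is a minimal sketch of both ideas, not the author's implementation. It assumes the secondary structure is given in dot-bracket notation (an assumption; the abstract does not name the input format), and the function names `rna_to_graph` and `rmse` are hypothetical.

```python
import math


def rna_to_graph(sequence, structure):
    """Return (nodes, edges) for an RNA sequence and a dot-bracket structure.

    Assumption: the structure uses '(' and ')' for paired bases and '.' for
    unpaired bases. Each edge is a tuple (i, j, kind).
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    nodes = list(sequence)
    # Backbone edges: phosphodiester links between adjacent nucleotides.
    edges = [(i, i + 1, "backbone") for i in range(len(sequence) - 1)]
    stack = []  # indices of opening brackets still waiting for a partner
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            edges.append((stack.pop(), i, "pair"))
    if stack:
        raise ValueError("unbalanced structure")
    return nodes, edges


def rmse(predicted, observed):
    """Root mean squared error between two equal-length lists of numbers."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    nodes, edges = rna_to_graph("GGAAACC", "((...))")
    print(len(nodes), "nodes;", len(edges), "edges")
    print("RMSE:", rmse([1.0, 2.0], [1.0, 4.0]))
```

A graph model would typically consume the node list as per-base features and the edge list as an adjacency structure, while `rmse` would be applied per target variable (for example, reactivity) over all scored positions.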
Related papers
- Dumpling GNN: Hybrid GNN Enables Better ADC Payload Activity Prediction Based on Chemical Structure [53.76752789814785]
DumplingGNN is a hybrid Graph Neural Network architecture specifically designed for predicting ADC payload activity based on chemical structure.
We evaluate it on a comprehensive ADC payload dataset focusing on DNA Topoisomerase I inhibitors.
It demonstrates exceptional accuracy (91.48%), sensitivity (95.08%), and specificity (97.54%) on our specialized ADC payload dataset.
arXiv Detail & Related papers (2024-09-23T17:11:04Z)
- RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design [35.66059762160962]
We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design.
We formulate RNA structures as a set of rigid-body frames and associated loss functions.
To tackle the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations.
arXiv Detail & Related papers (2024-06-19T21:06:44Z)
- BEACON: Benchmark for Comprehensive RNA Tasks and Language Models [60.02663015002029]
We introduce the first comprehensive RNA benchmark BEACON (BEnchmArk for COmprehensive RNA Task and Language Models).
First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications.
Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models.
Third, we investigate the vital RNA language model components
arXiv Detail & Related papers (2024-06-14T19:39:19Z)
- Kirigami: large convolutional kernels improve deep learning-based RNA secondary structure prediction [0.0]
We introduce a novel fully convolutional neural network (FCN) architecture for predicting the secondary structure of ribonucleic acid (RNA) molecules.
We employ deep learning to estimate the probability of base pairing between nucleotide residues.
On a widely adopted, standardized test set comprised of 1,305 molecules, the accuracy of our method exceeds that of current state-of-the-art (SOTA) secondary structure prediction software.
arXiv Detail & Related papers (2024-06-04T14:58:10Z)
- Regressor-free Molecule Generation to Support Drug Response Prediction [83.25894107956735]
Conditional generation based on the target IC50 score can obtain a more effective sampling space.
Regressor-free guidance combines a diffusion model's score estimation with a regression controller model's gradient based on number labels.
arXiv Detail & Related papers (2024-05-23T13:22:17Z)
- A novel RNA pseudouridine site prediction model using Utility Kernel and data-driven parameters [0.7373617024876725]
Pseudouridine is the most frequent modification in RNA.
Existing models to predict the pseudouridine sites in a given RNA sequence mainly depend on user-defined features.
We propose a Support Vector Machine (SVM) Kernel based on utility theory from Economics.
arXiv Detail & Related papers (2023-11-02T08:32:10Z)
- Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding [57.89530563948755]
This work provides a benchmark analysis of peptide encoding with advanced deep learning models.
It serves as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
arXiv Detail & Related papers (2023-07-17T00:43:33Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- A QUBO model of the RNA folding problem optimized by variational hybrid quantum annealing [0.0]
We present a model of the RNA folding problem amenable to both quantum annealers and circuit-model quantum computers.
We compare this formulation versus current RNA folding QUBOs after tuning the parameters of all against known RNA structures.
arXiv Detail & Related papers (2022-08-08T19:04:28Z)
- E2Efold-3D: End-to-End Deep Learning Method for accurate de novo RNA 3D Structure Prediction [46.38735421190187]
We develop the first end-to-end deep learning approach, E2Efold-3D, to accurately perform de novo RNA structure prediction.
Several novel components are proposed to overcome the data scarcity, such as a fully-differentiable end-to-end pipeline, secondary structure-assisted self-distillation, and parameter-efficient backbone formulation.
arXiv Detail & Related papers (2022-07-04T17:15:35Z)
- Predictive models of RNA degradation through dual crowdsourcing [2.003083111563343]
We describe a crowdsourced machine learning competition ("Stanford OpenVaccine") on Kaggle.
Winning models demonstrated test set errors that were better by 50% than the previous state-of-the-art DegScore model.
arXiv Detail & Related papers (2021-10-14T16:50:37Z)