Related papers: E2Efold-3D: End-to-End Deep Learning Method for accurate de novo RNA 3D Structure Prediction

URL: http://arxiv.org/abs/2207.01586v1
Date: Mon, 4 Jul 2022 17:15:35 GMT
Title: E2Efold-3D: End-to-End Deep Learning Method for accurate de novo RNA 3D Structure Prediction
Authors: Tao Shen, Zhihang Hu, Zhangzhi Peng, Jiayang Chen, Peng Xiong, Liang Hong, Liangzhen Zheng, Yixuan Wang, Irwin King, Sheng Wang, Siqi Sun, and Yu Li
Abstract summary: We develop the first end-to-end deep learning approach, E2Efold-3D, to accurately perform the de novo RNA structure prediction. Several novel components are proposed to overcome the data scarcity, such as a fully-differentiable end-to-end pipeline, secondary structure-assisted self-distillation, and parameter-efficient backbone formulation.
Score: 46.38735421190187
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: RNA structure determination and prediction can promote RNA-targeted drug development and engineerable synthetic elements design. But due to the intrinsic structural flexibility of RNAs, all the three mainstream structure determination methods (X-ray crystallography, NMR, and Cryo-EM) encounter challenges when resolving the RNA structures, which leads to the scarcity of the resolved RNA structures. Computational prediction approaches emerge as complementary to the experimental techniques. However, none of the de novo approaches is based on deep learning since too few structures are available. Instead, most of them apply the time-consuming sampling-based strategies, and their performance seems to hit the plateau. In this work, we develop the first end-to-end deep learning approach, E2Efold-3D, to accurately perform the de novo RNA structure prediction. Several novel components are proposed to overcome the data scarcity, such as a fully-differentiable end-to-end pipeline, secondary structure-assisted self-distillation, and parameter-efficient backbone formulation. Such designs are validated on the independent, non-overlapping RNA puzzle testing dataset and reach an average sub-4 Å root-mean-square deviation, demonstrating its superior performance compared to state-of-the-art approaches. Interestingly, it also achieves promising results when predicting RNA complex structures, a feat that none of the previous systems could accomplish. When E2Efold-3D is coupled with the experimental techniques, the RNA structure prediction field can be greatly advanced.
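
The abstract reports accuracy as an average root-mean-square deviation (RMSD) below 4 Å. For reference, the sketch below shows the conventional way RMSD is computed after optimal rigid superposition (the Kabsch algorithm). It assumes two matched N×3 coordinate arrays with one representative atom per nucleotide; which atom set E2Efold-3D actually scores is not stated here, and the function name `kabsch_rmsd` is illustrative, not taken from any released code.

```python
import numpy as np


def kabsch_rmsd(pred: np.ndarray, ref: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if pred.shape != ref.shape or pred.ndim != 2 or pred.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centering both structures at the origin.
    p = pred - pred.mean(axis=0)
    q = ref - ref.mean(axis=0)
    # SVD of the 3x3 cross-covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the centered prediction onto the centered reference.
    aligned = p @ rot.T
    return float(np.sqrt(np.mean(np.sum((aligned - q) ** 2, axis=1))))


# Usage: a rotated and translated copy of a structure has RMSD close to 0.
rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 3))
theta = 0.7
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
pred = ref @ rz.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(pred, ref))
```

Superposing before measuring matters because a raw coordinate RMSD would penalize a correct fold that is merely placed or oriented differently in space.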

Related papers

RNA-GPT: Multimodal Generative System for RNA Sequence Understanding [6.611255836269348]
RNAs are essential molecules that carry genetic information vital for life. Despite this importance, RNA research is often hindered by the vast literature available on the topic. We introduce RNA-GPT, a multi-modal RNA chat model designed to simplify RNA discovery.
arXiv Detail & Related papers (2024-10-29T06:19:56Z)
Comprehensive benchmarking of large language models for RNA secondary structure prediction [0.0]
RNA-LLM uses large datasets of RNA sequences to learn, in a self-supervised way, how to represent each RNA base with a semantically rich numerical vector. Among them, predicting the secondary structure is a fundamental task for uncovering RNA functional mechanisms. We present a comprehensive experimental analysis of several pre-trained RNA-LLM, comparing them for the RNA secondary structure prediction task in a unified deep learning framework.
arXiv Detail & Related papers (2024-10-21T17:12:06Z)
Beyond Sequence: Impact of Geometric Context for RNA Property Prediction [6.559586725997741]
RNA structures can be represented as 1D sequences, 2D topological graphs, or 3D all-atom models. Existing works predominantly focus on 1D sequence-based models, which overlook the geometric context provided by 2D and 3D geometries. This study presents the first systematic evaluation of incorporating explicit 2D and 3D geometric information into RNA property prediction.
arXiv Detail & Related papers (2024-10-15T17:09:34Z)
Predicting Distance matrix with large language models [1.8855270809505869]
RNA structure prediction remains a significant challenge due to data limitations. Traditional methods such as nuclear magnetic resonance spectroscopy, X-ray crystallography, and electron microscopy are expensive and time consuming. Distance maps provide a simplified representation of spatial constraints between nucleotides, capturing essential relationships without requiring a full 3D model.
arXiv Detail & Related papers (2024-09-24T10:28:55Z)
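
A distance map can be sketched concretely. Assuming one representative atom per nucleotide (an assumption; the summary does not say which atom the paper uses), the map is the L×L matrix of pairwise Euclidean distances. The helper names below are hypothetical. Because pairwise distances do not change under rotation or translation, a complete, exact distance map determines the 3D coordinates only up to a rigid motion and a mirror reflection; classical multidimensional scaling, included here as one standard reconstruction method rather than the paper's own, recovers such coordinates.

```python
import numpy as np


def distance_map(coords: np.ndarray) -> np.ndarray:
    """(L, L) matrix of Euclidean distances between per-nucleotide coordinates (L, 3)."""
    coords = np.asarray(coords, dtype=float)
    if coords.ndim != 2 or coords.shape[1] != 3:
        raise ValueError("expected an (L, 3) array")
    diff = coords[:, None, :] - coords[None, :, :]  # (L, L, 3) displacement vectors
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def coords_from_distances(dist: np.ndarray, dim: int = 3) -> np.ndarray:
    """Classical MDS: coordinates reproducing `dist` up to rotation, translation and reflection."""
    n = dist.shape[0]
    center = np.eye(n) - np.ones((n, n)) / n     # double-centering matrix
    gram = -0.5 * center @ (dist ** 2) @ center  # Gram matrix of centered coordinates
    vals, vecs = np.linalg.eigh(gram)            # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]           # keep the `dim` largest
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


coords = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 12.0]])
d = distance_map(coords)
print(d[0, 1], d[0, 2], d[1, 2])  # 5.0 12.0 13.0
```

The reflection ambiguity is worth noting for RNA, since a mirror-image helix has identical distances but the wrong handedness.
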
RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design [41.80588259094431]
We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We formulate RNA structures as a set of rigid-body frames and associated loss functions. To tackle the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations.
arXiv Detail & Related papers (2024-06-19T21:06:44Z)
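
Representing a structure "as a set of rigid-body frames" means attaching a rotation and a translation to each nucleotide. A common construction, popularized by AlphaFold2 for protein residues, builds the frame from three atom positions by Gram-Schmidt orthogonalization. The sketch below assumes that construction and three generic backbone atoms x1, x2, x3; which RNA atoms RNA-FrameFlow actually uses, and its exact conventions, are not given in this summary, and `rigid_frame`, `to_local` and `to_global` are hypothetical names.

```python
import numpy as np


def rigid_frame(x1, x2, x3):
    """Return (R, t): a rotation whose columns are the local axes, and the origin t = x2."""
    x1, x2, x3 = (np.asarray(v, dtype=float) for v in (x1, x2, x3))
    v1 = x3 - x2
    v2 = x1 - x2
    e1 = v1 / np.linalg.norm(v1)   # first axis points from x2 toward x3
    u2 = v2 - np.dot(e1, v2) * e1  # remove the e1 component (Gram-Schmidt)
    e2 = u2 / np.linalg.norm(u2)
    e3 = np.cross(e1, e2)          # right-handed third axis
    return np.stack([e1, e2, e3], axis=1), x2


def to_local(rot, t, x):
    """Express a global point x in the frame's local coordinates."""
    return rot.T @ (np.asarray(x, dtype=float) - t)


def to_global(rot, t, y):
    """Map local coordinates y back to global coordinates."""
    return rot @ np.asarray(y, dtype=float) + t


rot, t = rigid_frame([1.0, 1.0, 0.0], [0.0, 0.0, 0.0], [2.0, 0.0, 0.0])
print(to_local(rot, t, [2.0, 0.0, 0.0]))  # [2. 0. 0.]
```

Storing R and t per nucleotide means that a global rotation or translation of the molecule acts on every frame in the same simple way, which is the property frame-based models rely on.
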
BEACON: Benchmark for Comprehensive RNA Tasks and Language Models [60.02663015002029]
We introduce the first comprehensive RNA benchmark BEACON (BEnchmArk for COmprehensive RNA Task and Language Models). First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications. Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models. Third, we investigate the vital RNA language model components
arXiv Detail & Related papers (2024-06-14T19:39:19Z)
3D-based RNA function prediction tools in rnaglib [2.048226951354646]
Building datasets of RNA 3D structures and making appropriate modeling choices remains time-consuming and lacks standardization. We describe the use of rnaglib, to train supervised and unsupervised machine learning-based function prediction models on datasets of RNA 3D structures.
arXiv Detail & Related papers (2024-02-14T17:22:03Z)
RDesign: Hierarchical Data-efficient Representation Learning for Tertiary Structure-based RNA Design [65.41144149958208]
This study aims to systematically construct a data-driven RNA design pipeline. We crafted a benchmark dataset and designed a comprehensive structural modeling approach to represent the complex RNA tertiary structure. We incorporated extracted secondary structures with base pairs as prior knowledge to facilitate the RNA design process.
arXiv Detail & Related papers (2023-01-25T17:19:49Z)
Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective [63.3632827588974]
We introduce RFold, a method that learns to predict the most matching K-Rook solution from the given sequence. RFold achieves competitive performance and about eight times faster inference efficiency than state-of-the-art approaches.
arXiv Detail & Related papers (2022-12-02T16:34:56Z)
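
The K-Rook view can be made concrete: a secondary structure is a symmetric 0/1 pairing matrix in which every row and every column holds at most one 1, just as non-attacking rooks occupy distinct rows and columns of a chessboard. The sketch below, which is not RFold's code, converts a dot-bracket string into such a matrix and checks the constraint. Accepting extra bracket types for pseudoknots is an assumption about the input notation, and both function names are hypothetical.

```python
import numpy as np

# Closing bracket -> matching opening bracket; extra types encode pseudoknots.
PAIRS = {")": "(", "]": "[", "}": "{", ">": "<"}


def dot_bracket_to_matrix(structure: str) -> np.ndarray:
    """Symmetric (L, L) 0/1 pairing matrix from a dot-bracket string."""
    n = len(structure)
    mat = np.zeros((n, n), dtype=np.int8)
    stacks = {opener: [] for opener in PAIRS.values()}
    for j, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(j)                 # remember the open position
        elif ch in PAIRS:
            if not stacks[PAIRS[ch]]:
                raise ValueError(f"unbalanced {ch!r} at position {j}")
            i = stacks[PAIRS[ch]].pop()          # most recent matching opener
            mat[i, j] = mat[j, i] = 1
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if any(stacks.values()):
        raise ValueError("unbalanced opening bracket")
    return mat


def is_rook_placement(mat: np.ndarray) -> bool:
    """True if mat is symmetric, has an empty diagonal, and at most one 1 per row/column."""
    return bool(
        np.array_equal(mat, mat.T)
        and np.all(np.diag(mat) == 0)
        and np.all(mat.sum(axis=0) <= 1)
        and np.all(mat.sum(axis=1) <= 1)
    )


mat = dot_bracket_to_matrix("((..))")
print(is_rook_placement(mat))  # True
```
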
Improving RNA Secondary Structure Design using Deep Reinforcement Learning [69.63971634605797]
We propose a new benchmark of applying reinforcement learning to RNA sequence design, in which the objective function is defined to be the free energy in the sequence's secondary structure. We show results of the ablation analysis that we do for these algorithms, as well as graphs indicating the algorithm's performance across batches.
arXiv Detail & Related papers (2021-11-05T02:54:06Z)
Review of Machine-Learning Methods for RNA Secondary Structure Prediction [21.3539253580504]
We provide a comprehensive overview of RNA secondary structure prediction methods based on machine-learning technologies. The current pending issues in the field of RNA secondary structure prediction and future trends are also discussed.
arXiv Detail & Related papers (2020-09-01T03:17:15Z)
Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate (≥80%) predictions of protein class and architecture from structures determined at low (≤3 Å) resolution. We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z)
RNA Secondary Structure Prediction By Learning Unrolled Algorithms [70.09461537906319]
In this paper, we propose an end-to-end deep learning model, called E2Efold, for RNA secondary structure prediction. The key idea of E2Efold is to directly predict the RNA base-pairing matrix, and use an unrolled algorithm for constrained programming as the template for deep architectures to enforce constraints. With comprehensive experiments on benchmark datasets, we demonstrate the superior performance of E2Efold.
arXiv Detail & Related papers (2020-02-13T23:21:25Z)
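
In the E2Efold paper, the constraints on the base-pairing matrix are (i) only canonical A-U and C-G pairs plus the G-U wobble pair, (ii) no sharp loops, meaning no pair (i, j) with |i - j| < 4, and (iii) at most one partner per base. The sketch below builds a mask for (i) and (ii) and then enforces (iii) with a simple greedy decoder. This is a deliberately plain, non-differentiable stand-in: E2Efold itself enforces the constraints through an unrolled constrained-optimization algorithm, which is not reproduced here. The score matrix, the 0.5 threshold and the function names are hypothetical.

```python
import numpy as np

# Canonical Watson-Crick pairs plus the G-U wobble pair, in both orientations.
CANONICAL = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def pairing_mask(seq: str, min_sep: int = 4) -> np.ndarray:
    """Boolean (L, L) mask of pairs allowed by constraints (i) and (ii)."""
    seq = seq.upper()
    n = len(seq)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + min_sep, n):  # |i - j| >= min_sep rules out sharp loops
            if (seq[i], seq[j]) in CANONICAL:
                mask[i, j] = mask[j, i] = True
    return mask


def greedy_decode(scores: np.ndarray, mask: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Pick high-scoring allowed pairs so that each base has at most one partner (iii)."""
    n = scores.shape[0]
    sym = (scores + scores.T) / 2.0  # pairing is symmetric
    candidates = sorted(
        ((sym[i, j], i, j) for i in range(n) for j in range(i + 1, n)
         if mask[i, j] and sym[i, j] > threshold),
        reverse=True,
    )
    paired = np.zeros(n, dtype=bool)
    out = np.zeros((n, n), dtype=np.int8)
    for _, i, j in candidates:
        if not paired[i] and not paired[j]:
            out[i, j] = out[j, i] = 1
            paired[i] = paired[j] = True
    return out


seq = "GAAACC"
scores = np.zeros((6, 6))
scores[0, 4] = scores[4, 0] = 0.8  # hypothetical network scores
scores[0, 5] = scores[5, 0] = 0.9
out = greedy_decode(scores, pairing_mask(seq))
print(np.argwhere(np.triu(out)))  # [[0 5]]
```

Here base 0 could pair with either C at position 4 or 5; the greedy rule keeps the higher-scoring pair and drops the conflicting one.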

This list is automatically generated from the titles and abstracts of the papers on this site.