RDesign: Hierarchical Data-efficient Representation Learning for
Tertiary Structure-based RNA Design
- URL: http://arxiv.org/abs/2301.10774v3
- Date: Thu, 7 Mar 2024 02:07:37 GMT
- Title: RDesign: Hierarchical Data-efficient Representation Learning for
Tertiary Structure-based RNA Design
- Authors: Cheng Tan, Yijie Zhang, Zhangyang Gao, Bozhen Hu, Siyuan Li, Zicheng
Liu, Stan Z. Li
- Abstract summary: This study aims to systematically construct a data-driven RNA design pipeline.
We crafted a benchmark dataset and designed a comprehensive structural modeling approach to represent the complex RNA tertiary structure.
We incorporated extracted secondary structures with base pairs as prior knowledge to facilitate the RNA design process.
- Score: 65.41144149958208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While artificial intelligence has made remarkable strides in revealing the
relationship between biological macromolecules' primary sequence and tertiary
structure, designing RNA sequences based on specified tertiary structures
remains challenging. Though existing approaches in protein design have
thoroughly explored structure-to-sequence dependencies in proteins, RNA design
still confronts difficulties due to structural complexity and data scarcity.
Moreover, direct transplantation of protein design methodologies into RNA
design fails to achieve satisfactory outcomes although sharing similar
structural components. In this study, we aim to systematically construct a
data-driven RNA design pipeline. We crafted a large, well-curated benchmark
dataset and designed a comprehensive structural modeling approach to represent
the complex RNA tertiary structure. More importantly, we proposed a
hierarchical data-efficient representation learning framework that learns
structural representations through contrastive learning at both cluster-level
and sample-level to fully leverage the limited data. By constraining data
representations within a limited hyperspherical space, the intrinsic
relationships between data points could be explicitly imposed. Moreover, we
incorporated extracted secondary structures with base pairs as prior knowledge
to facilitate the RNA design process. Extensive experiments demonstrate the
effectiveness of our proposed method, providing a reliable baseline for future
RNA design tasks. The source code and benchmark dataset are available at
https://github.com/A4Bio/RDesign.
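The abstract describes contrastive learning at both cluster level and sample level, with representations constrained to a hypersphere. The sketch below illustrates that general idea only and is not the authors' implementation: it swaps in a generic InfoNCE-style objective, and every name in it (`l2_normalize`, `contrastive_loss`, `hierarchical_loss`, the `weight` and `temperature` parameters, the two "views" of each sample) is a hypothetical choice made for illustration. See the linked repository for the actual method.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Project each row onto the unit hypersphere."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def contrastive_loss(z, positive_mask, temperature=0.1):
    """InfoNCE-style loss over n >= 2 embeddings (assumed objective, not the paper's).

    z: (n, d) embeddings.
    positive_mask: (n, n) boolean, True where sample j is a positive for anchor i.
    """
    z = l2_normalize(z)
    n = z.shape[0]
    sim = z @ z.T / temperature              # scaled cosine similarities
    not_self = ~np.eye(n, dtype=bool)        # an anchor never contrasts with itself
    # Numerically stable log-sum-exp over every other sample in the batch.
    masked = np.where(not_self, sim, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denominator = row_max + np.log(
        np.exp(masked - row_max).sum(axis=1, keepdims=True)
    )
    log_prob = sim - log_denominator
    pos = positive_mask & not_self
    pos_count = pos.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0
    # Average the negative log-probability over each anchor's positives.
    summed = np.where(pos, log_prob, 0.0).sum(axis=1)
    per_anchor = -summed[has_pos] / pos_count[has_pos]
    return float(per_anchor.mean())


def hierarchical_loss(view_a, view_b, cluster_ids, weight=0.5, temperature=0.1):
    """Sample-level term plus a weighted cluster-level term (hypothetical weighting)."""
    z = np.concatenate([view_a, view_b], axis=0)
    n = view_a.shape[0]
    sample_ids = np.concatenate([np.arange(n), np.arange(n)])
    clusters = np.concatenate([cluster_ids, cluster_ids])
    # Sample level: the two views of the same structure are positives.
    sample_pos = sample_ids[:, None] == sample_ids[None, :]
    # Cluster level: every view from the same cluster is a positive.
    cluster_pos = clusters[:, None] == clusters[None, :]
    return (
        contrastive_loss(z, sample_pos, temperature)
        + weight * contrastive_loss(z, cluster_pos, temperature)
    )


# Toy data: 6 structures in 3 clusters, each seen through two noisy views.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 8))
view_a = emb + 0.01 * rng.normal(size=emb.shape)
view_b = emb + 0.01 * rng.normal(size=emb.shape)
cluster_ids = np.array([0, 0, 1, 1, 2, 2])
loss = hierarchical_loss(view_a, view_b, cluster_ids)
print(f"hierarchical contrastive loss: {loss:.4f}")
```

Normalizing every embedding to unit length makes the dot product a cosine similarity, which is one simple way to keep representations on a hypersphere. How the two views and the clusters are produced is left open here.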
Related papers
- Comprehensive benchmarking of large language models for RNA secondary structure prediction [0.0]
RNA-LLM uses large datasets of RNA sequences to learn, in a self-supervised way, how to represent each RNA base with a semantically rich numerical vector.
Among them, predicting the secondary structure is a fundamental task for uncovering RNA functional mechanisms.
We present a comprehensive experimental analysis of several pre-trained RNA-LLM, comparing them for the RNA secondary structure prediction task in a unified deep learning framework.
arXiv Detail & Related papers (2024-10-21T17:12:06Z)
- BEACON: Benchmark for Comprehensive RNA Tasks and Language Models [60.02663015002029]
We introduce the first comprehensive RNA benchmark BEACON (BEnchmArk for COmprehensive RNA Task and Language Models).
First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications.
Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models.
Third, we investigate the vital RNA language model components
arXiv Detail & Related papers (2024-06-14T19:39:19Z)
- RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching [7.600990806121113]
RNAFlow is a flow matching model for protein-conditioned RNA sequence-structure design.
Its denoising network integrates an RNA inverse folding model and a pre-trained RosettaFold2NA network for generation of RNA sequences and structures.
arXiv Detail & Related papers (2024-05-29T05:10:25Z)
- 3D-based RNA function prediction tools in rnaglib [2.048226951354646]
Building datasets of RNA 3D structures and making appropriate modeling choices remains time-consuming and lacks standardization.
We describe the use of rnaglib to train supervised and unsupervised machine learning-based function prediction models on datasets of RNA 3D structures.
arXiv Detail & Related papers (2024-02-14T17:22:03Z)
- Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows them with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- E2Efold-3D: End-to-End Deep Learning Method for accurate de novo RNA 3D Structure Prediction [46.38735421190187]
We develop the first end-to-end deep learning approach, E2Efold-3D, to accurately perform de novo RNA structure prediction.
Several novel components are proposed to overcome the data scarcity, such as a fully-differentiable end-to-end pipeline, secondary structure-assisted self-distillation, and parameter-efficient backbone formulation.
arXiv Detail & Related papers (2022-07-04T17:15:35Z)
- Neural representation and generation for RNA secondary structures [14.583976833366384]
Our work is concerned with the generation and targeted design of RNA, a type of genetic macromolecule.
The design of large scale and complex biological structures spurs dedicated graph-based deep generative modeling techniques.
We propose a flexible framework to jointly embed and generate different RNA structural modalities.
arXiv Detail & Related papers (2021-02-01T15:49:25Z)
- RNA Secondary Structure Prediction By Learning Unrolled Algorithms [70.09461537906319]
In this paper, we propose an end-to-end deep learning model, called E2Efold, for RNA secondary structure prediction.
The key idea of E2Efold is to directly predict the RNA base-pairing matrix, and use an unrolled algorithm for constrained programming as the template for deep architectures to enforce constraints.
With comprehensive experiments on benchmark datasets, we demonstrate the superior performance of E2Efold.
arXiv Detail & Related papers (2020-02-13T23:21:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.