Neural representation and generation for RNA secondary structures
- URL: http://arxiv.org/abs/2102.00925v1
- Date: Mon, 1 Feb 2021 15:49:25 GMT
- Title: Neural representation and generation for RNA secondary structures
- Authors: Zichao Yan, William L. Hamilton and Mathieu Blanchette
- Abstract summary: Our work is concerned with the generation and targeted design of RNA, a type of genetic macromolecule.
The design of large scale and complex biological structures spurs dedicated graph-based deep generative modeling techniques.
We propose a flexible framework to jointly embed and generate different RNA structural modalities.
- Score: 14.583976833366384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our work is concerned with the generation and targeted design of RNA, a type
of genetic macromolecule that can adopt complex structures which influence
its cellular activities and functions. The design of large-scale and complex
biological structures spurs dedicated graph-based deep generative modeling
techniques, which represent a key but underappreciated aspect of computational
drug discovery. In this work, we investigate the principles behind representing
and generating different RNA structural modalities, and propose a flexible
framework to jointly embed and generate these molecular structures along with
their sequence in a meaningful latent space. Equipped with a deep understanding
of RNA molecular structures, our most sophisticated encoding and decoding
methods operate on the molecular graph as well as the junction tree hierarchy,
integrating strong inductive bias about RNA structural regularity and folding
mechanism such that high structural validity, stability and diversity of
generated RNAs are achieved. Also, we seek to adequately organize the latent
space of RNA molecular embeddings with regard to the interaction with proteins,
and targeted optimization is used to navigate in this latent space to search
for desired novel RNA molecules.
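As a rough illustration of the "molecular graph" view of a secondary structure mentioned in the abstract, the sketch below builds a graph with one node per nucleotide, backbone edges between consecutive nucleotides, and base-pair edges. It assumes the structure is given in dot-bracket notation without pseudoknots, which the abstract does not specify. The function name `dot_bracket_to_graph` and the returned dictionary layout are hypothetical, and this is not the authors' encoder.

```python
def dot_bracket_to_graph(sequence, structure):
    """Build a simple graph from an RNA sequence and its dot-bracket structure.

    Assumption: pseudoknot-free dot-bracket notation using only '(', ')' and '.'.
    Returns a dict with node labels, backbone edges and base-pair edges.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")

    # Backbone edges connect each nucleotide to its successor.
    backbone = [(i, i + 1) for i in range(len(sequence) - 1)]

    # Base-pair edges are recovered by matching brackets with a stack.
    pairs = []
    stack = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")

    return {"nodes": list(sequence), "backbone": backbone, "pairs": sorted(pairs)}


# A small hairpin: three stacked base pairs closing a three-nucleotide loop.
graph = dot_bracket_to_graph("GGGAAACCC", "(((...)))")
print(graph["pairs"])  # [(0, 8), (1, 7), (2, 6)]
```

A coarser junction tree view would group these nodes into stems and loops; that step is omitted here because the abstract does not describe how the hierarchy is constructed.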
Related papers
- DPLM-2: A Multimodal Diffusion Protein Language Model [75.98083311705182]
We introduce DPLM-2, a multimodal protein foundation model that extends discrete diffusion protein language model (DPLM) to accommodate both sequences and structures.
DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals.
Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures.
arXiv Detail & Related papers (2024-10-17T17:20:24Z)
- RNACG: A Universal RNA Sequence Conditional Generation model based on Flow-Matching [0.0]
We develop a universal RNA sequence generation model based on flow matching, namely RNACG.
RNACG can accommodate various conditional inputs and is portable, enabling users to customize the encoding network for conditional inputs.
RNACG exhibits extensive applicability in sequence generation and property prediction tasks.
arXiv Detail & Related papers (2024-07-29T09:46:46Z)
- RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching [7.600990806121113]
RNAFlow is a flow matching model for protein-conditioned RNA sequence-structure design.
Its denoising network integrates an RNA inverse folding model and a pre-trained RosettaFold2NA network for generation of RNA sequences and structures.
arXiv Detail & Related papers (2024-05-29T05:10:25Z)
- RNA Secondary Structure Prediction Using Transformer-Based Deep Learning Models [13.781096813376145]
The Human Genome Project has led to an exponential increase in data related to the sequence, structure, and function of biomolecules.
This paper discusses the fundamental concepts of RNA, RNA secondary structure, and its prediction.
The application of machine learning technologies in predicting the structure of biological macromolecules is explored.
arXiv Detail & Related papers (2024-04-14T08:36:14Z)
- RDesign: Hierarchical Data-efficient Representation Learning for Tertiary Structure-based RNA Design [65.41144149958208]
This study aims to systematically construct a data-driven RNA design pipeline.
We crafted a benchmark dataset and designed a comprehensive structural modeling approach to represent the complex RNA tertiary structure.
We incorporated extracted secondary structures with base pairs as prior knowledge to facilitate the RNA design process.
arXiv Detail & Related papers (2023-01-25T17:19:49Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- E2Efold-3D: End-to-End Deep Learning Method for accurate de novo RNA 3D Structure Prediction [46.38735421190187]
We develop the first end-to-end deep learning approach, E2Efold-3D, to accurately perform de novo RNA structure prediction.
Several novel components are proposed to overcome the data scarcity, such as a fully-differentiable end-to-end pipeline, secondary structure-assisted self-distillation, and parameter-efficient backbone formulation.
arXiv Detail & Related papers (2022-07-04T17:15:35Z)
- Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
- Improving RNA Secondary Structure Design using Deep Reinforcement Learning [69.63971634605797]
We propose a new benchmark of applying reinforcement learning to RNA sequence design, in which the objective function is defined to be the free energy in the sequence's secondary structure.
We present an ablation analysis of these algorithms, along with graphs showing their performance across batches.
arXiv Detail & Related papers (2021-11-05T02:54:06Z)
- VeRNAl: Mining RNA Structures for Fuzzy Base Pairing Network Motifs [13.990800077082843]
RNA 3D motifs are recurrent substructures modelled as networks of base pair interactions.
We propose a set of node similarity functions, clustering methods, and motif construction algorithms to recover flexible RNA motifs.
VeRNAl can be easily customized by users to desired levels of motif flexibility, abundance and size.
arXiv Detail & Related papers (2020-09-01T19:03:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.