Related papers: Kirigami: large convolutional kernels improve deep learning-based RNA secondary structure prediction

URL: http://arxiv.org/abs/2406.02381v2
Date: Thu, 6 Jun 2024 14:04:32 GMT
Title: Kirigami: large convolutional kernels improve deep learning-based RNA secondary structure prediction
Authors: Marc Harary, Chengxin Zhang
Abstract summary: We introduce a novel fully convolutional neural network (FCN) architecture for predicting the secondary structure of ribonucleic acid (RNA) molecules. We employ deep learning to estimate the probability of base pairing between nucleotide residues. On a widely adopted, standardized test set comprised of 1,305 molecules, the accuracy of our method exceeds that of current state-of-the-art (SOTA) secondary structure prediction software.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce a novel fully convolutional neural network (FCN) architecture for predicting the secondary structure of ribonucleic acid (RNA) molecules. Interpreting RNA structures as weighted graphs, we employ deep learning to estimate the probability of base pairing between nucleotide residues. Unique to our model are its massive 11-pixel kernels, which we argue provide a distinct advantage for FCNs on the specialized domain of RNA secondary structures. On a widely adopted, standardized test set comprised of 1,305 molecules, the accuracy of our method exceeds that of current state-of-the-art (SOTA) secondary structure prediction software, achieving a Matthews Correlation Coefficient (MCC) over 11-40% higher than that of other leading methods on overall structures and 58-400% higher on pseudoknots specifically.
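The abstract reports accuracy as a Matthews Correlation Coefficient over predicted base pairs. As a minimal sketch of how such a score can be computed, the Python below compares two structures written in dot-bracket notation. The function names, the dot-bracket input format, and the choice to count every unordered pair of positions as one classification decision are assumptions made for illustration; the abstract does not describe the paper's exact evaluation protocol, and this sketch handles only nested pairs, not pseudoknots.

```python
import math


def pairs_from_dot_bracket(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string.

    Hypothetical helper: only '(' and ')' are interpreted, so pseudoknots
    (usually written with extra bracket types) are not supported here.
    """
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.add((stack.pop(), j))
    return pairs


def mcc(true_pairs, pred_pairs, length):
    """Matthews Correlation Coefficient over all unordered position pairs."""
    total = length * (length - 1) // 2  # every candidate pair (i < j)
    tp = len(true_pairs & pred_pairs)   # pairs predicted and present
    fp = len(pred_pairs - true_pairs)   # pairs predicted but absent
    fn = len(true_pairs - pred_pairs)   # pairs present but missed
    tn = total - tp - fp - fn           # pairs correctly left unpaired
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: return 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


reference = pairs_from_dot_bracket("((..))")
predicted = pairs_from_dot_bracket("(....)")
score = mcc(reference, predicted, length=6)
```

In this toy case the prediction recovers one of the two reference pairs and adds no false ones, so the score falls between 0 and 1; an exact match would score 1.0.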

Related papers

Comprehensive benchmarking of large language models for RNA secondary structure prediction [0.0]
RNA-LLM uses large datasets of RNA sequences to learn, in a self-supervised way, how to represent each RNA base with a semantically rich numerical vector. Among them, predicting the secondary structure is a fundamental task for uncovering RNA functional mechanisms. We present a comprehensive experimental analysis of several pre-trained RNA-LLM, comparing them for the RNA secondary structure prediction task in a unified deep learning framework.
arXiv Detail & Related papers (2024-10-21T17:12:06Z)
Bridging Sequence-Structure Alignment in RNA Foundation Models [7.068604225076706]
The alignment between RNA sequences and structures in foundation models (FMs) has yet to be investigated. Existing FMs have struggled to establish sequence-structure alignment, hindering the free flow of genomic information. We introduce OmniGenome, an RNA FM trained to align RNA sequences with respect to secondary structures based on structure-contextualised modelling.
arXiv Detail & Related papers (2024-07-15T21:10:40Z)
RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design [35.66059762160962]
We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We formulate RNA structures as a set of rigid-body frames and associated loss functions. To tackle the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations.
arXiv Detail & Related papers (2024-06-19T21:06:44Z)
BEACON: Benchmark for Comprehensive RNA Tasks and Language Models [60.02663015002029]
We introduce the first comprehensive RNA benchmark BEACON (BEnchmArk for COmprehensive RNA Task and Language Models). First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications. Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models. Third, we investigate the vital RNA language model components
arXiv Detail & Related papers (2024-06-14T19:39:19Z)
RDesign: Hierarchical Data-efficient Representation Learning for Tertiary Structure-based RNA Design [65.41144149958208]
This study aims to systematically construct a data-driven RNA design pipeline. We crafted a benchmark dataset and designed a comprehensive structural modeling approach to represent the complex RNA tertiary structure. We incorporated extracted secondary structures with base pairs as prior knowledge to facilitate the RNA design process.
arXiv Detail & Related papers (2023-01-25T17:19:49Z)
E2Efold-3D: End-to-End Deep Learning Method for accurate de novo RNA 3D Structure Prediction [46.38735421190187]
We develop the first end-to-end deep learning approach, E2Efold-3D, to accurately perform the de novo RNA structure prediction. Several novel components are proposed to overcome the data scarcity, such as a fully-differentiable end-to-end pipeline, secondary structure-assisted self-distillation, and parameter-efficient backbone formulation.
arXiv Detail & Related papers (2022-07-04T17:15:35Z)
Neural representation and generation for RNA secondary structures [14.583976833366384]
Our work is concerned with the generation and targeted design of RNA, a type of genetic macromolecule. The design of large scale and complex biological structures spurs dedicated graph-based deep generative modeling techniques. We propose a flexible framework to jointly embed and generate different RNA structural modalities.
arXiv Detail & Related papers (2021-02-01T15:49:25Z)
Review of Machine-Learning Methods for RNA Secondary Structure Prediction [21.3539253580504]
We provide a comprehensive overview of RNA secondary structure prediction methods based on machine-learning technologies. The current pending issues in the field of RNA secondary structure prediction and future trends are also discussed.
arXiv Detail & Related papers (2020-09-01T03:17:15Z)
Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate (≥80%) predictions of protein class and architecture from structures determined at low (≤3 Å) resolution. We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z)
A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques. We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
RNA Secondary Structure Prediction By Learning Unrolled Algorithms [70.09461537906319]
In this paper, we propose an end-to-end deep learning model, called E2Efold, for RNA secondary structure prediction. The key idea of E2Efold is to directly predict the RNA base-pairing matrix, and use an unrolled algorithm for constrained programming as the template for deep architectures to enforce constraints. With comprehensive experiments on benchmark datasets, we demonstrate the superior performance of E2Efold.
arXiv Detail & Related papers (2020-02-13T23:21:25Z)
