Related papers: RNA-GPT: Multimodal Generative System for RNA Sequence Understanding

URL: http://arxiv.org/abs/2411.08900v1
Date: Tue, 29 Oct 2024 06:19:56 GMT
Title: RNA-GPT: Multimodal Generative System for RNA Sequence Understanding
Authors: Yijia Xiao, Edward Sun, Yiqiao Jin, Wei Wang
Abstract summary: RNAs are essential molecules that carry genetic information vital for life. Despite this importance, RNA research is often hindered by the vast literature available on the topic. We introduce RNA-GPT, a multi-modal RNA chat model designed to simplify RNA discovery.
Score: 6.611255836269348
License: http://creativecommons.org/licenses/by/4.0/
Abstract: RNAs are essential molecules that carry genetic information vital for life, with profound implications for drug development and biotechnology. Despite this importance, RNA research is often hindered by the vast literature available on the topic. To streamline this process, we introduce RNA-GPT, a multi-modal RNA chat model designed to simplify RNA discovery by leveraging extensive RNA literature. RNA-GPT integrates RNA sequence encoders with linear projection layers and state-of-the-art large language models (LLMs) for precise representation alignment, enabling it to process user-uploaded RNA sequences and deliver concise, accurate responses. Built on a scalable training pipeline, RNA-GPT utilizes RNA-QA, an automated system that gathers RNA annotations from RNACentral using a divide-and-conquer approach with GPT-4o and latent Dirichlet allocation (LDA) to efficiently handle large datasets and generate instruction-tuning samples. Our experiments indicate that RNA-GPT effectively addresses complex RNA queries, thereby facilitating RNA research. Additionally, we present RNA-QA, a dataset of 407,616 RNA samples for modality alignment and instruction tuning, further advancing the potential of RNA research tools.
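The alignment step described in the abstract (sequence-encoder outputs passed through a linear projection into the language model's embedding space) can be illustrated with a minimal sketch. This is not the authors' implementation: `encode_rna`, `make_projection`, and `project` are hypothetical names, the one-hot encoder is a toy stand-in for a real RNA sequence encoder, and the dimensions 8 and 16 and the random weights are placeholders for values that would be learned or fixed by the actual models.

```python
import random

def encode_rna(sequence, dim=8):
    """Toy stand-in for an RNA sequence encoder (assumption): one vector per nucleotide."""
    index = {"A": 0, "C": 1, "G": 2, "U": 3}
    embeddings = []
    for base in sequence:
        vec = [0.0] * dim
        vec[index[base]] = 1.0  # one-hot in the first four slots
        embeddings.append(vec)
    return embeddings

def make_projection(in_dim, out_dim, seed=0):
    """Create an untrained linear layer; in practice the weights would be learned."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def project(embeddings, weights, bias):
    """Apply y = W x + b to each encoder vector, mapping it to the LLM embedding width."""
    return [
        [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]
        for vec in embeddings
    ]

# Six nucleotides in, six vectors of the (assumed) LLM embedding width out.
weights, bias = make_projection(in_dim=8, out_dim=16)
rna_tokens = project(encode_rna("GGAUCC"), weights, bias)
```

In a setup of this kind, the projected vectors would typically be placed alongside the text token embeddings of the user's question before being passed to the language model.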

Related papers

CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes [0.0]
Circular RNAs (circRNAs) are important components of the non-coding RNA regulatory network. We propose a deep learning framework named CircFormerMoE based on transformers and mixture-of-experts for predicting circRNAs directly from plant genomic DNA.
arXiv Detail & Related papers (2025-07-11T12:43:17Z)
A Comprehensive Review on RNA Subcellular Localization Prediction [0.125828876338076]
The subcellular localization of long non-coding RNAs (lncRNAs), messenger RNAs (mRNAs), microRNAs (miRNAs) and other RNAs plays a critical role in determining their biological functions. Traditional wet lab methods for determining RNA localization, such as in situ hybridization, are often time-consuming, resource-demanding, and costly. Computational methods leveraging artificial intelligence (AI) and machine learning (ML) have emerged as powerful alternatives.
arXiv Detail & Related papers (2025-04-24T00:47:31Z)
BAnG: Bidirectional Anchored Generation for Conditional RNA Design [15.92155083519678]
RNA-BAnG is a deep learning-based model designed to generate RNA sequences for protein interactions without these requirements. We first validate our method on generic synthetic tasks involving similar localized motifs to those appearing in RNAs. We then evaluate our model on biological sequences, showing its effectiveness for conditional RNA sequence design given a binding protein.
arXiv Detail & Related papers (2025-02-28T17:51:00Z)
Comprehensive benchmarking of large language models for RNA secondary structure prediction [0.0]
RNA-LLM uses large datasets of RNA sequences to learn, in a self-supervised way, how to represent each RNA base with a semantically rich numerical vector. Among them, predicting the secondary structure is a fundamental task for uncovering RNA functional mechanisms. We present a comprehensive experimental analysis of several pre-trained RNA-LLM, comparing them for the RNA secondary structure prediction task in a unified deep learning framework.
arXiv Detail & Related papers (2024-10-21T17:12:06Z)
RNACG: A Universal RNA Sequence Conditional Generation model based on Flow-Matching [0.0]
We develop a universal RNA sequence generation model based on flow matching, namely RNACG. RNACG can accommodate various conditional inputs and is portable, enabling users to customize the encoding network for conditional inputs. RNACG exhibits extensive applicability in sequence generation and property prediction tasks.
arXiv Detail & Related papers (2024-07-29T09:46:46Z)
RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design [41.80588259094431]
We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We formulate RNA structures as a set of rigid-body frames and associated loss functions. To tackle the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations.
arXiv Detail & Related papers (2024-06-19T21:06:44Z)
BEACON: Benchmark for Comprehensive RNA Tasks and Language Models [60.02663015002029]
We introduce the first comprehensive RNA benchmark BEACON (BEnchmArk for COmprehensive RNA Task and Language Models). First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications. Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models. Third, we investigate the vital RNA language model components
arXiv Detail & Related papers (2024-06-14T19:39:19Z)
RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching [7.600990806121113]
RNAFlow is a flow matching model for protein-conditioned RNA sequence-structure design. Its denoising network integrates an RNA inverse folding model and a pre-trained RosettaFold2NA network for generation of RNA sequences and structures.
arXiv Detail & Related papers (2024-05-29T05:10:25Z)
RiNALMo: General-Purpose RNA Language Models Can Generalize Well on Structure Prediction Tasks [1.1764999317813143]
We introduce RiboNucleic Acid Language Model (RiNALMo) to unveil the hidden code of RNA. RiNALMo is the largest RNA language model to date, with 650M parameters pre-trained on 36M non-coding RNA sequences.
arXiv Detail & Related papers (2024-02-29T14:50:58Z)
scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain [46.39828178736219]
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain. scHyena is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator. This enables us to process full-length scRNA-seq data without losing any information from the raw data.
arXiv Detail & Related papers (2023-10-04T10:30:08Z)
Knowledge from Large-Scale Protein Contact Prediction Models Can Be Transferred to the Data-Scarce RNA Contact Prediction Task [40.051834115537474]
We find that a protein-coevolution Transformer-based deep neural network can be transferred to the RNA contact prediction task. Experiments confirm that RNA contact prediction through transfer learning is greatly improved. Our findings indicate that the learned structural patterns of proteins can be transferred to RNAs, opening up potential new avenues for research.
arXiv Detail & Related papers (2023-02-13T06:00:56Z)
RDesign: Hierarchical Data-efficient Representation Learning for Tertiary Structure-based RNA Design [65.41144149958208]
This study aims to systematically construct a data-driven RNA design pipeline. We crafted a benchmark dataset and designed a comprehensive structural modeling approach to represent the complex RNA tertiary structure. We incorporated extracted secondary structures with base pairs as prior knowledge to facilitate the RNA design process.
arXiv Detail & Related papers (2023-01-25T17:19:49Z)
E2Efold-3D: End-to-End Deep Learning Method for accurate de novo RNA 3D Structure Prediction [46.38735421190187]
We develop the first end-to-end deep learning approach, E2Efold-3D, to accurately perform de novo RNA structure prediction. Several novel components are proposed to overcome the data scarcity, such as a fully-differentiable end-to-end pipeline, secondary structure-assisted self-distillation, and parameter-efficient backbone formulation.
arXiv Detail & Related papers (2022-07-04T17:15:35Z)
Improving RNA Secondary Structure Design using Deep Reinforcement Learning [69.63971634605797]
We propose a new benchmark of applying reinforcement learning to RNA sequence design, in which the objective function is defined to be the free energy in the sequence's secondary structure. We show results of the ablation analysis that we do for these algorithms, as well as graphs indicating the algorithm's performance across batches.
arXiv Detail & Related papers (2021-11-05T02:54:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.