LoRA-BERT: a Natural Language Processing Model for Robust and Accurate Prediction of long non-coding RNAs
- URL: http://arxiv.org/abs/2411.08073v1
- Date: Mon, 11 Nov 2024 22:17:01 GMT
- Title: LoRA-BERT: a Natural Language Processing Model for Robust and Accurate Prediction of long non-coding RNAs
- Authors: Nicholas Jeon, Xiaoning Qian, Lamin SaidyKhan, Paul de Figueiredo, Byung-Jun Yoon,
- Abstract summary: Long non-coding RNAs (lncRNAs) serve as crucial regulators in numerous biological processes.
Deep learning-based approaches have been introduced to classify lncRNAs.
LoRA-BERT is designed to capture the importance of nucleotide-level information during sequence classification.
- Score: 11.346750562942345
- License:
- Abstract: Long non-coding RNAs (lncRNAs) serve as crucial regulators in numerous biological processes. Although they share sequence similarities with messenger RNAs (mRNAs), lncRNAs perform entirely different roles, providing new avenues for biological research. The emergence of next-generation sequencing technologies has greatly advanced the detection and identification of lncRNA transcripts and deep learning-based approaches have been introduced to classify long non-coding RNAs (lncRNAs). These advanced methods have significantly enhanced the efficiency of identifying lncRNAs. However, many of these methods are devoid of robustness and accuracy due to the extended length of the sequences involved. To tackle this issue, we have introduced a novel pre-trained bidirectional encoder representation called LoRA-BERT. LoRA-BERT is designed to capture the importance of nucleotide-level information during sequence classification, leading to more robust and satisfactory outcomes. In a comprehensive comparison with commonly used sequence prediction tools, we have demonstrated that LoRA-BERT outperforms them in terms of accuracy and efficiency. Our results indicate that, when utilizing the transformer model, LoRA-BERT achieves state-of-the-art performance in predicting both lncRNAs and mRNAs for human and mouse species. Through the utilization of LoRA-BERT, we acquire valuable insights into the traits of lncRNAs and mRNAs, offering the potential to aid in the comprehension and detection of diseases linked to lncRNAs in humans.
Related papers
- Character-level Tokenizations as Powerful Inductive Biases for RNA Foundational Models [0.0]
understanding and predicting RNA behavior is a challenge due to the complexity of RNA structures and interactions.
Current RNA models have yet to match the performance observed in the protein domain.
ChaRNABERT is able to reach state-of-the-art performance on several tasks in established benchmarks.
arXiv Detail & Related papers (2024-11-05T21:56:16Z) - RNA-GPT: Multimodal Generative System for RNA Sequence Understanding [6.611255836269348]
RNAs are essential molecules that carry genetic information vital for life.
Despite this importance, RNA research is often hindered by the vast literature available on the topic.
We introduce RNA-GPT, a multi-modal RNA chat model designed to simplify RNA discovery.
arXiv Detail & Related papers (2024-10-29T06:19:56Z) - BEACON: Benchmark for Comprehensive RNA Tasks and Language Models [60.02663015002029]
We introduce the first comprehensive RNA benchmark BEACON (textbfBEnchmtextbfArk for textbfCOmprehensive RtextbfNA Task and Language Models).
First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications.
Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models.
Third, we investigate the vital RNA language model components
arXiv Detail & Related papers (2024-06-14T19:39:19Z) - Machine Learning Modeling Of SiRNA Structure-Potency Relationship With
Applications Against Sars-Cov-2 Spike Gene [0.0]
Drug discovery process is lengthy and costly, taking nearly a decade to bring a new drug to the market.
Biotechnology, computational methods, and machine learning algorithms have the potential to revolutionize drug discovery, speeding up the process and improving patient outcomes.
The COVID-19 pandemic has further accelerated and deepened the recognition of the potential of these techniques, especially in the areas of drug repurposing and efficacy predictions.
arXiv Detail & Related papers (2024-01-18T23:00:34Z) - Description Generation using Variational Auto-Encoders for precursor
microRNA [5.6710852973206105]
We propose a novel framework, which makes use of generative modeling through Vari Auto-Encoders to uncover latent factors of pre-miRNA.
Applying the framework to classification, we obtain a high reconstruction and classification performance, while also developing an accurate description.
arXiv Detail & Related papers (2023-11-29T15:41:45Z) - scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis
in Brain [46.39828178736219]
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain.
scHyena is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator.
This enables us to process full-length scRNA-seq data without losing any information from the raw data.
arXiv Detail & Related papers (2023-10-04T10:30:08Z) - Knowledge from Large-Scale Protein Contact Prediction Models Can Be
Transferred to the Data-Scarce RNA Contact Prediction Task [40.051834115537474]
We find that a protein-coevolution Transformer-based deep neural network can be transferred to the RNA contact prediction task.
Experiments confirm that RNA contact prediction through transfer learning is greatly improved.
Our findings indicate that the learned structural patterns of proteins can be transferred to RNAs, opening up potential new avenues for research.
arXiv Detail & Related papers (2023-02-13T06:00:56Z) - RDesign: Hierarchical Data-efficient Representation Learning for
Tertiary Structure-based RNA Design [65.41144149958208]
This study aims to systematically construct a data-driven RNA design pipeline.
We crafted a benchmark dataset and designed a comprehensive structural modeling approach to represent the complex RNA tertiary structure.
We incorporated extracted secondary structures with base pairs as prior knowledge to facilitate the RNA design process.
arXiv Detail & Related papers (2023-01-25T17:19:49Z) - E2Efold-3D: End-to-End Deep Learning Method for accurate de novo RNA 3D
Structure Prediction [46.38735421190187]
We develop the first end-to-end deep learning approach, E2Efold-3D, to accurately perform the textitde novo RNA structure prediction.
Several novel components are proposed to overcome the data scarcity, such as a fully-differentiable end-to-end pipeline, secondary structure-assisted self-distillation, and parameter-efficient backbone formulation.
arXiv Detail & Related papers (2022-07-04T17:15:35Z) - Improving RNA Secondary Structure Design using Deep Reinforcement
Learning [69.63971634605797]
We propose a new benchmark of applying reinforcement learning to RNA sequence design, in which the objective function is defined to be the free energy in the sequence's secondary structure.
We show results of the ablation analysis that we do for these algorithms, as well as graphs indicating the algorithm's performance across batches.
arXiv Detail & Related papers (2021-11-05T02:54:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.