A New Deep-learning-Based Approach For mRNA Optimization: High Fidelity, Computation Efficiency, and Multiple Optimization Factors
- URL: http://arxiv.org/abs/2505.23862v1
- Date: Thu, 29 May 2025 08:21:11 GMT
- Title: A New Deep-learning-Based Approach For mRNA Optimization: High Fidelity, Computation Efficiency, and Multiple Optimization Factors
- Authors: Zheng Gong, Ziyi Jiang, Weihao Gao, Deng Zhuo, Lan Ma
- Abstract summary: We introduce RNop, a novel deep learning-based method for mRNA optimization. We collect a large-scale dataset containing over 3 million sequences and design four specialized loss functions, the GPLoss, CAILoss, tAILoss, and MFELoss. RNop ensures high sequence fidelity, achieves significant computational throughput up to 47.32 sequences/s, and yields optimized mRNA sequences.
- Score: 12.26159226306187
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: mRNA optimization is critical for therapeutic and biotechnological applications, since sequence features directly govern protein expression levels and efficacy. However, current methods face significant challenges in simultaneously achieving three key objectives: (1) fidelity (preventing unintended amino acid changes), (2) computational efficiency (speed and scalability), and (3) the scope of optimization variables considered (multi-objective capability). Furthermore, existing methods often fall short of comprehensively incorporating the factors related to the mRNA lifecycle and translation process, including intrinsic mRNA sequence properties, secondary structure, translation elongation kinetics, and tRNA availability. To address these limitations, we introduce RNop, a novel deep learning-based method for mRNA optimization. We collect a large-scale dataset containing over 3 million sequences and design four specialized loss functions (GPLoss, CAILoss, tAILoss, and MFELoss), which together enable explicit control over sequence fidelity while optimizing species-specific codon adaptation, tRNA availability, and desirable mRNA secondary structure features. We then demonstrate RNop's effectiveness through extensive in silico and in vivo experiments. RNop ensures high sequence fidelity, achieves computational throughput of up to 47.32 sequences/s, and yields optimized mRNA sequences that significantly increase protein expression for functional proteins compared to controls. RNop surpasses current methodologies in both quantitative metrics and experimental validation, paving the way for efficient and effective mRNA design. Code and models will be available at https://github.com/HudenJear/RPLoss.
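Two of the quantities named in the abstract, fidelity (no unintended amino acid changes) and codon adaptation, can be illustrated with a short sketch. This is not RNop's implementation or its loss functions. It only shows the standard definitions: a fidelity check that translates both sequences and compares the proteins, and the codon adaptation index (CAI) as the geometric mean of relative adaptiveness weights. The function names and the `TOY_USAGE` codon counts are hypothetical and chosen purely for illustration.

```python
import math
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence (DNA or RNA alphabet) into DNA-alphabet codons."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a coding sequence into a one-letter amino acid string."""
    return "".join(CODON_TABLE[c] for c in codons(seq))


def preserves_protein(original, optimized):
    """Fidelity check: both sequences must encode the same protein."""
    return translate(original) == translate(optimized)


def relative_adaptiveness(usage):
    """Weight of each codon = its count / count of the most used synonymous codon."""
    best = {}
    for codon, count in usage.items():
        aa = CODON_TABLE[codon]
        best[aa] = max(best.get(aa, 0), count)
    return {codon: count / best[CODON_TABLE[codon]] for codon, count in usage.items()}


def cai(seq, weights):
    """Geometric mean of the weights of the codons that have a weight.

    Assumes every weight is positive (no zero counts in the usage table).
    """
    ws = [weights[c] for c in codons(seq) if c in weights]
    if not ws:
        raise ValueError("no codon in the sequence has a weight")
    return math.exp(sum(math.log(w) for w in ws) / len(ws))


# Hypothetical codon counts for alanine (GCN) and lysine (AAA/AAG); not real data.
TOY_USAGE = {"GCT": 10, "GCC": 40, "GCA": 20, "GCG": 30, "AAA": 25, "AAG": 75}
WEIGHTS = relative_adaptiveness(TOY_USAGE)

if __name__ == "__main__":
    original, optimized = "GCUAAA", "GCCAAG"  # both encode Ala-Lys
    print(preserves_protein(original, optimized))
    print(cai(original, WEIGHTS), cai(optimized, WEIGHTS))
```

In this toy setting the synonymous swap keeps the protein unchanged while moving every codon to the most frequent choice, which raises the CAI. A real pipeline would use species-specific usage tables and would usually exclude stop codons and single-codon amino acids from the CAI.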
Related papers
- Co-optimization of codon usage and mRNA secondary structure using quantum computing [0.0]
We introduce a novel variational framework that simultaneously optimizes codon usage and secondary structure.
Our method employs a dual-objective function that balances the codon adaptation index (CAI) and minimum free energy (MFE).
We demonstrate the feasibility of executing this end-to-end workflow on real quantum hardware, using IBM's 127-qubit Eagle processor.
arXiv Detail & Related papers (2025-07-24T21:32:44Z)
- Regulatory DNA sequence Design with Reinforcement Learning [56.20290878358356]
We propose a generative approach that leverages reinforcement learning to fine-tune a pre-trained autoregressive model.
We evaluate our method on promoter design tasks in two yeast media conditions and enhancer design tasks for three human cell types.
arXiv Detail & Related papers (2025-03-11T02:33:33Z)
- BAnG: Bidirectional Anchored Generation for Conditional RNA Design [15.92155083519678]
RNA-BAnG is a deep learning-based model designed to generate RNA sequences for protein interactions without these requirements.
We first validate our method on generic synthetic tasks involving similar localized motifs to those appearing in RNAs.
We then evaluate our model on biological sequences, showing its effectiveness for conditional RNA sequence design given a binding protein.
arXiv Detail & Related papers (2025-02-28T17:51:00Z)
- Helix-mRNA: A Hybrid Foundation Model For Full Sequence mRNA Therapeutics [3.2508287756500165]
mRNA-based vaccines have become a major focus in the pharmaceutical industry.
Optimizing mRNA sequences for those properties remains a complex challenge.
We present Helix-mRNA, a structured state-space-based and attention hybrid model to address these challenges.
arXiv Detail & Related papers (2025-02-19T14:51:41Z)
- LoRA-BERT: a Natural Language Processing Model for Robust and Accurate Prediction of long non-coding RNAs [11.346750562942345]
Long non-coding RNAs (lncRNAs) serve as crucial regulators in numerous biological processes.
Deep learning-based approaches have been introduced to classify lncRNAs.
LoRA-BERT is designed to capture the importance of nucleotide-level information during sequence classification.
arXiv Detail & Related papers (2024-11-11T22:17:01Z)
- Latent Diffusion Models for Controllable RNA Sequence Generation [33.38594748558547]
RNA is a key intermediary between DNA and protein, exhibiting high sequence diversity and complex three-dimensional structures.
We develop a latent diffusion model for generating and optimizing discrete RNA sequences of variable lengths.
Empirical results confirm that RNAdiffusion generates non-coding RNAs that align with natural distributions across various biological metrics.
arXiv Detail & Related papers (2024-09-15T19:04:50Z)
- RNACG: A Universal RNA Sequence Conditional Generation model based on Flow-Matching [0.0]
We propose RNACG (RNA Generator), a universal framework for RNA sequence design based on flow matching.
By unifying sequence generation under a single framework, RNACG enables the integration of multiple RNA design paradigms.
arXiv Detail & Related papers (2024-07-29T09:46:46Z)
- BEACON: Benchmark for Comprehensive RNA Tasks and Language Models [60.02663015002029]
We introduce the first comprehensive RNA benchmark BEACON (BEnchmArk for COmprehensive RNA Task and Language Models).
First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications.
Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models.
Third, we investigate the vital RNA language model components
arXiv Detail & Related papers (2024-06-14T19:39:19Z)
- RDesign: Hierarchical Data-efficient Representation Learning for Tertiary Structure-based RNA Design [65.41144149958208]
This study aims to systematically construct a data-driven RNA design pipeline.
We crafted a benchmark dataset and designed a comprehensive structural modeling approach to represent the complex RNA tertiary structure.
We incorporated extracted secondary structures with base pairs as prior knowledge to facilitate the RNA design process.
arXiv Detail & Related papers (2023-01-25T17:19:49Z)
- Accurate RNA 3D structure prediction using a language model-based deep learning approach [50.193512039121984]
RhoFold+ is an RNA language model-based deep learning method that accurately predicts 3D structures of single-chain RNAs from sequences.
RhoFold+ offers a fully automated end-to-end pipeline for RNA 3D structure prediction.
arXiv Detail & Related papers (2022-07-04T17:15:35Z)
- Improving RNA Secondary Structure Design using Deep Reinforcement Learning [69.63971634605797]
We propose a new benchmark of applying reinforcement learning to RNA sequence design, in which the objective function is defined to be the free energy in the sequence's secondary structure.
We show results of the ablation analysis that we do for these algorithms, as well as graphs indicating the algorithm's performance across batches.
arXiv Detail & Related papers (2021-11-05T02:54:06Z)
- EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.