Designing RNAs with Language Models
- URL: http://arxiv.org/abs/2602.12470v1
- Date: Thu, 12 Feb 2026 22:57:09 GMT
- Title: Designing RNAs with Language Models
- Authors: Milan Gautam, Ning Dai, Tianshuo Zhou, Bowen Xie, David Mathews, Liang Huang
- Abstract summary: RNA design is a challenge due to the exponentially large sequence space and exponentially many competing folds. We introduce a reusable neural approximator, instantiated as an autoregressive language model (LM), that maps target structures directly to sequences. Across four datasets, our approach outperforms state-of-the-art systems on key metrics such as Boltzmann probability while being 1.7x faster.
- Score: 3.332772772624738
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: RNA design, the task of finding a sequence that folds into a target secondary structure, has broad biological and biomedical impact but remains computationally challenging due to the exponentially large sequence space and exponentially many competing folds. Traditional approaches treat it as an optimization problem, relying on per-instance heuristics or constraint-based search. We instead reframe RNA design as conditional sequence generation and introduce a reusable neural approximator, instantiated as an autoregressive language model (LM), that maps target structures directly to sequences. We first train our model in a supervised setting on random-induced structure-sequence pairs, and then use reinforcement learning (RL) to optimize end-to-end metrics. We also propose methods to select a small subset for RL that greatly improves RL efficiency and quality. Across four datasets, our approach outperforms state-of-the-art systems on key metrics such as Boltzmann probability while being 1.7x faster, establishing conditional LM generation as a scalable, task-agnostic alternative to per-instance optimization for RNA design. Our code and data are available at https://github.com/KuNyaa/RNA-Design-LM.
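To make the task in the abstract concrete, here is a minimal sketch of the necessary (but not sufficient) condition a designed sequence must satisfy: every base pair in the target structure must be occupied by nucleotides that can pair. It assumes the target is given in dot-bracket notation without pseudoknots and that only Watson-Crick and G-U wobble pairs are allowed. The function names are hypothetical and do not come from the paper's code. Checking that a sequence actually folds into the target, or computing its Boltzmann probability, additionally requires a thermodynamic folding engine, which is not shown here.

```python
# Pairs permitted in this sketch: Watson-Crick plus G-U wobble (an assumption).
VALID_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) base pairs in a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_compatible(sequence, structure):
    """Check that every paired position holds nucleotides that can pair."""
    if len(sequence) != len(structure):
        return False
    return all(
        (sequence[i], sequence[j]) in VALID_PAIRS
        for i, j in parse_dot_bracket(structure)
    )


if __name__ == "__main__":
    target = "((...))"
    print(parse_dot_bracket(target))
    print(is_compatible("GCAAAGC", target))
    print(is_compatible("AAAAAAA", target))
```

Compatibility only filters out sequences that cannot form the target at all. Among compatible sequences, the design problem described in the abstract is to find one for which the target is favored over the exponentially many competing folds.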
Related papers
- Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective [85.06838178922791]
Reinforcement Learning (RL) has proven highly effective for autoregressive language models. But adapting these methods to diffusion large language models (dLLMs) presents fundamental challenges. We propose a principled RL framework that treats entire sequence generation as a single action and uses the ELBO as a tractable sequence-level likelihood proxy.
arXiv Detail & Related papers (2025-12-03T13:05:32Z)
- CodonMoE: DNA Language Models for mRNA Analyses [4.046100165562807]
Genomic language models (gLMs) face a fundamental efficiency challenge: either maintain separate specialized models for each biological modality (DNA and RNA) or develop large multi-modal architectures. We introduce CodonMoE, a lightweight adapter that transforms DNA language models into effective RNA analyzers without RNA-specific pretraining. Our approach provides a principled path toward unifying genomic language modeling, leveraging more abundant DNA data and reducing computational overhead.
arXiv Detail & Related papers (2025-08-06T01:40:12Z)
- Differentiable Folding for Nearest Neighbor Model Optimization [0.6291443816903801]
The Nearest Neighbor model is the de facto thermodynamic model of RNA secondary structure formation. Here, we leverage recent advances in differentiable folding to devise an efficient, scalable, and flexible means of parameter optimization. Our method yields a significantly improved parameter set that outperforms existing baselines on all metrics.
arXiv Detail & Related papers (2025-03-12T05:36:12Z)
- Regulatory DNA sequence Design with Reinforcement Learning [56.20290878358356]
We propose a generative approach that leverages reinforcement learning to fine-tune a pre-trained autoregressive model. We evaluate our method on promoter design tasks in two yeast media conditions and enhancer design tasks for three human cell types.
arXiv Detail & Related papers (2025-03-11T02:33:33Z)
- Helix-mRNA: A Hybrid Foundation Model For Full Sequence mRNA Therapeutics [3.2508287756500165]
mRNA-based vaccines have become a major focus in the pharmaceutical industry. Optimizing mRNA sequences for those properties remains a complex challenge. We present Helix-mRNA, a structured state-space-based and attention hybrid model to address these challenges.
arXiv Detail & Related papers (2025-02-19T14:51:41Z)
- BEACON: Benchmark for Comprehensive RNA Tasks and Language Models [60.02663015002029]
We introduce the first comprehensive RNA benchmark BEACON (BEnchmArk for COmprehensive RNA Task and Language Models). First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications. Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models. Third, we investigate the vital RNA language model components
arXiv Detail & Related papers (2024-06-14T19:39:19Z)
- Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization [68.28697120944116]
We train an autoregressive generative model via Meta-Reinforcement Learning to propose promising sequences for selection.
We pose this problem as that of finding an optimal policy over a distribution of MDPs induced by sampling subsets of the data.
Our in-silico experiments show that meta-learning over such ensembles provides robustness against reward misspecification and achieves competitive results.
arXiv Detail & Related papers (2022-09-13T18:37:27Z)
- Improving RNA Secondary Structure Design using Deep Reinforcement Learning [69.63971634605797]
We propose a new benchmark of applying reinforcement learning to RNA sequence design, in which the objective function is defined to be the free energy in the sequence's secondary structure.
We show results of the ablation analysis that we do for these algorithms, as well as graphs indicating the algorithm's performance across batches.
arXiv Detail & Related papers (2021-11-05T02:54:06Z)
- Approximate kNN Classification for Biomedical Data [1.1852406625172218]
Single-cell RNA-seq (scRNA-seq) is an emerging DNA sequencing technology with promising capabilities but significant computational challenges.
We propose the utilization of approximate nearest neighbor search algorithms for the task of kNN classification in scRNA-seq data.
arXiv Detail & Related papers (2020-12-03T18:30:43Z)
- RNA Secondary Structure Prediction By Learning Unrolled Algorithms [70.09461537906319]
In this paper, we propose an end-to-end deep learning model, called E2Efold, for RNA secondary structure prediction.
The key idea of E2Efold is to directly predict the RNA base-pairing matrix, and use an unrolled algorithm for constrained programming as the template for deep architectures to enforce constraints.
With comprehensive experiments on benchmark datasets, we demonstrate the superior performance of E2Efold.
arXiv Detail & Related papers (2020-02-13T23:21:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.