RNAGenScape: Property-guided Optimization and Interpolation of mRNA Sequences with Manifold Langevin Dynamics
- URL: http://arxiv.org/abs/2510.24736v1
- Date: Tue, 14 Oct 2025 19:55:41 GMT
- Title: RNAGenScape: Property-guided Optimization and Interpolation of mRNA Sequences with Manifold Langevin Dynamics
- Authors: Danqi Liao, Chen Liu, Xingzhi Sun, DiƩ Tang, Haochen Wang, Scott Youlten, Srikar Krishna Gopinath, Haejeong Lee, Ethan C. Strayer, Antonio J. Giraldez, Smita Krishnaswamy,
- Abstract summary: RNAGenScape is a property-guided manifold Langevin dynamics framework that iteratively updates mRNA sequences within a learned latent manifold.<n>It supports property-guided optimization and smooth design between sequences, while remaining robust under scarce and undersampled data.<n> RNAGenScape establishes a paradigm for controllable mRNA design and latent space exploration in mRNA sequence modeling.
- Score: 14.76553653030291
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: mRNA design and optimization are important in synthetic biology and therapeutic development, but remain understudied in machine learning. Systematic optimization of mRNAs is hindered by the scarce and imbalanced data as well as complex sequence-function relationships. We present RNAGenScape, a property-guided manifold Langevin dynamics framework that iteratively updates mRNA sequences within a learned latent manifold. RNAGenScape combines an organized autoencoder, which structures the latent space by target properties for efficient and biologically plausible exploration, with a manifold projector that contracts each step of update back to the manifold. RNAGenScape supports property-guided optimization and smooth interpolation between sequences, while remaining robust under scarce and undersampled data, and ensuring that intermediate products are close to the viable mRNA manifold. Across three real mRNA datasets, RNAGenScape improves the target properties with high success rates and efficiency, outperforming various generative or optimization methods developed for proteins or non-biological data. By providing continuous, data-aligned trajectories that reveal how edits influence function, RNAGenScape establishes a scalable paradigm for controllable mRNA design and latent space exploration in mRNA sequence modeling.
Related papers
- Equi-mRNA: Protein Translation Equivariant Encoding for mRNA Language Models [0.0]
We introduce Equi-mRNA, the first codon-level equivariant mRNA language model that explicitly encodes synonymous codon symmetries as cyclic subgroups of 2D Special Orthogonal matrix (SO(2))<n>On downstream property-prediction tasks including expression, stability, and riboswitch switching, Equi-mRNA delivers up to approximately 10% improvements in accuracy.<n> Equi-mRNA establishes a new biologically principled paradigm for mRNA modeling, with significant implications for the design of next-generation therapeutics.
arXiv Detail & Related papers (2025-08-20T22:42:10Z) - A New Deep-learning-Based Approach For mRNA Optimization: High Fidelity, Computation Efficiency, and Multiple Optimization Factors [12.26159226306187]
We introduce textbfRNop, a novel deep learning-based method for mRNA optimization.<n>We collect a large-scale dataset containing over 3 million sequences and design four specialized loss functions, the GPLoss, CAILoss, tAILoss, and MFELoss.<n>RNop ensures high sequence fidelity, achieves significant computational throughput up to 47.32 sequences/s, and yields optimized mRNA sequences.
arXiv Detail & Related papers (2025-05-29T08:21:11Z) - Helix-mRNA: A Hybrid Foundation Model For Full Sequence mRNA Therapeutics [3.2508287756500165]
mRNA-based vaccines have become a major focus in the pharmaceutical industry.<n> optimizing mRNA sequences for those properties remains a complex challenge.<n>We present Helix-mRNA, a structured state-space-based and attention hybrid model to address these challenges.
arXiv Detail & Related papers (2025-02-19T14:51:41Z) - Life-Code: Central Dogma Modeling with Multi-Omics Sequence Unification [55.98854157265578]
Life-Code is a comprehensive framework that spans different biological functions.<n>We propose a unified pipeline to integrate multi-omics data by reverse-transcribing RNA and reverse-translating amino acids into nucleotide-based sequences.<n>Life-Code achieves state-of-the-art results on various tasks across three omics, highlighting its potential for advancing multi-omics analysis and interpretation.
arXiv Detail & Related papers (2025-02-11T06:53:59Z) - GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters.<n>Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks.<n>It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z) - Latent Diffusion Models for Controllable RNA Sequence Generation [33.38594748558547]
RNA is a key intermediary between DNA and protein, exhibiting high sequence diversity and complex three-dimensional structures.
We develop a latent diffusion model for generating and optimizing discrete RNA sequences of variable lengths.
Empirical results confirm that RNAdiffusion generates non-coding RNAs that align with natural distributions across various biological metrics.
arXiv Detail & Related papers (2024-09-15T19:04:50Z) - RNACG: A Universal RNA Sequence Conditional Generation model based on Flow-Matching [0.0]
We propose RNACG (RNA Generator), a universal framework for RNA sequence design based on flow matching.<n>By unifying sequence generation under a single framework, RNACG enables the integration of multiple RNA design paradigms.
arXiv Detail & Related papers (2024-07-29T09:46:46Z) - BEACON: Benchmark for Comprehensive RNA Tasks and Language Models [60.02663015002029]
We introduce the first comprehensive RNA benchmark BEACON (textbfBEnchmtextbfArk for textbfCOmprehensive RtextbfNA Task and Language Models).<n>First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications.<n>Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models.<n>Third, we investigate the vital RNA language model components
arXiv Detail & Related papers (2024-06-14T19:39:19Z) - RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching [7.600990806121113]
RNAFlow is a flow matching model for protein-conditioned RNA sequence-structure design.
Its denoising network integrates an RNA inverse folding model and a pre-trained RosettaFold2NA network for generation of RNA sequences and structures.
arXiv Detail & Related papers (2024-05-29T05:10:25Z) - scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis
in Brain [46.39828178736219]
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain.
scHyena is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator.
This enables us to process full-length scRNA-seq data without losing any information from the raw data.
arXiv Detail & Related papers (2023-10-04T10:30:08Z) - RDesign: Hierarchical Data-efficient Representation Learning for
Tertiary Structure-based RNA Design [65.41144149958208]
This study aims to systematically construct a data-driven RNA design pipeline.
We crafted a benchmark dataset and designed a comprehensive structural modeling approach to represent the complex RNA tertiary structure.
We incorporated extracted secondary structures with base pairs as prior knowledge to facilitate the RNA design process.
arXiv Detail & Related papers (2023-01-25T17:19:49Z) - Accurate RNA 3D structure prediction using a language model-based deep learning approach [50.193512039121984]
RhoFold+ is an RNA language model-based deep learning method that accurately predicts 3D structures of single-chain RNAs from sequences.<n>RhoFold+ offers a fully automated end-to-end pipeline for RNA 3D structure prediction.
arXiv Detail & Related papers (2022-07-04T17:15:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.