Equi-mRNA: Protein Translation Equivariant Encoding for mRNA Language Models
- URL: http://arxiv.org/abs/2508.15103v1
- Date: Wed, 20 Aug 2025 22:42:10 GMT
- Title: Equi-mRNA: Protein Translation Equivariant Encoding for mRNA Language Models
- Authors: Mehdi Yazdani-Jahromi, Ali Khodabandeh Yalabadi, Ozlem Ozmen Garibay,
- Abstract summary: We introduce Equi-mRNA, the first codon-level equivariant mRNA language model that explicitly encodes synonymous codon symmetries as cyclic subgroups of 2D Special Orthogonal matrix (SO(2))<n>On downstream property-prediction tasks including expression, stability, and riboswitch switching, Equi-mRNA delivers up to approximately 10% improvements in accuracy.<n> Equi-mRNA establishes a new biologically principled paradigm for mRNA modeling, with significant implications for the design of next-generation therapeutics.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The growing importance of mRNA therapeutics and synthetic biology highlights the need for models that capture the latent structure of synonymous codon (different triplets encoding the same amino acid) usage, which subtly modulates translation efficiency and gene expression. While recent efforts incorporate codon-level inductive biases through auxiliary objectives, they often fall short of explicitly modeling the structured relationships that arise from the genetic code's inherent symmetries. We introduce Equi-mRNA, the first codon-level equivariant mRNA language model that explicitly encodes synonymous codon symmetries as cyclic subgroups of 2D Special Orthogonal matrix (SO(2)). By combining group-theoretic priors with an auxiliary equivariance loss and symmetry-aware pooling, Equi-mRNA learns biologically grounded representations that outperform vanilla baselines across multiple axes. On downstream property-prediction tasks including expression, stability, and riboswitch switching Equi-mRNA delivers up to approximately 10% improvements in accuracy. In sequence generation, it produces mRNA constructs that are up to approximately 4x more realistic under Frechet BioDistance metrics and approximately 28% better preserve functional properties compared to vanilla baseline. Interpretability analyses further reveal that learned codon-rotation distributions recapitulate known GC-content biases and tRNA abundance patterns, offering novel insights into codon usage. Equi-mRNA establishes a new biologically principled paradigm for mRNA modeling, with significant implications for the design of next-generation therapeutics.
Related papers
- RNAGenScape: Property-guided Optimization and Interpolation of mRNA Sequences with Manifold Langevin Dynamics [14.76553653030291]
RNAGenScape is a property-guided manifold Langevin dynamics framework that iteratively updates mRNA sequences within a learned latent manifold.<n>It supports property-guided optimization and smooth design between sequences, while remaining robust under scarce and undersampled data.<n> RNAGenScape establishes a paradigm for controllable mRNA design and latent space exploration in mRNA sequence modeling.
arXiv Detail & Related papers (2025-10-14T19:55:41Z) - A New Deep-learning-Based Approach For mRNA Optimization: High Fidelity, Computation Efficiency, and Multiple Optimization Factors [12.26159226306187]
We introduce textbfRNop, a novel deep learning-based method for mRNA optimization.<n>We collect a large-scale dataset containing over 3 million sequences and design four specialized loss functions, the GPLoss, CAILoss, tAILoss, and MFELoss.<n>RNop ensures high sequence fidelity, achieves significant computational throughput up to 47.32 sequences/s, and yields optimized mRNA sequences.
arXiv Detail & Related papers (2025-05-29T08:21:11Z) - Regulatory DNA sequence Design with Reinforcement Learning [56.20290878358356]
We propose a generative approach that leverages reinforcement learning to fine-tune a pre-trained autoregressive model.<n>We evaluate our method on promoter design tasks in two yeast media conditions and enhancer design tasks for three human cell types.
arXiv Detail & Related papers (2025-03-11T02:33:33Z) - Helix-mRNA: A Hybrid Foundation Model For Full Sequence mRNA Therapeutics [3.2508287756500165]
mRNA-based vaccines have become a major focus in the pharmaceutical industry.<n> optimizing mRNA sequences for those properties remains a complex challenge.<n>We present Helix-mRNA, a structured state-space-based and attention hybrid model to address these challenges.
arXiv Detail & Related papers (2025-02-19T14:51:41Z) - Life-Code: Central Dogma Modeling with Multi-Omics Sequence Unification [55.98854157265578]
Life-Code is a comprehensive framework that spans different biological functions.<n>We propose a unified pipeline to integrate multi-omics data by reverse-transcribing RNA and reverse-translating amino acids into nucleotide-based sequences.<n>Life-Code achieves state-of-the-art results on various tasks across three omics, highlighting its potential for advancing multi-omics analysis and interpretation.
arXiv Detail & Related papers (2025-02-11T06:53:59Z) - HELM: Hierarchical Encoding for mRNA Language Modeling [4.990962434274757]
We introduce Hierarchical generative approaches for mRNA Language Modeling (HELM)<n>HELM modulates the loss function based on codon synonymity, aligning the model's learning process with the biological reality of mRNA sequences.<n>We evaluate HELM on diverse mRNA datasets and tasks, demonstrating that HELM outperforms standard language model pre-training.
arXiv Detail & Related papers (2024-10-16T11:16:47Z) - mRNA2vec: mRNA Embedding with Language Model in the 5'UTR-CDS for mRNA Design [0.4999814847776097]
This paper proposes a novel contextual language model (LM)-based embedding method: mRNA2vec.<n>In contrast to existing mRNA embedding approaches, our method is based on the self-supervised teacher-student learning framework of data2vec.<n> mRNA2vec demonstrates significant improvements in translation efficiency (TE) and expression level (EL) prediction tasks.
arXiv Detail & Related papers (2024-08-16T23:23:40Z) - BEACON: Benchmark for Comprehensive RNA Tasks and Language Models [60.02663015002029]
We introduce the first comprehensive RNA benchmark BEACON (textbfBEnchmtextbfArk for textbfCOmprehensive RtextbfNA Task and Language Models).<n>First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications.<n>Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models.<n>Third, we investigate the vital RNA language model components
arXiv Detail & Related papers (2024-06-14T19:39:19Z) - scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis
in Brain [46.39828178736219]
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain.
scHyena is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator.
This enables us to process full-length scRNA-seq data without losing any information from the raw data.
arXiv Detail & Related papers (2023-10-04T10:30:08Z) - HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide
Resolution [76.97231739317259]
We present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at the single nucleotide-level.
On fine-tuned benchmarks from the Nucleotide Transformer, HyenaDNA reaches state-of-the-art (SotA) on 12 of 18 datasets using a model with orders of magnitude less parameters and pretraining data.
arXiv Detail & Related papers (2023-06-27T20:46:34Z) - Accurate RNA 3D structure prediction using a language model-based deep learning approach [50.193512039121984]
RhoFold+ is an RNA language model-based deep learning method that accurately predicts 3D structures of single-chain RNAs from sequences.<n>RhoFold+ offers a fully automated end-to-end pipeline for RNA 3D structure prediction.
arXiv Detail & Related papers (2022-07-04T17:15:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.