Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling
- URL: http://arxiv.org/abs/2403.03234v2
- Date: Wed, 5 Jun 2024 21:02:37 GMT
- Title: Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling
- Authors: Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, Volodymyr Kuleshov
- Abstract summary: Modeling genomic sequences requires capturing long-range token interactions, the effects of upstream and downstream regions of the genome, and the reverse complementarity of DNA.
Here, we propose an architecture motivated by these challenges that builds off the long-range Mamba block.
We use MambaDNA as the basis of Caduceus, the first family of RC equivariant bi-directional long-range DNA language models.
- Score: 36.37643634126816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale sequence modeling has sparked rapid advances that now extend into biology and genomics. However, modeling genomic sequences introduces challenges such as the need to model long-range token interactions, the effects of upstream and downstream regions of the genome, and the reverse complementarity (RC) of DNA. Here, we propose an architecture motivated by these challenges that builds off the long-range Mamba block, and extends it to a BiMamba component that supports bi-directionality, and to a MambaDNA block that additionally supports RC equivariance. We use MambaDNA as the basis of Caduceus, the first family of RC equivariant bi-directional long-range DNA language models, and we introduce pre-training and fine-tuning strategies that yield Caduceus DNA foundation models. Caduceus outperforms previous long-range models on downstream benchmarks; on a challenging long-range variant effect prediction task, Caduceus exceeds the performance of 10x larger models that do not leverage bi-directionality or equivariance.
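To make the two properties named in the abstract concrete, here is a minimal PyTorch sketch, not the authors' implementation: `reverse_complement`, `BiDirectionalWrapper`, `RCEquivariantWrapper`, and `ToyBlock` are hypothetical names, the input is assumed to be one-hot DNA of shape [batch, length, 4] with channel order A, C, G, T, and a small convolution stands in for a Mamba block. The sketch demonstrates the guarantee an RC-equivariant bi-directional model provides: f(RC(x)) = RC(f(x)).

```python
import torch
import torch.nn as nn


def reverse_complement(x: torch.Tensor) -> torch.Tensor:
    """RC of one-hot DNA [batch, length, 4], channels A, C, G, T:
    reverse along the length axis, then swap A<->T and C<->G."""
    return x.flip(dims=[1])[..., [3, 2, 1, 0]]


class BiDirectionalWrapper(nn.Module):
    """Run a causal sequence model over x and over reversed x with the
    same (shared) weights, then sum: a BiMamba-style construction."""

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        forward_pass = self.model(x)
        backward_pass = self.model(x.flip(dims=[1])).flip(dims=[1])
        return forward_pass + backward_pass


class RCEquivariantWrapper(nn.Module):
    """Average the model's output on x with the RC-mapped output on RC(x).
    For any inner model f, the wrapped g satisfies g(RC(x)) == RC(g(x))."""

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.model(x)
        y_rc = reverse_complement(self.model(reverse_complement(x)))
        return 0.5 * (y + y_rc)


class ToyBlock(nn.Module):
    """Stand-in for a Mamba block: a small convolution that mixes positions."""

    def __init__(self, channels: int = 4):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=5, padding=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [batch, length, 4]
        return self.conv(x.transpose(1, 2)).transpose(1, 2)


# Sanity check on random one-hot DNA: RC equivariance holds for the wrapped model.
x = torch.eye(4)[torch.randint(0, 4, (2, 16))]  # [2, 16, 4]
model = RCEquivariantWrapper(BiDirectionalWrapper(ToyBlock()))
assert torch.allclose(model(reverse_complement(x)), reverse_complement(model(x)))
```

In Caduceus itself these properties are built into the BiMamba and MambaDNA blocks themselves (via weight sharing across directions and strands) rather than imposed by outer wrappers; the sketch only illustrates the invariance the architecture is designed to guarantee.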
Related papers
- eccDNAMamba: A Pre-Trained Model for Ultra-Long eccDNA Sequence Analysis [5.86106644437914]
Extrachromosomal circular DNA (eccDNA) plays key regulatory roles and contributes to oncogene overexpression in cancer. No pre-trained models currently support full-length circular eccDNA for downstream analysis. eccDNAMamba is the first bidirectional state-space encoder tailored for circular DNA sequences.
arXiv Detail & Related papers (2025-06-22T17:50:57Z) - Mamba-Driven Topology Fusion for Monocular 3-D Human Pose Estimation [32.185238802221576]
Recently, the Mamba model has substantially reduced computational overhead. However, the way SSMs process sequential data is ill-suited to 3-D joint sequences with topological structure. We propose the Mamba-Driven Topology Fusion framework to address these issues.
arXiv Detail & Related papers (2025-05-27T01:21:57Z) - JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model [1.6128508494592848]
Large language models (LLMs) have revolutionized natural language processing and are increasingly applied to other sequential data types. We introduce JanusDNA, the first bidirectional DNA foundation model built upon a novel pretraining paradigm. JanusDNA processes up to 1 million base pairs at single nucleotide resolution on a single 80GB GPU.
arXiv Detail & Related papers (2025-05-22T20:10:55Z) - Bidirectional Mamba for Single-Cell Data: Efficient Context Learning with Biological Fidelity [0.39945675027960637]
We introduce GeneMamba, a scalable and efficient foundation model for single-cell transcriptomics built on state space modeling.
GeneMamba captures bidirectional gene context with linear-time complexity, offering substantial computational gains over transformer baselines.
We evaluate GeneMamba across diverse tasks, including multi-batch integration, cell type annotation, and gene-gene correlation, demonstrating strong performance, interpretability, and robustness.
arXiv Detail & Related papers (2025-04-22T20:34:47Z) - Gene42: Long-Range Genomic Foundation Model With Dense Attention [39.22636278244394]
We introduce Gene42, a novel family of Genomic Foundation Models (GFMs).
Gene42 models utilize a decoder-only (LLaMA-style) architecture with a dense self-attention mechanism.
Gene42 is the first dense attention model capable of handling such extensive long context lengths in genomics.
arXiv Detail & Related papers (2025-03-20T07:10:04Z) - UniGenX: Unified Generation of Sequence and Structure with Autoregressive Diffusion [61.690978792873196]
Existing approaches rely on either autoregressive sequence models or diffusion models.
We propose UniGenX, a unified framework that combines autoregressive next-token prediction with conditional diffusion models.
We validate the effectiveness of UniGenX on material and small molecule generation tasks.
arXiv Detail & Related papers (2025-03-09T16:43:07Z) - Primer C-VAE: An interpretable deep learning primer design method to detect emerging virus variants [0.5821597945324924]
We developed Primer C-VAE, a model based on a Variational Auto-Encoder framework with Convolutional Neural Networks.
Using SARS-CoV-2, our model classified variants with 98% accuracy and generated variant-specific primers.
The model also generated effective primers for organisms with longer gene sequences like E. coli and Shigella flexneri.
arXiv Detail & Related papers (2025-03-03T12:17:19Z) - GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters.
Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks.
It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z) - Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence [33.38031167119682]
In few-shot action recognition (FSAR), long sub-sequences of video naturally express entire actions more effectively.
Recent Mamba demonstrates efficiency in modeling long sequences, but directly applying Mamba to FSAR overlooks the importance of local feature modeling and alignment.
We propose a Matryoshka MAmba and CoNtrasTive LeArning framework (Manta) to solve these challenges.
Manta achieves new state-of-the-art performance on prominent benchmarks, including SSv2, Kinetics, UCF101, and HMDB51.
arXiv Detail & Related papers (2024-12-10T13:03:42Z) - dnaGrinder: a lightweight and high-capacity genomic foundation model [11.646351318648499]
Current genomic foundation models often face a critical tradeoff: smaller models with mediocre performance versus large models with improved performance.
We introduce dnaGrinder, a unique and efficient genomic foundation model.
dnaGrinder excels at managing long-range dependencies within genomic sequences while minimizing computational costs without compromising performance.
arXiv Detail & Related papers (2024-09-24T03:20:07Z) - UU-Mamba: Uncertainty-aware U-Mamba for Cardiovascular Segmentation [26.621625716575746]
This paper introduces the UU-Mamba model, an extension of the U-Mamba architecture, to address challenges in both cardiac and vascular segmentation.
By incorporating Sharpness-Aware Minimization (SAM), the model enhances generalization by targeting flatter minima in the loss landscape.
We conduct new trials on the ImageCAS (coronary artery) and Aorta (aortic branches and zones) datasets, which present more complex segmentation challenges.
arXiv Detail & Related papers (2024-09-22T03:22:06Z) - Bidirectional Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction.
We introduce a new framework named Selective Gated Mamba (SIGMA) for Sequential Recommendation.
Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z) - Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning [16.23977055134524]
We propose a novel action predictor named Mamba Decision Maker (MambaDM).
MambaDM is expected to be a promising alternative for sequence modeling paradigms, owing to its efficient modeling of multi-scale dependencies.
This paper delves into the sequence modeling capabilities of MambaDM in the RL domain, paving the way for future advancements.
arXiv Detail & Related papers (2024-06-04T06:49:18Z) - Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation [37.79819260918366]
Continual Test-Time Adaptation (CTTA) aims to adapt the pre-trained model to ever-evolving target domains.
We explore the integration of a Mixture-of-Activation-Sparsity-Experts (MoASE) as an adapter for the CTTA task.
arXiv Detail & Related papers (2024-05-26T08:51:39Z) - ProMamba: Prompt-Mamba for polyp segmentation [12.008624337064521]
We propose a segmentation model based on Prompt-Mamba, which incorporates the latest Vision-Mamba and prompt technologies.
We are the first to apply the Vision-Mamba architecture to polyp segmentation and the first to utilize prompt technology in a polyp segmentation model.
Our model efficiently accomplishes segmentation tasks, surpassing previous state-of-the-art methods by an average of 5% across six datasets.
arXiv Detail & Related papers (2024-03-20T15:08:57Z) - Efficient and Scalable Fine-Tune of Language Models for Genome Understanding [49.606093223945734]
We present Lingo: Language prefix fIne-tuning for GenOmes.
Unlike DNA foundation models, Lingo strategically leverages natural language foundation models' contextual cues.
Lingo further accommodates numerous downstream fine-tuning tasks via an adaptive rank sampling method.
arXiv Detail & Related papers (2024-02-12T21:40:45Z) - Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining [85.08169822181685]
This paper introduces a novel Mamba-based model, Swin-UMamba, designed specifically for medical image segmentation tasks.
Swin-UMamba demonstrates superior performance with a large margin compared to CNNs, ViTs, and latest Mamba-based models.
arXiv Detail & Related papers (2024-02-05T18:58:11Z) - Heterogeneous Multi-Task Gaussian Cox Processes [61.67344039414193]
We present a novel extension of multi-task Gaussian Cox processes for modeling heterogeneous correlated tasks jointly.
A MOGP prior over the parameters of the dedicated likelihoods for classification, regression and point process tasks can facilitate sharing of information between heterogeneous tasks.
We derive a mean-field approximation to realize closed-form iterative updates for estimating model parameters.
arXiv Detail & Related papers (2023-08-29T15:01:01Z) - HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution [76.97231739317259]
We present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at the single-nucleotide level.
On fine-tuned benchmarks from the Nucleotide Transformer, HyenaDNA reaches state-of-the-art (SotA) performance on 12 of 18 datasets using a model with orders of magnitude fewer parameters and less pretraining data.
arXiv Detail & Related papers (2023-06-27T20:46:34Z) - Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification [109.81283748940696]
We introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio.
We show that, for specific embedding methods, some simulation-based approaches are more robust (and accurate) than others against certain adversarial perturbations of the input sequences.
arXiv Detail & Related papers (2022-07-18T19:16:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.