scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection
- URL: http://arxiv.org/abs/2506.20697v1
- Date: Wed, 25 Jun 2025 12:58:01 GMT
- Title: scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection
- Authors: Zhen Yuan, Shaoqing Jiao, Yihang Xiao, Jiajie Peng,
- Abstract summary: scMamba is a model designed to integrate single-cell multi-omics data without the need for prior feature selection.<n> scMamba distills rich biological insights from high-dimensional, sparse single-cell multi-omics data.<n>Our findings position scMamba as a powerful tool for large-scale single-cell multi-omics integration.
- Score: 5.139014238424409
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of single-cell multi-omics technologies has enabled the simultaneous profiling of diverse omics layers within individual cells. Integrating such multimodal data provides unprecedented insights into cellular identity, regulatory processes, and disease mechanisms. However, it remains challenging, as current methods often rely on selecting highly variable genes or peaks during preprocessing, which may inadvertently discard crucial biological information. Here, we present scMamba, a foundation model designed to integrate single-cell multi-omics data without the need for prior feature selection while preserving genomic positional information. scMamba introduces a patch-based cell tokenization strategy that treats genomics regions as words (tokens) and cells as sentences. Building upon the concept of state space duality, scMamba distills rich biological insights from high-dimensional, sparse single-cell multi-omics data. Additionally, our novel contrastive learning approach, enhanced with cosine similarity regularization, enables superior alignment across omics layers compared to traditional methods. Systematic benchmarking across multiple datasets demonstrates that scMamba significantly outperforms state-of-the-art methods in preserving biological variation, aligning omics layers, and enhancing key downstream tasks such as clustering, cell type annotation, and trajectory inference. Our findings position scMamba as a powerful tool for large-scale single-cell multi-omics integration, capable of handling large-scale atlases and advancing biological discovery.
Related papers
- Masked Omics Modeling for Multimodal Representation Learning across Histopathology and Molecular Profiles [0.0]
Self-supervised learning has driven major advances in computational pathology.<n>However, histopathology alone often falls short for molecular characterization and understanding clinical outcomes.<n>We introduce MORPHEUS, a unified transformer-based pre-training framework that encodes both histopathology and multi-omics data into a shared latent space.
arXiv Detail & Related papers (2025-08-01T15:29:26Z) - Bidirectional Mamba for Single-Cell Data: Efficient Context Learning with Biological Fidelity [0.39945675027960637]
We introduce GeneMamba, a scalable and efficient foundation model for single-cell transcriptomics built on state space modeling.<n>GeneMamba captures bidirectional gene context with linear-time complexity, offering substantial computational gains over transformer baselines.<n>We evaluate GeneMamba across diverse tasks, including multi-batch integration, cell type annotation, and gene-gene correlation, demonstrating strong performance, interpretability, and robustness.
arXiv Detail & Related papers (2025-04-22T20:34:47Z) - Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen [76.02070962797794]
This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data.<n>CFGen generates whole-genome multi-modal single-cell data reliably, improving the recovery of crucial biological data characteristics.
arXiv Detail & Related papers (2024-07-16T14:05:03Z) - MMIL: A novel algorithm for disease associated cell type discovery [58.044870442206914]
Single-cell datasets often lack individual cell labels, making it challenging to identify cells associated with disease.
We introduce Mixture Modeling for Multiple Learning Instance (MMIL), an expectation method that enables the training and calibration of cell-level classifiers.
arXiv Detail & Related papers (2024-06-12T15:22:56Z) - Single-Cell Deep Clustering Method Assisted by Exogenous Gene
Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z) - Single-cell Multi-view Clustering via Community Detection with Unknown
Number of Clusters [64.31109141089598]
We introduce scUNC, an innovative multi-view clustering approach tailored for single-cell data.
scUNC seamlessly integrates information from different views without the need for a predefined number of clusters.
We conducted a comprehensive evaluation of scUNC using three distinct single-cell datasets.
arXiv Detail & Related papers (2023-11-28T08:34:58Z) - Mixed Models with Multiple Instance Learning [51.440557223100164]
We introduce MixMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL)
Our empirical results reveal that MixMIL outperforms existing MIL models in single-cell datasets.
arXiv Detail & Related papers (2023-11-04T16:42:42Z) - Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most of current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z) - Modelling Technical and Biological Effects in scRNA-seq data with
Scalable GPLVMs [6.708052194104378]
We extend a popular approach for probabilistic non-linear dimensionality reduction, the Gaussian process latent variable model, to scale to massive single-cell datasets.
The key idea is to use an augmented kernel which preserves the factorisability of the lower bound allowing for fast variational inference.
arXiv Detail & Related papers (2022-09-14T15:25:15Z) - scICML: Information-theoretic Co-clustering-based Multi-view Learning
for the Integrative Analysis of Single-cell Multi-omics data [0.0]
We develop a novel information-theoretic co-clustering-based multi-view learning (scICML) method for multi-omics single-cell data integration.
scICML utilizes co-clusterings to aggregate similar features for each view of data and uncover the common clustering pattern for cells.
Our experiments on four real-world datasets demonstrate that scICML improves the overall clustering performance and provides biological insights into the data analysis of peripheral blood mononuclear cells.
arXiv Detail & Related papers (2022-05-19T12:41:55Z) - Interpretable Single-Cell Set Classification with Kernel Mean Embeddings [14.686560033030101]
Kernel Mean Embedding encodes the cellular landscape of each profiled biological sample.
We train a simple linear classifier and achieve state-of-the-art classification accuracy on 3 flow and mass datasets.
arXiv Detail & Related papers (2022-01-18T21:40:36Z) - Contrastive Cycle Adversarial Autoencoders for Single-cell Multi-omics
Alignment and Integration [0.0]
We propose a novel framework to align and integrate single-cell RNA-seq data and single-cell ATAC-seq data.
Compared with the other state-of-the-art methods, our method performs better in both simulated and real single-cell data.
arXiv Detail & Related papers (2021-12-05T13:00:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.