Bipartite Graph Attention-based Clustering for Large-scale scRNA-seq Data
- URL: http://arxiv.org/abs/2602.07475v1
- Date: Sat, 07 Feb 2026 10:10:18 GMT
- Title: Bipartite Graph Attention-based Clustering for Large-scale scRNA-seq Data
- Authors: Zhuomin Liang, Liang Bai, Xian Yang
- Abstract summary: Existing methods for scRNA-seq clustering, such as graph transformer-based models, treat each cell as a token in a sequence. We propose a Bipartite Graph Transformer-based clustering model (BGFormer) for scRNA-seq data. BGFormer achieves linear computational complexity with respect to the number of cells, making it scalable to large datasets.
- Score: 12.341331216251582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: scRNA-seq clustering is a critical task for analyzing single-cell RNA sequencing (scRNA-seq) data, as it groups cells with similar gene expression profiles. Transformers, as powerful foundational models, have been applied to scRNA-seq clustering. Their self-attention mechanism automatically assigns higher attention weights to cells within the same cluster, enhancing the distinction between clusters. Existing methods for scRNA-seq clustering, such as graph transformer-based models, treat each cell as a token in a sequence. Their computational and space complexities are $\mathcal{O}(n^2)$ with respect to the number of cells, limiting their applicability to large-scale scRNA-seq datasets. To address this challenge, we propose a Bipartite Graph Transformer-based clustering model (BGFormer) for scRNA-seq data. We introduce a set of learnable anchor tokens as shared reference points to represent the entire dataset. A bipartite graph attention mechanism is introduced to learn the similarity between cells and anchor tokens, bringing cells of the same class closer together in the embedding space. BGFormer achieves linear computational complexity with respect to the number of cells, making it scalable to large datasets. Experimental results on multiple large-scale scRNA-seq datasets demonstrate the effectiveness and scalability of BGFormer.
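The abstract's core idea, attending from each of $n$ cells to a small set of $m$ learnable anchor tokens instead of computing full $n \times n$ self-attention, can be sketched as plain cross-attention. The sketch below is a minimal illustration under that reading, not BGFormer's actual implementation; the function name, projection matrices, and dimensions are all illustrative assumptions. The cost of the attention is $\mathcal{O}(n \cdot m \cdot d)$, i.e. linear in the number of cells when $m$ and $d$ are fixed.

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def bipartite_anchor_attention(X, anchors, Wq, Wk, Wv):
    """Cross-attention from n cells to m anchor tokens (illustrative sketch).

    Each cell attends only to the m anchors, never to the other n-1 cells,
    so runtime and memory are O(n * m * d) -- linear in n.
    """
    Q = X @ Wq                               # (n, d) cell queries
    K = anchors @ Wk                         # (m, d) anchor keys
    V = anchors @ Wv                         # (m, d) anchor values
    scores = Q @ K.T / np.sqrt(Q.shape[1])   # (n, m) cell-to-anchor affinities
    attn = softmax(scores, axis=1)           # each cell's weights over anchors
    return attn @ V                          # (n, d) anchor-mixed embeddings

rng = np.random.default_rng(0)
n, m, d = 10_000, 64, 32                     # 10k cells, 64 anchors (hypothetical sizes)
X = rng.normal(size=(n, d))                  # stand-in for cell embeddings
anchors = rng.normal(size=(m, d))            # learnable anchor tokens (here random)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Z = bipartite_anchor_attention(X, anchors, Wq, Wk, Wv)
print(Z.shape)  # (10000, 32)
```

In a trained model the anchors and projections would be learned parameters and the attention weights could double as soft cluster assignments; here everything is random purely to demonstrate the linear-in-$n$ shape of the computation.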
Related papers
- Scalable Context-Preserving Model-Aware Deep Clustering for Hyperspectral Images [51.95768218975529]
Subspace clustering has become widely adopted for the unsupervised analysis of hyperspectral images (HSIs). Recent model-aware deep subspace clustering methods often use a two-stage framework, involving the calculation of a self-representation matrix with complexity of $\mathcal{O}(n^2)$, followed by spectral clustering. We propose a scalable, context-preserving deep clustering method based on basis representation, which jointly captures local and non-local structures for efficient HSI clustering.
arXiv Detail & Related papers (2025-06-12T16:43:09Z)
- JojoSCL: Shrinkage Contrastive Learning for single-cell RNA sequence Clustering [0.44116499009420784]
Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular processes by enabling gene expression analysis at the individual cell level. However, the high dimensionality and sparsity of scRNA-seq data continue to challenge existing clustering models. We introduce JojoSCL, a novel self-supervised contrastive learning framework for scRNA-seq clustering.
arXiv Detail & Related papers (2025-05-31T05:59:56Z)
- scSiameseClu: A Siamese Clustering Framework for Interpreting single-cell RNA Sequencing Data [33.191442026962186]
Single-cell RNA sequencing (scRNA-seq) reveals cell heterogeneity. Cell clustering plays a key role in identifying cell types and marker genes. Graph neural network (GNN)-based methods have significantly improved clustering performance. scSiameseClu is a novel framework for interpreting single-cell RNA-seq data.
arXiv Detail & Related papers (2025-05-19T02:17:09Z)
- scASDC: Attention Enhanced Structural Deep Clustering for Single-cell RNA-seq Data [5.234149080137045]
High sparsity and complex noise patterns inherent in scRNA-seq data present significant challenges for traditional clustering methods.
We propose a deep clustering method, Attention-Enhanced Structural Deep Embedding Graph Clustering (scASDC).
scASDC integrates multiple advanced modules to improve clustering accuracy and robustness.
arXiv Detail & Related papers (2024-08-09T09:10:36Z)
- scRNA-seq Data Clustering by Cluster-aware Iterative Contrastive Learning [29.199004624757233]
Single-cell RNA sequencing (scRNA-seq) enables researchers to analyze gene expression at single-cell level.
One important task in scRNA-seq data analysis is unsupervised clustering.
We propose Cluster-aware Iterative Contrastive Learning (CICL) for scRNA-seq data clustering.
arXiv Detail & Related papers (2023-12-27T14:50:59Z)
- scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data [62.87454293046843]
Graph neural networks (GNNs) have been widely used for automatic cell type classification.
scBiGNN comprises two GNN modules to identify cell types.
scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.
arXiv Detail & Related papers (2023-12-16T03:54:26Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrate both sets of information and reconstruct the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- Single-cell Multi-view Clustering via Community Detection with Unknown Number of Clusters [64.31109141089598]
We introduce scUNC, an innovative multi-view clustering approach tailored for single-cell data.
scUNC seamlessly integrates information from different views without the need for a predefined number of clusters.
We conducted a comprehensive evaluation of scUNC using three distinct single-cell datasets.
arXiv Detail & Related papers (2023-11-28T08:34:58Z)
- scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain [46.39828178736219]
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain.
scHyena is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator.
This enables us to process full-length scRNA-seq data without losing any information from the raw data.
arXiv Detail & Related papers (2023-10-04T10:30:08Z)
- Visualizing hierarchies in scRNA-seq data using a density tree-biased autoencoder [50.591267188664666]
We propose an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data.
We then introduce DTAE, a tree-biased autoencoder that emphasizes the tree structure of the data in low dimensional space.
arXiv Detail & Related papers (2021-02-11T08:48:48Z)
- CaEGCN: Cross-Attention Fusion based Enhanced Graph Convolutional Network for Clustering [51.62959830761789]
We propose a cross-attention based deep clustering framework, named Cross-Attention Fusion based Enhanced Graph Convolutional Network (CaEGCN).
CaEGCN contains four main modules: cross-attention fusion, Content Auto-encoder, Graph Convolutional Auto-encoder and self-supervised model.
Experimental results on different types of datasets prove the superiority and robustness of the proposed CaEGCN.
arXiv Detail & Related papers (2021-01-18T05:21:59Z)
- Cell Type Identification from Single-Cell Transcriptomic Data via Semi-supervised Learning [2.4271601178529063]
Cell type identification from single-cell transcriptomic data is a common goal of single-cell RNA sequencing (scRNAseq) data analysis.
We propose a semi-supervised learning model that uses unlabeled scRNAseq cells and a limited amount of labeled scRNAseq cells to perform cell identification.
The proposed model achieves encouraging performance by learning from a very limited amount of labeled scRNAseq cells.
arXiv Detail & Related papers (2020-05-06T19:15:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.