Refinement Contrastive Learning of Cell-Gene Associations for Unsupervised Cell Type Identification
- URL: http://arxiv.org/abs/2512.10640v1
- Date: Thu, 11 Dec 2025 13:45:31 GMT
- Title: Refinement Contrastive Learning of Cell-Gene Associations for Unsupervised Cell Type Identification
- Authors: Liang Peng, Haopeng Liu, Yixuan Ye, Cheng Liu, Wenjun Shen, Si Wu, Hau-San Wong
- Abstract summary: Unsupervised cell type identification is crucial for uncovering and characterizing heterogeneous populations in single-cell omics studies. We propose a Refinement Contrastive Learning framework (scRCL) that explicitly incorporates cell-gene interactions to derive more informative representations. Our method consistently outperforms state-of-the-art baselines in cell-type identification accuracy.
- Score: 37.569728273621315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised cell type identification is crucial for uncovering and characterizing heterogeneous populations in single-cell omics studies. Although a range of clustering methods have been developed, most focus exclusively on intrinsic cellular structure and ignore the pivotal role of cell-gene associations, which limits their ability to distinguish closely related cell types. To this end, we propose a Refinement Contrastive Learning framework (scRCL) that explicitly incorporates cell-gene interactions to derive more informative representations. Specifically, we introduce two contrastive distribution alignment components that reveal reliable intrinsic cellular structures by effectively exploiting cell-cell structural relationships. Additionally, we develop a refinement module that integrates gene-correlation structure learning to enhance cell embeddings by capturing underlying cell-gene associations. This module strengthens connections between cells and their associated genes, refining representation learning to exploit biologically meaningful relationships. Extensive experiments on several single-cell RNA-seq and spatial transcriptomics benchmark datasets demonstrate that our method consistently outperforms state-of-the-art baselines in cell-type identification accuracy. Moreover, downstream biological analyses confirm that the recovered cell populations exhibit coherent gene-expression signatures, further validating the biological relevance of our approach. The code is available at https://github.com/THPengL/scRCL.
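To make the contrastive idea in the abstract concrete, here is a minimal sketch of a generic InfoNCE-style loss between two views of the same cells. It is not the scRCL objective: the abstract does not specify the loss, so the function names (`l2_normalize`, `info_nce`), the temperature value, and the toy embeddings are all assumptions chosen for illustration. The authors' actual implementation is in the repository linked above.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def info_nce(view_a, view_b, temperature=0.5):
    """Generic InfoNCE loss (an illustrative stand-in, not the scRCL loss).

    Row i of view_a and row i of view_b are treated as two embeddings of the
    same cell (the positive pair); every other row of view_b is a negative.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    losses = []
    for i, anchor in enumerate(a):
        # Temperature-scaled cosine similarity to every cell in the other view.
        logits = [
            sum(x * y for x, y in zip(anchor, other)) / temperature
            for other in b
        ]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        # Negative log-probability assigned to the positive pair.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings for three hypothetical cells.
cells = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# Views that agree cell by cell should give a lower loss than views
# whose rows have been shifted so that positives no longer line up.
aligned_loss = info_nce(cells, cells)
shuffled_loss = info_nce(cells, [cells[1], cells[2], cells[0]])
print(aligned_loss, shuffled_loss)
```

The loss is small when each cell's two embeddings are more similar to each other than to those of other cells, which is the general property a contrastive clustering method relies on.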
Related papers
- Learning Cell-Aware Hierarchical Multi-Modal Representations for Robust Molecular Modeling [74.25438319700929]
We propose CHMR (Cell-aware Hierarchical Multi-modal Representations), a robust framework that models local-global dependencies between molecules and cellular responses.
Evaluated on nine public benchmarks spanning 728 tasks, CHMR outperforms state-of-the-art baselines.
Results demonstrate the advantage of hierarchy-aware, multimodal learning for reliable and biologically grounded molecular representations.
arXiv Detail & Related papers (2025-11-26T07:15:00Z)
- Cell2Text: Multimodal LLM for Generating Single-Cell Descriptions from RNA-Seq Data [17.440176654185095]
We introduce Cell2Text, a framework that translates scRNA-seq profiles into structured natural language descriptions.
By integrating gene-level embeddings with pretrained large language models, Cell2Text generates coherent summaries that capture cellular identity, tissue origin, disease associations, and pathway activity.
arXiv Detail & Related papers (2025-09-29T14:20:50Z)
- OCELOT 2023: Cell Detection from Cell-Tissue Interaction Challenge [18.567918724777517]
The OCELOT 2023 challenge was initiated to gather insights from the community to validate the hypothesis that understanding cell and tissue (cell-tissue) interactions is crucial for achieving human-level performance.
Participants presented models that significantly enhanced the understanding of cell-tissue relationships.
This paper provides a comparative analysis of the methods used by participants, highlighting innovative strategies implemented in the OCELOT 2023 challenge.
arXiv Detail & Related papers (2025-09-11T05:21:02Z)
- Enhanced Single-Cell RNA-seq Embedding through Gene Expression and Data-Driven Gene-Gene Interaction Integration [0.05156484100374057]
We present a novel embedding approach that integrates both gene expression profiles and data-driven gene-gene interactions.
By incorporating both expression levels and gene-gene interactions, our approach provides a more comprehensive representation of cellular states.
arXiv Detail & Related papers (2025-09-01T21:19:27Z)
- Clustering with Communication: A Variational Framework for Single Cell Representation Learning [2.275097126764287]
We propose CCCVAE, a variational autoencoder framework that incorporates cell-cell communication (CCC) signals into single-cell representation learning.
We show that CCCVAE improves clustering performance, achieving higher evaluation scores than standard VAE baselines.
arXiv Detail & Related papers (2025-05-08T01:53:36Z)
- UniCell: Universal Cell Nucleus Classification via Prompt Learning [76.11864242047074]
We propose a universal cell nucleus classification framework (UniCell).
It employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains.
In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads for adapting various datasets.
arXiv Detail & Related papers (2024-02-20T11:50:27Z)
- scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data [62.87454293046843]
Graph neural networks (GNNs) have been widely used for automatic cell type classification.
scBiGNN comprises two GNN modules to identify cell types.
scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.
arXiv Detail & Related papers (2023-12-16T03:54:26Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- Mixed Models with Multiple Instance Learning [51.440557223100164]
We introduce MixMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL).
Our empirical results reveal that MixMIL outperforms existing MIL models in single-cell datasets.
arXiv Detail & Related papers (2023-11-04T16:42:42Z)
- Topological Data Analysis in Time Series: Temporal Filtration and Application to Single-Cell Genomics [13.173307471333619]
We propose the single-cell topological simplicial analysis (scTSA).
Applying this approach to the single-cell gene expression profiles from local networks of cells reveals a previously unseen topology of cellular ecology.
Benchmarked on single-cell RNA-seq data of zebrafish embryogenesis spanning 38,731 cells, 25 cell types and 12 time steps, our approach highlights gastrulation as the most critical stage.
arXiv Detail & Related papers (2022-04-29T12:46:14Z)