GRIT: Graph-Regularized Logit Refinement for Zero-shot Cell Type Annotation
- URL: http://arxiv.org/abs/2508.04747v1
- Date: Wed, 06 Aug 2025 07:09:46 GMT
- Title: GRIT: Graph-Regularized Logit Refinement for Zero-shot Cell Type Annotation
- Authors: Tianxiang Hu, Chenyi Zhou, Jiaxiang Liu, Jiongxin Wang, Ruizhe Chen, Haoxiang Xia, Gaoang Wang, Jian Wu, Zuozhu Liu
- Abstract summary: Cell type annotation is a fundamental step in the analysis of single-cell RNA sequencing (scRNA-seq) data. Recent advances in CLIP-style models offer a promising path toward automating cell type annotation. In this paper, we propose to refine the zero-shot logits produced by LangCell through a graph-regularized optimization framework.
- Score: 15.465706196179676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cell type annotation is a fundamental step in the analysis of single-cell RNA sequencing (scRNA-seq) data. In practice, human experts often rely on the structure revealed by principal component analysis (PCA) followed by $k$-nearest neighbor ($k$-NN) graph construction to guide annotation. While effective, this process is labor-intensive and does not scale to large datasets. Recent advances in CLIP-style models offer a promising path toward automating cell type annotation. By aligning scRNA-seq profiles with natural language descriptions, models like LangCell enable zero-shot annotation. While LangCell demonstrates decent zero-shot performance, its predictions remain suboptimal, particularly in achieving consistent accuracy across all cell types. In this paper, we propose to refine the zero-shot logits produced by LangCell through a graph-regularized optimization framework. By enforcing local consistency over the task-specific PCA-based $k$-NN graph, our method combines the scalability of pre-trained models with the structural robustness relied upon in expert annotation. We evaluate our approach on 14 annotated human scRNA-seq datasets from 4 distinct studies, spanning 11 organs and over 200,000 single cells. Our method consistently improves zero-shot annotation accuracy, with gains of up to 10%. Further analysis showcases how GRIT propagates correct signals through the graph, pulling mislabeled cells back toward more accurate predictions. The method is training-free, model-agnostic, and serves as a simple yet effective plug-in for enhancing automated cell type annotation in practice.
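The refinement described in the abstract can be illustrated with a short label-propagation-style routine: build the task-specific PCA-based $k$-NN graph, then iteratively smooth the zero-shot logits over it while keeping each cell anchored to its original prediction. The sketch below is a minimal illustration under assumed defaults (a dense cells-by-genes expression matrix `X`, log1p transform, 50 principal components, k = 15, mixing weight `alpha`); the function name `refine_logits` and all parameter choices are hypothetical, and the iterative smoother is one standard way to enforce local graph consistency, not necessarily the authors' exact objective or solver.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import kneighbors_graph


def refine_logits(X, logits, n_pcs=50, k=15, alpha=0.8, n_iters=50):
    """Smooth zero-shot logits over a PCA-based k-NN graph.

    Repeats Z <- (1 - alpha) * logits + alpha * S @ Z, where S is the
    row-normalized k-NN adjacency, pulling each cell's prediction toward
    those of its transcriptomic neighbors while staying anchored to the
    original zero-shot logits.
    """
    # Low-dimensional embedding of log-transformed expression, mirroring the
    # PCA step experts use before building a neighborhood graph.
    pcs = PCA(n_components=n_pcs).fit_transform(np.log1p(X))

    # k-NN graph in PCA space; symmetrize so influence flows both ways.
    A = kneighbors_graph(pcs, n_neighbors=k, mode="connectivity")
    A = A.maximum(A.T)

    # Row-normalize the adjacency to obtain a propagation operator S.
    deg = np.asarray(A.sum(axis=1)).ravel()
    S = A.multiply(1.0 / np.maximum(deg, 1.0)[:, None]).tocsr()

    # Iterative refinement balancing fidelity to the zero-shot logits
    # against local consistency on the graph.
    Z = logits.copy()
    for _ in range(n_iters):
        Z = (1.0 - alpha) * logits + alpha * (S @ Z)
    return Z


# Usage (hypothetical inputs): refined = refine_logits(X, zero_shot_logits)
# predicted_types = refined.argmax(axis=1)
```

Larger `alpha` trusts the graph neighborhood more; smaller `alpha` stays closer to the raw zero-shot logits.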
Related papers
- Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning [44.91329557101423]
We introduce the CellPuzzles task, where the objective is to assign unique cell types to a batch of cells. This benchmark spans diverse tissues, diseases, and donor conditions, and requires reasoning across the batch-level cellular context to ensure label uniqueness. We propose Cell-o1, a 7B LLM trained via supervised fine-tuning on distilled reasoning traces, followed by reinforcement learning with batch-level rewards.
arXiv Detail & Related papers (2025-06-03T14:16:53Z) - Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z) - ReCellTy: Domain-specific knowledge graph retrieval-augmented LLMs workflow for single-cell annotation [8.31906400360507]
We developed a graph-structured feature marker database to retrieve entities linked to differential genes for cell reconstruction. Our method improves human evaluation scores by up to 0.21 and semantic similarity by 6.1% across 11 tissue types.
arXiv Detail & Related papers (2025-04-24T01:05:22Z) - Benchmarking and optimizing organism wide single-cell RNA alignment methods [0.0]
We introduce the K-Neighbors Intersection (KNI) score, a single score that both penalizes batch effects and measures accuracy at cross-dataset cell-type label prediction. We introduce Batch Adversarial single-cell Variational Inference (BA-scVI), a new variant of scVI that uses adversarial training to penalize batch effects in the encoder and decoder. In the resulting aligned space, we find that the granularity of cell-type groupings is conserved, supporting the notion that whole-organism cell-type maps can be created by a single model without loss of information.
arXiv Detail & Related papers (2025-03-26T17:11:47Z) - Less is More: One-shot Subgraph Reasoning on Large-scale Knowledge Graphs [49.547988001231424]
We propose one-shot-subgraph link prediction to achieve efficient and adaptive prediction. The design principle is that, instead of acting directly on the whole KG, the prediction procedure is decoupled into two steps. We achieve improved efficiency and leading performance on five large-scale benchmarks.
arXiv Detail & Related papers (2024-03-15T12:00:12Z) - Cell Graph Transformer for Nuclei Classification [78.47566396839628]
We develop a cell graph transformer (CGT) that treats nodes and edges as input tokens to enable learnable adjacency and information exchange among all nodes.
Poorly learned features can lead to noisy self-attention scores and inferior convergence.
We propose a novel topology-aware pretraining method that leverages a graph convolutional network (GCN) to learn a feature extractor.
arXiv Detail & Related papers (2024-02-20T12:01:30Z) - Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN).
arXiv Detail & Related papers (2023-10-23T01:25:44Z) - DeepGATGO: A Hierarchical Pretraining-Based Graph-Attention Model for
Automatic Protein Function Prediction [4.608328575930055]
Automatic protein function prediction (AFP) is classified as a large-scale multi-label classification problem.
Currently, popular methods primarily combine protein-related information and Gene Ontology (GO) terms to generate final functional predictions.
We propose a sequence-based hierarchical prediction method, DeepGATGO, which processes protein sequences and GO term labels hierarchically.
arXiv Detail & Related papers (2023-07-24T07:01:32Z) - A systematic evaluation of methods for cell phenotype classification
using single-cell RNA sequencing data [7.62849213621469]
This study evaluates 13 popular supervised machine learning algorithms to classify cell phenotypes.
The study outcomes showed that ElasticNet with interactions performed best in small and medium data sets.
arXiv Detail & Related papers (2021-10-01T23:24:15Z) - Combining Label Propagation and Simple Models Out-performs Graph Neural
Networks [52.121819834353865]
We show that for many standard transductive node classification benchmarks, combining label propagation with simple models exceeds or nearly matches the performance of state-of-the-art GNNs.
We call this overall procedure Correct and Smooth (C&S).
arXiv Detail & Related papers (2020-10-27T02:10:52Z) - Split and Expand: An inference-time improvement for Weakly Supervised
Cell Instance Segmentation [71.50526869670716]
We propose a two-step post-processing procedure, Split and Expand, to improve the conversion of segmentation maps to instances.
In the Split step, we split clumps of cells from the segmentation map into individual cell instances with the guidance of cell-center predictions.
In the Expand step, we find missing small cells using the cell-center predictions.
arXiv Detail & Related papers (2020-07-21T14:05:09Z)