CellMaster: Collaborative Cell Type Annotation in Single-Cell Analysis
- URL: http://arxiv.org/abs/2602.13346v1
- Date: Thu, 12 Feb 2026 20:20:22 GMT
- Title: CellMaster: Collaborative Cell Type Annotation in Single-Cell Analysis
- Authors: Zhen Wang, Yiming Gao, Jieyuan Liu, Enze Ma, Jefferson Chen, Mark Antkowiak, Mengzhou Hu, JungHo Kong, Dexter Pratt, Zhiting Hu, Wei Wang, Trey Ideker, Eric P. Xing
- Abstract summary: We present CellMaster, an AI agent that mimics expert practice for zero-shot cell-type annotation. Across 9 datasets spanning 8 tissues, CellMaster improved accuracy by 7.1% over best-performing baselines in automatic mode. The system demonstrates particular strength in rare and novel cell states where baselines often fail.
- Score: 35.57672494910454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Single-cell RNA-seq (scRNA-seq) enables atlas-scale profiling of complex tissues, revealing rare lineages and transient states. Yet, assigning biologically valid cell identities remains a bottleneck because markers are tissue- and state-dependent, and novel states lack references. We present CellMaster, an AI agent that mimics expert practice for zero-shot cell-type annotation. Unlike existing automated tools, CellMaster leverages LLM-encoded knowledge (e.g., GPT-4o) to perform on-the-fly annotation with interpretable rationales, without pre-training or fixed marker databases. Across 9 datasets spanning 8 tissues, CellMaster improved accuracy by 7.1% over best-performing baselines (including CellTypist and scTab) in automatic mode. With human-in-the-loop refinement, this advantage increased to 18.6%, with a 22.1% gain on subtype populations. The system demonstrates particular strength in rare and novel cell states where baselines often fail. Source code and the web application are available at https://github.com/AnonymousGym/CellMaster.
Related papers
- GRIT: Graph-Regularized Logit Refinement for Zero-shot Cell Type Annotation [15.465706196179676]
Cell type annotation is a fundamental step in the analysis of single-cell RNA sequencing (scRNA-seq) data. Recent advances in CLIP-style models offer a promising path toward automating cell type annotation. In this paper, we propose to refine the zero-shot logits produced by LangCell through a graph-regularized optimization framework.
arXiv Detail & Related papers (2025-08-06T07:09:46Z)
- Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning [44.91329557101423]
We introduce the CellPuzzles task, where the objective is to assign unique cell types to a batch of cells. This benchmark spans diverse tissues, diseases, and donor conditions, and requires reasoning across the batch-level cellular context to ensure label uniqueness. We propose Cell-o1, a 7B LLM trained via supervised fine-tuning on distilled reasoning traces, followed by reinforcement learning with batch-level rewards.
arXiv Detail & Related papers (2025-06-03T14:16:53Z)
- CellVerse: Do Large Language Models Really Understand Cell Biology? [74.34984441715517]
We introduce CellVerse, a unified language-centric question-answering benchmark that integrates four types of single-cell multi-omics data. We systematically evaluate the performance across 14 open-source and closed-source LLMs ranging from 160M to 671B on CellVerse.
arXiv Detail & Related papers (2025-05-09T06:47:23Z)
- Real-Time Cell Sorting with Scalable In Situ FPGA-Accelerated Deep Learning [2.0717688648414065]
We present a label-free machine learning framework for cell classification using bright-field microscopy images. Our framework accurately classifies T4, T8, and B cell types with a dataset of 80,000 preprocessed images. Our FPGA-accelerated student model achieves an ultra-low latency of just 14.5 µs and a complete cell detection-to-sorting trigger time of 24.7 µs.
arXiv Detail & Related papers (2025-03-16T19:32:11Z)
- CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models [1.7674154313605157]
CellViT++ is a framework for generalized cell segmentation in digital pathology. CellViT++ is an open-source framework featuring a user-friendly, web-based interface for visualization and annotation.
arXiv Detail & Related papers (2025-01-09T14:26:50Z)
- Single-Cell Omics Arena: A Benchmark Study for Large Language Models on Cell Type Annotation Using Single-Cell Data [13.56585855722118]
Large language models (LLMs) have demonstrated their ability to efficiently process and synthesize vast corpora of text to automatically extract biological knowledge. Our study explores the potential of LLMs to accurately classify and annotate cell types in single-cell RNA sequencing (scRNA-seq) data. The results demonstrate that LLMs can provide robust interpretations of single-cell data without requiring additional fine-tuning.
arXiv Detail & Related papers (2024-12-03T23:58:35Z)
- UniCell: Universal Cell Nucleus Classification via Prompt Learning [76.11864242047074]
We propose a universal cell nucleus classification framework (UniCell). It employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains. In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads for adapting various datasets.
arXiv Detail & Related papers (2024-02-20T11:50:27Z)
- scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data [62.87454293046843]
Graph neural networks (GNNs) have been widely used for automatic cell type classification. scBiGNN comprises two GNN modules to identify cell types. scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.
arXiv Detail & Related papers (2023-12-16T03:54:26Z)
- RigLSTM: Recurrent Independent Grid LSTM for Generalizable Sequence Learning [75.61681328968714]
We propose recurrent independent Grid LSTM (RigLSTM) to exploit the underlying modular structure of the target task. Our model adopts cell selection, input feature selection, hidden state selection, and soft state updating to achieve a better generalization ability.
arXiv Detail & Related papers (2023-11-03T07:40:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.