scAgent: Universal Single-Cell Annotation via a LLM Agent
- URL: http://arxiv.org/abs/2504.04698v1
- Date: Mon, 07 Apr 2025 03:03:21 GMT
- Title: scAgent: Universal Single-Cell Annotation via a LLM Agent
- Authors: Yuren Mao, Yu Mi, Peigen Liu, Mengfei Zhang, Hanqing Liu, Yunjun Gao
- Abstract summary: scAgent is a universal cell annotation framework based on Large Language Models (LLMs). scAgent can identify cell types and discover novel cell types in diverse tissues. Experimental studies in 160 cell types and 35 tissues demonstrate the superior performance of scAgent.
- Score: 21.559055427500642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cell type annotation is critical for understanding cellular heterogeneity. Based on single-cell RNA-seq data and deep learning models, good progress has been made in annotating a fixed number of cell types within a specific tissue. However, universal cell annotation, which can generalize across tissues, discover novel cell types, and extend to novel cell types, remains less explored. To fill this gap, this paper proposes scAgent, a universal cell annotation framework based on Large Language Models (LLMs). scAgent can identify cell types and discover novel cell types in diverse tissues; furthermore, it learns novel cell types in a data-efficient manner. Experimental studies on 160 cell types and 35 tissues demonstrate the superior performance of scAgent in general cell-type annotation, novel cell discovery, and extensibility to novel cell types.
Related papers
- Single-Cell Omics Arena: A Benchmark Study for Large Language Models on Cell Type Annotation Using Single-Cell Data [13.56585855722118]
Large language models (LLMs) have demonstrated their ability to efficiently process and synthesize vast corpora of text to automatically extract biological knowledge. Our study explores the potential of LLMs to accurately classify and annotate cell types in single-cell RNA sequencing (scRNA-seq) data. The results demonstrate that LLMs can provide robust interpretations of single-cell data without requiring additional fine-tuning.
arXiv Detail & Related papers (2024-12-03T23:58:35Z)
- Cell-ontology guided transcriptome foundation model [18.51941953027685]
We pre-trained scCello on 22 million cells from CellxGene database leveraging their cell-type labels mapped to the cell ontology graph from Open Biological and Biomedical Ontology Foundry. Our TFM demonstrates competitive generalization and transferability performance over the existing TFMs on biologically important tasks.
arXiv Detail & Related papers (2024-08-22T13:15:49Z)
- Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen [76.02070962797794]
This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data. CFGen generates whole-genome multi-modal single-cell data reliably, improving the recovery of crucial biological data characteristics.
arXiv Detail & Related papers (2024-07-16T14:05:03Z)
- UniCell: Universal Cell Nucleus Classification via Prompt Learning [76.11864242047074]
We propose a universal cell nucleus classification framework (UniCell).
It employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains.
In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads for adapting various datasets.
arXiv Detail & Related papers (2024-02-20T11:50:27Z)
- scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data [62.87454293046843]
Graph neural networks (GNNs) have been widely used for automatic cell type classification.
scBiGNN comprises two GNN modules to identify cell types.
scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.
arXiv Detail & Related papers (2023-12-16T03:54:26Z)
- Single-cell Multi-view Clustering via Community Detection with Unknown Number of Clusters [64.31109141089598]
We introduce scUNC, an innovative multi-view clustering approach tailored for single-cell data.
scUNC seamlessly integrates information from different views without the need for a predefined number of clusters.
We conducted a comprehensive evaluation of scUNC using three distinct single-cell datasets.
arXiv Detail & Related papers (2023-11-28T08:34:58Z)
- Mixed Models with Multiple Instance Learning [51.440557223100164]
We introduce MixMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL).
Our empirical results reveal that MixMIL outperforms existing MIL models in single-cell datasets.
arXiv Detail & Related papers (2023-11-04T16:42:42Z)
- Revolutionizing Single Cell Analysis: The Power of Large Language Models for Cell Type Annotation [0.0]
Large language models such as ChatGPT and New Bing provide accurate annotations of cell types.
By using ChatGPT to annotate single-cell data, we can relate rare cell types to their functions.
This can have important applications in understanding cancer progression, mammalian development, and stem cell differentiation.
arXiv Detail & Related papers (2023-04-05T18:45:54Z)
- Topology-Guided Multi-Class Cell Context Generation for Digital Pathology [28.43244574309888]
We introduce several mathematical tools from spatial statistics and topological data analysis.
We generate high quality multi-class cell layouts for the first time.
We show that the topology-rich cell layouts can be used for data augmentation and improve the performance of downstream tasks such as cell classification.
arXiv Detail & Related papers (2023-04-05T07:01:34Z)
- A biology-driven deep generative model for cell-type annotation in cytometry [0.0]
We introduce Scyan, a Single-cell Cytometry Network that automatically annotates cell types using only prior expert knowledge.
Scyan significantly outperforms the related state-of-the-art models on multiple public datasets while being faster and interpretable.
In addition, Scyan handles several complementary tasks such as batch-effect removal, debarcoding, and population discovery.
arXiv Detail & Related papers (2022-08-11T10:50:44Z)
- Stretchable Cells Help DARTS Search Better [70.52254306274092]
Differentiable neural architecture search (DARTS) has gained much success in discovering flexible and diverse cell types.
Current DARTS methods are prone to wide and shallow cells, and this topology collapse induces sub-optimal searched cells.
In this paper, we endow the cells with explicit stretchability, so the search can be directly implemented on our stretchable cells.
arXiv Detail & Related papers (2020-11-18T14:15:51Z)