CellAgent: An LLM-driven Multi-Agent Framework for Automated Single-cell Data Analysis
- URL: http://arxiv.org/abs/2407.09811v1
- Date: Sat, 13 Jul 2024 09:14:50 GMT
- Title: CellAgent: An LLM-driven Multi-Agent Framework for Automated Single-cell Data Analysis
- Authors: Yihang Xiao, Jinyi Liu, Yan Zheng, Xiaohan Xie, Jianye Hao, Mingzhi Li, Ruitao Wang, Fei Ni, Yuxiao Li, Jintian Luo, Shaoqing Jiao, Jiajie Peng
- Abstract summary: Single-cell RNA sequencing (scRNA-seq) data analysis is crucial for biological research.
However, manual manipulation of various tools to achieve desired outcomes can be labor-intensive for researchers.
We introduce CellAgent, an LLM-driven multi-agent framework for the automatic processing and execution of scRNA-seq data analysis tasks.
- Score: 35.61361183175167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-cell RNA sequencing (scRNA-seq) data analysis is crucial for biological research, as it enables the precise characterization of cellular heterogeneity. However, manual manipulation of various tools to achieve desired outcomes can be labor-intensive for researchers. To address this, we introduce CellAgent (http://cell.agent4science.cn/), an LLM-driven multi-agent framework, specifically designed for the automatic processing and execution of scRNA-seq data analysis tasks, providing high-quality results with no human intervention. Firstly, to adapt general LLMs to the biological field, CellAgent constructs LLM-driven biological expert roles - planner, executor, and evaluator - each with specific responsibilities. Then, CellAgent introduces a hierarchical decision-making mechanism to coordinate these biological experts, effectively driving the planning and step-by-step execution of complex data analysis tasks. Furthermore, we propose a self-iterative optimization mechanism, enabling CellAgent to autonomously evaluate and optimize solutions, thereby guaranteeing output quality. We evaluate CellAgent on a comprehensive benchmark dataset encompassing dozens of tissues and hundreds of distinct cell types. Evaluation results consistently show that CellAgent effectively identifies the most suitable tools and hyperparameters for single-cell analysis tasks, achieving optimal performance. This automated framework dramatically reduces the workload for science data analyses, bringing us into the "Agent for Science" era.
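The planner, executor, and evaluator roles and the self-iterative optimization mentioned in the abstract can be pictured with a small sketch. The abstract does not describe any interfaces, so everything below is an assumption made for illustration: the function names (`run_task`, `toy_execute`, `toy_evaluate`), the tool names, the score threshold, and the toy scores are hypothetical, and plain Python stubs stand in for the LLM-driven roles. This is not CellAgent's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StepResult:
    step: str
    tool: str
    score: float


def run_task(
    steps: List[str],                      # ordered plan, as a planner role might produce
    candidates: Dict[str, List[str]],      # hypothetical candidate tools per step
    execute: Callable[[str, str], str],    # executor role: runs one tool for one step
    evaluate: Callable[[str, str], float], # evaluator role: scores an executor output
    threshold: float = 0.8,                # assumed "good enough" score
) -> List[StepResult]:
    """Run each planned step, retrying with other tools until the score is acceptable."""
    results: List[StepResult] = []
    for step in steps:
        best = None
        for tool in candidates[step]:
            output = execute(step, tool)
            score = evaluate(step, output)
            if best is None or score > best.score:
                best = StepResult(step, tool, score)
            if score >= threshold:
                break  # evaluator is satisfied, stop iterating on this step
        if best is None:
            raise ValueError(f"no candidate tools for step {step!r}")
        results.append(best)
    return results


# Toy stand-ins for the executor and evaluator (made-up scores, no real analysis).
TOY_SCORES = {"tool_a": 0.6, "tool_b": 0.9, "tool_c": 0.7}


def toy_execute(step: str, tool: str) -> str:
    return tool  # a real executor would generate and run analysis code


def toy_evaluate(step: str, output: str) -> float:
    return TOY_SCORES[output]  # a real evaluator would judge the analysis result


steps = ["preprocessing", "annotation"]
candidates = {
    "preprocessing": ["tool_a", "tool_b", "tool_c"],
    "annotation": ["tool_c", "tool_a"],
}
results = run_task(steps, candidates, toy_execute, toy_evaluate)
for r in results:
    print(r.step, r.tool, r.score)
```

In this toy run the loop stops at the first tool that reaches the threshold and otherwise keeps the best-scoring attempt, which is one simple way to read "autonomously evaluate and optimize solutions".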
Related papers
- CellForge: Agentic Design of Virtual Cell Models [24.938939602572702]
We introduce CellForge, an agentic system that transforms presented biological datasets into optimized computational models for virtual cells.
The framework integrates three core modules: Task Analysis, Method Design, and Experiment Execution.
We demonstrate CellForge's capabilities in single-cell perturbation prediction, using six diverse datasets.
arXiv Detail & Related papers (2025-08-04T10:43:31Z)
- BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments [8.317138109309967]
Large language models (LLMs) and vision-language models (VLMs) have the potential to transform biological research by enabling autonomous experimentation.
Here we introduce BioMARS, an intelligent platform that integrates LLMs, VLMs, and modular robotics to autonomously design, plan, and execute biological experiments.
A web interface enables real-time human-AI collaboration, while a modular backend allows scalable integration with laboratory hardware.
arXiv Detail & Related papers (2025-07-02T08:47:02Z)
- Agentomics-ML: Autonomous Machine Learning Experimentation Agent for Genomic and Transcriptomic Data [33.7054351451505]
We introduce Agentomics-ML, a fully autonomous agent-based system designed to produce a classification model.
We show that Agentomics-ML outperforms existing state-of-the-art agent-based methods in both generalization and success rates.
arXiv Detail & Related papers (2025-06-05T19:44:38Z)
- DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery [54.79763887844838]
Large language models (LLMs) integrated with autonomous agents hold significant potential for advancing scientific discovery through automated reasoning and task execution.
We introduce DrugPilot, an LLM-based agent system with a parameterized reasoning architecture designed for end-to-end scientific workflows in drug discovery.
DrugPilot significantly outperforms state-of-the-art agents such as ReAct and LoT, achieving task completion rates of 98.0%, 93.5%, and 64.0% for simple, multi-tool, and multi-turn scenarios, respectively.
arXiv Detail & Related papers (2025-05-20T05:18:15Z)
- CellVerse: Do Large Language Models Really Understand Cell Biology? [74.34984441715517]
We introduce CellVerse, a unified language-centric question-answering benchmark that integrates four types of single-cell multi-omics data.
We systematically evaluate the performance across 14 open-source and closed-source LLMs ranging from 160M to 671B on CellVerse.
arXiv Detail & Related papers (2025-05-09T06:47:23Z)
- ResearchCodeAgent: An LLM Multi-Agent System for Automated Codification of Research Methodologies [16.90884865239373]
We introduce ResearchCodeAgent, a novel multi-agent system to automate the codification of research methodologies.
The system bridges the gap between high-level research concepts and their practical implementation.
ResearchCodeAgent represents a significant step towards automating the research implementation process, potentially accelerating the pace of machine learning research.
arXiv Detail & Related papers (2025-04-28T07:18:45Z)
- DatawiseAgent: A Notebook-Centric LLM Agent Framework for Automated Data Science [4.1431677219677185]
DatawiseAgent is a notebook-centric agent framework that unifies interactions among user, agent and the computational environment.
It orchestrates four stages: DFS-like planning, incremental execution, self-debugging, and post-filtering.
It consistently outperforms or matches state-of-the-art methods across multiple model settings.
arXiv Detail & Related papers (2025-03-10T08:32:33Z)
- Single-Cell Omics Arena: A Benchmark Study for Large Language Models on Cell Type Annotation Using Single-Cell Data [13.56585855722118]
Large language models (LLMs) have demonstrated their ability to efficiently process and synthesize vast corpora of text to automatically extract biological knowledge.
Our study explores the potential of LLMs to accurately classify and annotate cell types in single-cell RNA sequencing (scRNA-seq) data.
The results demonstrate that LLMs can provide robust interpretations of single-cell data without requiring additional fine-tuning.
arXiv Detail & Related papers (2024-12-03T23:58:35Z)
- Automating Exploratory Proteomics Research via Language Models [22.302672656499315]
PROTEUS is a fully automated system for scientific discovery from raw data.
It produces a comprehensive set of research objectives, analysis results and novel biological hypotheses without human intervention.
arXiv Detail & Related papers (2024-11-06T08:16:56Z)
- AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline.
Recent work has started exploiting large language models (LLMs) to lessen this burden.
This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z)
- GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians [13.837406082703756]
We introduce GenoTEX, a benchmark dataset for the automatic exploration of gene expression data.
GenoTEX provides annotated code and results for solving a wide range of gene identification problems.
We present GenoAgents, a team of LLM-based agents designed with context-aware planning, iterative correction, and domain expert consultation.
arXiv Detail & Related papers (2024-06-21T17:55:24Z)
- BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions.
BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model.
It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z)
- MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [86.61052121715689]
MatPlotAgent is a model-agnostic framework designed to automate scientific data visualization tasks.
MatPlotBench is a high-quality benchmark consisting of 100 human-verified test cases.
arXiv Detail & Related papers (2024-02-18T04:28:28Z)
- Large Language Model Agent for Hyper-Parameter Optimization [30.560250427498243]
We introduce a novel paradigm leveraging Large Language Models (LLMs) to automate hyperparameter optimization across diverse machine learning tasks.
AgentHPO processes the task information autonomously, conducts experiments with specific hyperparameters, and iteratively optimizes them.
This human-like optimization process largely reduces the number of required trials, simplifies the setup process, and enhances interpretability and user trust.
arXiv Detail & Related papers (2024-02-02T20:12:05Z)
- Automated Bioinformatics Analysis via AutoBA [33.09743154722675]
Auto Bioinformatics Analysis (AutoBA) is an autonomous AI agent based on a large language model designed explicitly for conventional omics data analysis.
AutoBA's robustness and adaptability are affirmed across a diverse range of omics analysis cases, including whole genome sequencing (WGS), RNA sequencing (RNA-seq), single-cell RNA-seq, ChIP-seq, and spatial transcriptomics.
arXiv Detail & Related papers (2023-09-06T07:54:45Z)
- Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types [75.65676405302105]
We propose a simple yet effective approach for pre-training genome data in a multi-modal and self-supervised manner, which we call GeneBERT.
We pre-train our model on the ATAC-seq dataset with 17 million genome sequences.
arXiv Detail & Related papers (2021-10-11T12:48:44Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel machine learning architecture that allows us to infuse a deep neural network with human-powered abstraction at the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.