Related papers: scBench: Evaluating AI Agents on Single-Cell RNA-seq Analysis

scBench: Evaluating AI Agents on Single-Cell RNA-seq Analysis

URL: http://arxiv.org/abs/2602.09063v1
Date: Mon, 09 Feb 2026 03:20:31 GMT
Title: scBench: Evaluating AI Agents on Single-Cell RNA-seq Analysis
Authors: Kenny Workman, Zhen Yang, Harihara Muralidharan, Aidan Abdulali, Hannah Le,
Abstract summary: scBench is a benchmark of 394 verifiable problems derived from scRNA-seq datasets.<n> Benchmark data on eight frontier models shows that accuracy ranges from 29-53%, with strong model-task and model-platform interactions.
Score: 6.518767416778027
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As single-cell RNA sequencing datasets grow in adoption, scale, and complexity, data analysis remains a bottleneck for many research groups. Although frontier AI agents have improved dramatically at software engineering and general data analysis, it remains unclear whether they can extract biological insight from messy, real-world single-cell datasets. We introduce scBench, a benchmark of 394 verifiable problems derived from practical scRNA-seq workflows spanning six sequencing platforms and seven task categories. Each problem provides a snapshot of experimental data immediately prior to an analysis step and a deterministic grader that evaluates recovery of a key biological result. Benchmark data on eight frontier models shows that accuracy ranges from 29-53%, with strong model-task and model-platform interactions. Platform choice affects accuracy as much as model choice, with 40+ percentage point drops on less-documented technologies. scBench complements SpatialBench to cover the two dominant single-cell modalities, serving both as a measurement tool and a diagnostic lens for developing agents that can analyze real scRNA-seq datasets faithfully and reproducibly.

Related papers

SpatialBench: Can Agents Analyze Real-World Spatial Biology Data? [6.993633248897315]
We introduce SpatialBench, a benchmark of 146 verifiable problems derived from practical spatial analysis.<n>Each problem provides a snapshot of experimental data immediately prior to an analysis step.<n>Base model accuracy remains low, with strong model-task and model-platform interactions.
arXiv Detail & Related papers (2025-12-26T07:40:11Z)
AutoMalDesc: Large-Scale Script Analysis for Cyber Threat Research [81.04845910798387]
Generating natural language explanations for threat detections remains an open problem in cybersecurity research.<n>We present AutoMalDesc, an automated static analysis summarization framework that operates independently at scale.<n>We publish our complete dataset of more than 100K script samples, including annotated seed (0.9K) datasets, along with our methodology and evaluation framework.
arXiv Detail & Related papers (2025-11-17T13:05:25Z)
scUnified: An AI-Ready Standardized Resource for Single-Cell RNA Sequencing Analysis [23.973638982075016]
We present scUnified, an AI-ready standardized resource for single-cell RNA sequencing data.<n> scUnified consolidates 13 high-quality datasets spanning two species and nine tissue types.
arXiv Detail & Related papers (2025-09-30T07:23:01Z)
CellPainTR: Generalizable Representation Learning for Cross-Dataset Cell Painting Analysis [51.56484100374058]
We introduce CellPainTR, a Transformer-based architecture designed to learn foundational representations of cellular morphology.<n>Our work represents a significant step towards creating truly foundational models for image-based profiling, enabling more reliable and scalable cross-study biological analysis.
arXiv Detail & Related papers (2025-09-02T03:30:07Z)
Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios [65.97836905826145]
scarcity of data in various scenarios, such as medical, industry and autonomous driving, leads to model overfitting and dataset imbalance.<n>We propose Crucial-Diff, a domain-agnostic framework designed to synthesize crucial samples.<n>Our framework generates diverse, high-quality training data, achieving a pixel-level AP of 83.63% and an F1-MAX of 78.12% on MVTec.
arXiv Detail & Related papers (2025-07-14T04:41:38Z)
DeepSeq: High-Throughput Single-Cell RNA Sequencing Data Labeling via Web Search-Augmented Agentic Generative AI Foundation Models [0.0]
Generative AI foundation models offer transformative potential for processing structured biological data.<n>We propose the use of agentic foundation models with real-time web search to automate the labeling of experimental data, achieving up to 82.5% accuracy.
arXiv Detail & Related papers (2025-06-14T23:30:22Z)
DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery [54.79763887844838]
Large language models (LLMs) integrated with autonomous agents hold significant potential for advancing scientific discovery through automated reasoning and task execution.<n>We introduce DrugPilot, a LLM-based agent system with a parameterized reasoning architecture designed for end-to-end scientific in drug discovery.<n>DrugPilot significantly outperforms state-of-the-art agents such as ReAct and LoT, achieving task completion rates of 98.0%, 93.5%, and 64.0% for simple, multi-tool, and multi-turn scenarios, respectively.
arXiv Detail & Related papers (2025-05-20T05:18:15Z)
Advancing Tabular Stroke Modelling Through a Novel Hybrid Architecture and Feature-Selection Synergy [0.9999629695552196]
The present work develops and validates a data-driven and interpretable machine-learning framework designed to predict strokes.<n>Ten routinely gathered demographic, lifestyle, and clinical variables were sourced from a public cohort of 4,981 records.<n>The proposed model achieved an accuracy rate of 97.2% and an F1-score of 97.15%, indicating a significant enhancement compared to the leading individual model.
arXiv Detail & Related papers (2025-05-18T21:46:45Z)
CellAgent: An LLM-driven Multi-Agent Framework for Automated Single-cell Data Analysis [35.61361183175167]
Single-cell RNA sequencing (scRNA-seq) data analysis is crucial for biological research. However, manual manipulation of various tools to achieve desired outcomes can be labor-intensive for researchers. We introduce CellAgent, an LLM-driven multi-agent framework for the automatic processing and execution of scRNA-seq data analysis tasks.
arXiv Detail & Related papers (2024-07-13T09:14:50Z)
The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation. We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of challenging problem in healthcare. Within this framework, we train predictive 15 models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data. Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.