Related papers: SpatialBench: Can Agents Analyze Real-World Spatial Biology Data?

URL: http://arxiv.org/abs/2512.21907v1
Date: Fri, 26 Dec 2025 07:40:11 GMT
Title: SpatialBench: Can Agents Analyze Real-World Spatial Biology Data?
Authors: Kenny Workman, Zhen Yang, Harihara Muralidharan, Hannah Le,
Abstract summary: We introduce SpatialBench, a benchmark of 146 verifiable problems derived from practical spatial analysis. Each problem provides a snapshot of experimental data immediately prior to an analysis step. Base model accuracy remains low, with strong model-task and model-platform interactions.
Score: 6.993633248897315
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Spatial transcriptomics assays are rapidly increasing in scale and complexity, making computational analysis a major bottleneck in biological discovery. Although frontier AI agents have improved dramatically at software engineering and general data analysis, it remains unclear whether they can extract biological insight from messy, real-world spatial datasets. We introduce SpatialBench, a benchmark of 146 verifiable problems derived from practical spatial analysis workflows spanning five spatial technologies and seven task categories. Each problem provides a snapshot of experimental data immediately prior to an analysis step and a deterministic grader that evaluates recovery of a key biological result. Benchmark data on frontier models shows that base model accuracy remains low (20-38% across model families), with strong model-task and model-platform interactions. Harness design has a large empirical effect on performance, indicating that tools, prompts, control flow, and execution environment should be evaluated and improved as first-class objects. SpatialBench serves both as a measurement tool and a diagnostic lens for developing agents that can interact with real spatial datasets faithfully, transparently, and reproducibly.

Related papers

scBench: Evaluating AI Agents on Single-Cell RNA-seq Analysis [6.518767416778027]
scBench is a benchmark of 394 verifiable problems derived from scRNA-seq datasets. Benchmark data on eight frontier models shows that accuracy ranges from 29-53%, with strong model-task and model-platform interactions.
arXiv Detail & Related papers (2026-02-09T03:20:31Z)
BiomechAgent: AI-Assisted Biomechanical Analysis Through Code-Generating Agents [1.1458853556386797]
We present BiomechAgent, a code-generating AI agent that enables biomechanical analysis through natural language. We developed a benchmark spanning data retrieval, visualization, activity classification, temporal segmentation, and clinical reasoning. Biomechanically-informed, domain-specific instructions significantly improved performance over generic prompts.
arXiv Detail & Related papers (2026-01-16T04:30:04Z)
Real-Time Health Analytics Using Ontology-Driven Complex Event Processing and LLM Reasoning: A Tuberculosis Case Study [4.0954316720608634]
This study presents an ontology-enabled real-time analytics framework that integrates Complex Event Processing (CEP) and Large Language Models (LLMs). Patient data is ingested and processed using Apache Kafka and Spark Streaming, where CEP engines detect clinically significant event patterns. The framework is evaluated using a dataset of 1,000 Tuberculosis (TB) patients as a use case, demonstrating low-latency event detection, scalable reasoning, and high model performance.
arXiv Detail & Related papers (2025-10-05T14:21:46Z)
CellPainTR: Generalizable Representation Learning for Cross-Dataset Cell Painting Analysis [51.56484100374058]
We introduce CellPainTR, a Transformer-based architecture designed to learn foundational representations of cellular morphology. Our work represents a significant step towards creating truly foundational models for image-based profiling, enabling more reliable and scalable cross-study biological analysis.
arXiv Detail & Related papers (2025-09-02T03:30:07Z)
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers [251.23085679210206]
Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research. This survey reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate. We formulate a unified taxonomy of scientific data and a hierarchical model of scientific knowledge.
arXiv Detail & Related papers (2025-08-28T18:30:52Z)
Hyperspectral Imaging [49.45523645429475]
Hyperspectral imaging (HSI) is an advanced sensing modality that simultaneously captures spatial and spectral information. This Primer presents a comprehensive overview of HSI, from the underlying physical principles and sensor architectures to key steps in data acquisition, calibration, and correction.
arXiv Detail & Related papers (2025-08-11T15:47:24Z)
Valid Inference with Imperfect Synthetic Data [39.10587411316875]
We introduce a new estimator based on generalized method of moments. We find that interactions between the moment residuals of synthetic data and those of real data can greatly improve estimates of the target parameter.
arXiv Detail & Related papers (2025-08-08T18:32:52Z)
Towards Interpretable and Efficient Feature Selection in Trajectory Datasets: A Taxonomic Approach [0.0]
Trajectory analysis is of paramount importance in understanding the pattern in which an object moves through space and time, as well as in predicting its next move. Due to the significant interest in the area, data collection has improved substantially, resulting in a large number of features becoming available for training and predicting models. This introduces a high-dimensionality-induced feature explosion problem, which reduces the efficiency and interpretability of the data, thereby reducing the accuracy of machine learning models.
arXiv Detail & Related papers (2025-06-25T12:21:20Z)
BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology [4.099098082010236]
Large Language Models (LLMs) and LLM-based agents show great promise in accelerating scientific research. We present the Bioinformatics Benchmark (BixBench), a dataset comprising over 50 real-world scenarios of practical biological data analysis. We evaluate the performance of two frontier LLMs using a custom agent framework we open source.
arXiv Detail & Related papers (2025-02-28T18:47:57Z)
Deep Learning in Single-Cell and Spatial Transcriptomics Data Analysis: Advances and Challenges from a Data Science Perspective [19.655130697247518]
The development of single-cell and spatial transcriptomics has revolutionized our capacity to investigate cellular properties, functions, and interactions. However, the analysis of single-cell and spatial omics data remains challenging. Deep learning has emerged as a powerful tool capable of handling high-dimensional complex data and automatically identifying meaningful patterns.
arXiv Detail & Related papers (2024-12-04T14:07:11Z)
Discovering physical laws with parallel symbolic enumeration [67.36739393470869]
We introduce parallel symbolic enumeration (PSE) to efficiently distill generic mathematical expressions from limited data. Experiments show that PSE achieves higher accuracy and faster computation compared to the state-of-the-art baseline algorithms. PSE represents an advance in accurate and efficient data-driven discovery of symbolic, interpretable models.
arXiv Detail & Related papers (2024-07-05T10:41:15Z)
BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions. BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model. It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z)
Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation. GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information theoretic framework for learning-motivated methods aimed at odometry estimation. The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language.
arXiv Detail & Related papers (2022-03-11T02:37:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.