An AI system to help scientists write expert-level empirical software
- URL: http://arxiv.org/abs/2509.06503v1
- Date: Mon, 08 Sep 2025 10:08:36 GMT
- Title: An AI system to help scientists write expert-level empirical software
- Authors: Eser Aygün, Anastasiya Belyaeva, Gheorghe Comanici, Marc Coram, Hao Cui, Jake Garrison, Renee Johnston, Anton Kast, Cory Y. McLean, Peter Norgaard, Zahra Shamsi, David Smalling, James Thompson, Subhashini Venugopalan, Brian P. Williams, Chujun He, Sarah Martinson, Martyna Plomecka, Lai Wei, Yuchen Zhou, Qian-Ze Zhu, Matthew Abraham, Erica Brand, Anna Bulanova, Jeffrey A. Cardille, Chris Co, Scott Ellsworth, Grace Joseph, Malcolm Kane, Ryan Krueger, Johan Kartiwa, Dan Liebling, Jan-Matthis Lueckmann, Paul Raccuglia, Xuefei Wang, Katherine Chou, James Manyika, Yossi Matias, John C. Platt, Lizzie Dorfman, Shibl Mourad, Michael P. Brenner
- Abstract summary: We present an AI system that creates expert-level scientific software to maximize a quality metric. The system achieves expert-level results when it explores and integrates complex research ideas from external sources. In bioinformatics, it discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard. In epidemiology, it generated 14 models that outperformed the CDC ensemble and all other individual models for forecasting COVID-19 hospitalizations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The cycle of scientific discovery is frequently bottlenecked by the slow, manual creation of software to support computational experiments. To address this, we present an AI system that creates expert-level scientific software whose goal is to maximize a quality metric. The system uses a Large Language Model (LLM) and Tree Search (TS) to systematically improve the quality metric and intelligently navigate the large space of possible solutions. The system achieves expert-level results when it explores and integrates complex research ideas from external sources. The effectiveness of tree search is demonstrated across a wide range of benchmarks. In bioinformatics, it discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard. In epidemiology, it generated 14 models that outperformed the CDC ensemble and all other individual models for forecasting COVID-19 hospitalizations. Our method also produced state-of-the-art software for geospatial analysis, neural activity prediction in zebrafish, time series forecasting and numerical solution of integrals. By devising and implementing novel solutions to diverse tasks, the system represents a significant step towards accelerating scientific progress.
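The abstract describes the method only as an LLM combined with tree search that systematically improves a quality metric. One plausible reading of that loop: candidates are generated from earlier candidates, each candidate is scored by the metric, and the search decides which candidate to build on next. The Python sketch below illustrates that generate-score-select loop under explicit assumptions that are not taken from the paper: the LLM call is replaced by a hypothetical `propose` function that perturbs a numeric vector, `quality_metric` is a toy objective, and node selection uses a generic UCB-style score with a hypothetical exploration constant `c`.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate solution in the search tree."""
    solution: list[float]
    score: float
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)
    visits: int = 0


def quality_metric(solution: list[float]) -> float:
    """Hypothetical metric to maximize: negative squared error to a fixed target."""
    target = [1.0, -2.0, 0.5]
    return -sum((s - t) ** 2 for s, t in zip(solution, target))


def propose(solution: list[float], rng: random.Random) -> list[float]:
    """Hypothetical stand-in for the LLM rewrite step: perturb one coordinate."""
    new = list(solution)
    i = rng.randrange(len(new))
    new[i] += rng.gauss(0.0, 0.5)
    return new


def select(nodes: list[Node], total: int, c: float) -> Node:
    """UCB-style choice: favor high scores, plus a bonus for rarely expanded nodes."""
    return max(
        nodes,
        key=lambda n: n.score + c * math.sqrt(math.log(total + 1) / (n.visits + 1)),
    )


def tree_search(initial: list[float], budget: int = 200, c: float = 1.0, seed: int = 0) -> Node:
    """Expand one node per step and return the best-scoring node found."""
    rng = random.Random(seed)
    root = Node(initial, quality_metric(initial))
    nodes = [root]
    best = root
    for step in range(budget):
        parent = select(nodes, step, c)
        parent.visits += 1
        child_solution = propose(parent.solution, rng)
        child = Node(child_solution, quality_metric(child_solution), parent=parent)
        parent.children.append(child)
        nodes.append(child)
        if child.score > best.score:
            best = child
    return best


if __name__ == "__main__":
    best = tree_search([0.0, 0.0, 0.0])
    print(round(best.score, 3), [round(x, 2) for x in best.solution])
```

The exploration bonus shrinks as a node is expanded more often, so the search mostly builds on strong candidates while still occasionally branching from weaker ones; with `c = 0` it reduces to always expanding the highest-scoring candidate found so far.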
Related papers
- InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery
We introduce InternAgent-1.5, a unified system designed for end-to-end scientific discovery. The system is built on a structured architecture composed of three coordinated subsystems for generation, verification, and evolution. We evaluate InternAgent-1.5 on scientific reasoning benchmarks such as GAIA, HLE, GPQA, and FrontierScience.
arXiv Detail & Related papers (2026-02-09T18:36:06Z)
- ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms
ATHENA is an agentic framework designed as an Autonomous Lab to manage the end-to-end computational research lifecycle. Its core is the HENA loop, a knowledge-driven diagnostic process framed as a Contextual problem. The framework achieves super-human performance, reaching validation errors of $10^{-14}$.
arXiv Detail & Related papers (2025-12-03T06:05:27Z)
- Barbarians at the Gate: How AI is Upending Systems Research
We argue that systems research, long focused on designing and evaluating new performance-oriented algorithms, is particularly well-suited for AI-driven solution discovery. We call this approach AI-Driven Research for Systems (ADRS); it iteratively generates, evaluates, and refines solutions. Our results highlight both the disruptive potential and the urgent need to adapt systems research practices in the age of AI.
arXiv Detail & Related papers (2025-10-07T17:49:24Z)
- TusoAI: Agentic Optimization for Scientific Methods
Large language models (LLMs) have demonstrated strong capabilities in synthesizing literature, reasoning with empirical data, and generating domain-specific code. Here, we introduce TusoAI, an agentic AI system that takes a scientific task description with an evaluation function. TusoAI integrates domain knowledge into a knowledge tree representation and performs iterative, domain-specific optimization and model diagnosis.
arXiv Detail & Related papers (2025-09-28T17:30:44Z)
- A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research. This survey reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate. We formulate a unified taxonomy of scientific data and a hierarchical model of scientific knowledge.
arXiv Detail & Related papers (2025-08-28T18:30:52Z)
- ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
Large Language Models (LLMs) have extended their impact beyond Natural Language Processing. Among LLM-based agents, computer-using agents can interact with operating systems as humans do. We introduce ScienceBoard, which encompasses a realistic, multi-domain environment featuring dynamic and visually rich scientific software.
arXiv Detail & Related papers (2025-05-26T12:27:27Z)
- AI-Driven Automation Can Become the Foundation of Next-Era Science of Science Research
The Science of Science (SoS) explores the mechanisms underlying scientific discovery. The advent of artificial intelligence (AI) presents a transformative opportunity for the next generation of SoS. We outline the advantages of AI over traditional methods, discuss potential limitations, and propose pathways to overcome them.
arXiv Detail & Related papers (2025-05-17T15:01:33Z)
- Advancing AI Research Assistants with Expert-Involved Learning
Large language models (LLMs) and large multimodal models (LMMs) promise to accelerate biomedical discovery, yet their reliability remains unclear. We introduce ARIEL (AI Research Assistant for Expert-in-the-Loop Learning), an open-source evaluation and optimization framework. We find that state-of-the-art models generate fluent but incomplete summaries, whereas LMMs struggle with detailed visual reasoning.
arXiv Detail & Related papers (2025-05-03T14:21:48Z)
- The AI Cosmologist I: An Agentic System for Automated Data Analysis
The AI Cosmologist implements a complete pipeline from idea generation to experimental evaluation and research dissemination. Unlike traditional automated machine learning (AutoML) systems, the AI Cosmologist generates diverse implementation strategies. Results indicate that agentic systems can automate portions of the research process, potentially accelerating scientific discovery.
arXiv Detail & Related papers (2025-04-04T13:12:08Z)
- CS-PaperSum: A Large-Scale Dataset of AI-Generated Summaries for Scientific Papers
CS-PaperSum is a large-scale dataset of 91,919 papers from 31 top-tier computer science conferences. Our dataset enables automated literature analysis, research trend forecasting, and AI-driven scientific discovery.
arXiv Detail & Related papers (2025-02-27T22:48:35Z)
- A system for objectively measuring behavior and the environment to support large-scale studies on childhood obesity
We present an integrated system that collects and extracts multiple behavioral and environmental indicators. Our goal is to present a detailed account of the design principles, the implementation processes, and the evaluation of integrated algorithms.
arXiv Detail & Related papers (2025-01-05T14:27:09Z)
- Toward a Team of AI-made Scientists for Scientific Discovery from Gene Expression Data
We introduce a novel framework, a Team of AI-made Scientists (TAIS), designed to streamline the scientific discovery pipeline. TAIS comprises simulated roles, including a project manager, data engineer, and domain expert, each represented by a Large Language Model (LLM). These roles collaborate to replicate the tasks typically performed by data scientists, with a specific focus on identifying disease-predictive genes.
arXiv Detail & Related papers (2024-02-15T06:30:12Z)
- An Extensible Benchmark Suite for Learning to Simulate Physical Systems
We introduce a set of benchmark problems to take a step towards unified benchmarks and evaluation protocols.
We propose four representative physical systems, as well as a collection of both widely used classical time-based and representative data-driven methods.
arXiv Detail & Related papers (2021-08-09T17:39:09Z)
- Opportunities and Challenges of Deep Learning Methods for Electrocardiogram Data: A Systematic Review
The electrocardiogram (ECG) is one of the most commonly used diagnostic tools in medicine and healthcare.
Deep learning methods have achieved promising results on predictive healthcare tasks using ECG signals.
This paper presents a systematic review of deep learning methods for ECG data from both modeling and application perspectives.
arXiv Detail & Related papers (2019-12-28T02:44:29Z)