TusoAI: Agentic Optimization for Scientific Methods
- URL: http://arxiv.org/abs/2509.23986v1
- Date: Sun, 28 Sep 2025 17:30:44 GMT
- Title: TusoAI: Agentic Optimization for Scientific Methods
- Authors: Alistair Turcan, Kexin Huang, Lei Li, Martin Jinye Zhang
- Abstract summary: Large language models (LLMs) have demonstrated strong capabilities in synthesizing literature, reasoning with empirical data, and generating domain-specific code. Here, we introduce TusoAI, an agentic AI system that takes a scientific task description with an evaluation function. TusoAI integrates domain knowledge into a knowledge tree representation and performs iterative, domain-specific optimization and model diagnosis.
- Score: 16.268579802762247
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scientific discovery is often slowed by the manual development of computational tools needed to analyze complex experimental data. Building such tools is costly and time-consuming because scientists must iteratively review literature, test modeling and scientific assumptions against empirical data, and implement these insights into efficient software. Large language models (LLMs) have demonstrated strong capabilities in synthesizing literature, reasoning with empirical data, and generating domain-specific code, offering new opportunities to accelerate computational method development. Existing LLM-based systems either focus on performing scientific analyses using existing computational methods or on developing computational methods or models for general machine learning without effectively integrating the often unstructured knowledge specific to scientific domains. Here, we introduce TusoAI, an agentic AI system that takes a scientific task description with an evaluation function and autonomously develops and optimizes computational methods for the application. TusoAI integrates domain knowledge into a knowledge tree representation and performs iterative, domain-specific optimization and model diagnosis, improving performance over a pool of candidate solutions. We conducted comprehensive benchmark evaluations demonstrating that TusoAI outperforms state-of-the-art expert methods, MLE agents, and scientific AI agents across diverse tasks, such as single-cell RNA-seq data denoising and satellite-based earth monitoring. Applying TusoAI to two key open problems in genetics improved existing computational methods and uncovered novel biology, including 9 new associations between autoimmune diseases and T cell subtypes and 7 previously unreported links between disease variants and their target genes. Our code is publicly available at https://github.com/Alistair-Turcan/TusoAI.
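The abstract describes iterative optimization over a pool of candidate solutions, scored by a user-supplied evaluation function. The following is a minimal sketch of that general loop, not TusoAI's actual implementation: `optimize_pool`, `toy_evaluate`, and `toy_propose` are hypothetical names, the candidates are plain parameter vectors rather than generated methods, and the random perturbation merely stands in for the LLM-driven, knowledge-tree-guided proposal step.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical stand-in: a "candidate method" is just a parameter vector here.
Candidate = List[float]


def optimize_pool(
    evaluate: Callable[[Candidate], float],
    propose: Callable[[Candidate, random.Random], Candidate],
    initial: List[Candidate],
    rounds: int = 50,
    pool_size: int = 4,
    seed: int = 0,
) -> Tuple[Candidate, float]:
    """Iteratively refine a pool of candidates, keeping the best scorers."""
    rng = random.Random(seed)
    pool = [(evaluate(c), c) for c in initial]
    for _ in range(rounds):
        # Pick a parent from the pool and propose a modified variant.
        _, parent = rng.choice(pool)
        child = propose(parent, rng)
        pool.append((evaluate(child), child))
        # Keep only the highest-scoring candidates (higher score is better).
        pool.sort(key=lambda pair: pair[0], reverse=True)
        pool = pool[:pool_size]
    best_score, best = pool[0]
    return best, best_score


def toy_evaluate(c: Candidate) -> float:
    # Toy evaluation function: maximal (zero) at the point (1, -2).
    return -((c[0] - 1.0) ** 2 + (c[1] + 2.0) ** 2)


def toy_propose(c: Candidate, rng: random.Random) -> Candidate:
    # Toy proposal step: a small Gaussian perturbation of the parent.
    return [x + rng.gauss(0.0, 0.3) for x in c]


if __name__ == "__main__":
    best, score = optimize_pool(toy_evaluate, toy_propose, [[0.0, 0.0]], rounds=200)
    print(best, score)
```

Because the best-scoring candidate is always retained in the pool, the best score never decreases from one round to the next; in a real system the proposal step would carry the domain knowledge and diagnosis the abstract mentions.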
Related papers
- Accelerating Scientific Research with Gemini: Case Studies and Common Techniques [105.15622072347811]
Large language models (LLMs) have opened new avenues for accelerating scientific research. We present a collection of case studies demonstrating how researchers have successfully collaborated with advanced AI models.
arXiv Detail & Related papers (2026-02-03T18:56:17Z)
- An Agentic Framework for Autonomous Materials Computation [70.24472585135929]
Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery. Recent advances integrate LLMs into agentic frameworks, enabling retrieval, reasoning, and tool use for complex scientific experiments. Here, we present a domain-specialized agent designed for reliable automation of first-principles materials computations.
arXiv Detail & Related papers (2025-12-22T15:03:57Z)
- SelfAI: Building a Self-Training AI System with LLM Agents [79.10991818561907]
SelfAI is a general multi-agent platform that combines a User Agent for translating high-level research objectives into standardized experimental configurations. An Experiment Manager orchestrates parallel, fault-tolerant training across heterogeneous hardware while maintaining a structured knowledge base for continuous feedback. Across regression, computer vision, scientific computing, medical imaging, and drug discovery benchmarks, SelfAI consistently achieves strong performance and reduces redundant trials.
arXiv Detail & Related papers (2025-11-29T09:18:39Z)
- AgenticSciML: Collaborative Multi-Agent Systems for Emergent Discovery in Scientific Machine Learning [0.0]
AgenticSciML is a collaborative multi-agent system in which over 10 specialized AI agents collaborate to propose, critique, and refine SciML solutions. The framework integrates structured debate, retrieval-augmented method memory, and ensemble-guided evolutionary search. Results show that collaborative reasoning among AI agents can yield emergent methodological innovation.
arXiv Detail & Related papers (2025-11-10T16:06:33Z)
- SR-Scientist: Scientific Equation Discovery With Agentic AI [27.014966811260212]
We present SR-Scientist, a framework that implements the Large Language Models (LLMs) from a simple equation proposer to an autonomous AI scientist. Specifically, we wrap the code interpreter into a set of tools for data analysis and equation evaluation. Empirical results show that SR-Scientist outperforms baseline methods by an absolute margin of 6% to 35% on datasets.
arXiv Detail & Related papers (2025-10-13T17:35:23Z)
- Spec-Driven AI for Science: The ARIA Framework for Automated and Reproducible Data Analysis [23.28226188948918]
ARIA is a spec-driven, human-in-the-loop framework for automated and interpretable data analysis. ARIA integrates six layers, namely Command, Context, Code, Data, Orchestration, and AI Module. ARIA establishes a new paradigm for transparent, collaborative, and reproducible scientific discovery.
arXiv Detail & Related papers (2025-10-13T08:32:43Z)
- An AI system to help scientists write expert-level empirical software [25.01900335784437]
We present an AI system that creates expert-level scientific software to maximize a quality metric. The system achieves expert-level results when it explores and integrates complex research ideas from external sources. In bioinformatics, it discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard. In epidemiology, it generated 14 models that outperformed the CDC ensemble and all other individual models for forecasting COVID-19 hospitalizations.
arXiv Detail & Related papers (2025-09-08T10:08:36Z)
- A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers [221.34650992288505]
Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research. This survey reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate. We formulate a unified taxonomy of scientific data and a hierarchical model of scientific knowledge.
arXiv Detail & Related papers (2025-08-28T18:30:52Z)
- Operationalizing Serendipity: Multi-Agent AI Workflows for Enhanced Materials Characterization with Theory-in-the-Loop [0.0]
SciLink is an open-source, multi-agent artificial intelligence framework designed to operationalize serendipity in materials research. It creates a direct, automated link between experimental observation, novelty assessment, and theoretical simulations. We show its application to atomic-resolution and hyperspectral data, its capacity to integrate real-time human expert guidance, and its ability to close the research loop.
arXiv Detail & Related papers (2025-08-07T04:59:17Z)
- PyTDC: A multimodal machine learning training, evaluation, and inference platform for biomedical foundation models [59.17570021208177]
PyTDC is a machine-learning platform providing streamlined training, evaluation, and inference software for multimodal biological AI models. This paper discusses the components of PyTDC's architecture and, to our knowledge, the first-of-its-kind case study on the introduced single-cell drug-target nomination ML task.
arXiv Detail & Related papers (2025-05-08T18:15:38Z)
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents [51.9387884953294]
We introduce Meta MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing large language models on AI research tasks. This is the first Gym environment for machine learning (ML) tasks, enabling research on reinforcement learning (RL) algorithms for training such agents. We evaluate a number of frontier large language models (LLMs) on our benchmarks such as Claude-3.5-Sonnet, Llama-3.1 405B, GPT-4o, o1-preview, and Gemini-1.5 Pro.
arXiv Detail & Related papers (2025-02-20T12:28:23Z)
- STRICTA: Structured Reasoning in Critical Text Assessment for Peer Review and Beyond [68.47402386668846]
We introduce Structured Reasoning In Critical Text Assessment (STRICTA) to model text assessment as an explicit, step-wise reasoning process. STRICTA breaks down the assessment into a graph of interconnected reasoning steps drawing on causality theory. We apply STRICTA to a dataset of over 4000 reasoning steps from roughly 40 biomedical experts on more than 20 papers.
arXiv Detail & Related papers (2024-09-09T06:55:37Z)
- EndToEndML: An Open-Source End-to-End Pipeline for Machine Learning Applications [0.2826977330147589]
We propose a web-based end-to-end pipeline that is capable of preprocessing, training, evaluating, and visualizing machine learning models.
Our library assists in recognizing, classifying, clustering, and predicting a wide range of multi-modal, multi-sensor datasets.
arXiv Detail & Related papers (2024-03-27T02:24:38Z)
- PETScML: Second-order solvers for training regression problems in Scientific Machine Learning [0.22499166814992438]
In recent years, we have witnessed the emergence of scientific machine learning as a data-driven tool for the analysis.
We introduce a software built on top of the Portable and Extensible Toolkit for Scientific computation to bridge the gap between deep-learning software and conventional machine-learning techniques.
arXiv Detail & Related papers (2024-03-18T18:59:42Z)
- ChemMiner: A Large Language Model Agent System for Chemical Literature Data Mining [56.15126714863963]
ChemMiner is an end-to-end framework for extracting chemical data from literature. ChemMiner incorporates three specialized agents: a text analysis agent for coreference mapping, a multimodal agent for non-textual information extraction, and a synthesis analysis agent for data generation. Experimental results demonstrate reaction identification rates comparable to human chemists while significantly reducing processing time, with high accuracy, recall, and F1 scores.
arXiv Detail & Related papers (2024-02-20T13:21:46Z)
- Simulation Intelligence: Towards a New Generation of Scientific Methods [81.75565391122751]
"Nine Motifs of Simulation Intelligence" is a roadmap for the development and integration of the essential algorithms necessary for a merger of scientific computing, scientific simulation, and artificial intelligence.
We argue the motifs of simulation intelligence are interconnected and interdependent, much like the components within the layers of an operating system.
We believe coordinated efforts between motifs offer immense opportunity to accelerate scientific discovery.
arXiv Detail & Related papers (2021-12-06T18:45:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.