Spec-Driven AI for Science: The ARIA Framework for Automated and Reproducible Data Analysis
- URL: http://arxiv.org/abs/2510.11143v1
- Date: Mon, 13 Oct 2025 08:32:43 GMT
- Title: Spec-Driven AI for Science: The ARIA Framework for Automated and Reproducible Data Analysis
- Authors: Chuke Chen, Biao Luo, Nan Li, Boxiang Wang, Hang Yang, Jing Guo, Ming Xu,
- Abstract summary: ARIA is a spec-driven, human-in-the-loop framework for automated and interpretable data analysis. ARIA integrates six layers, namely Command, Context, Code, Data, Orchestration, and AI Module. ARIA establishes a new paradigm for transparent, collaborative, and reproducible scientific discovery.
- Score: 23.28226188948918
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The rapid expansion of scientific data has widened the gap between analytical capability and research intent. Existing AI-based analysis tools, ranging from AutoML frameworks to agentic research assistants, either favor automation over transparency or depend on manual scripting that hinders scalability and reproducibility. We present ARIA (Automated Research Intelligence Assistant), a spec-driven, human-in-the-loop framework for automated and interpretable data analysis. ARIA integrates six interoperable layers, namely Command, Context, Code, Data, Orchestration, and AI Module, within a document-centric workflow that unifies human reasoning and machine execution. Through natural-language specifications, researchers define analytical goals while ARIA autonomously generates executable code, validates computations, and produces transparent documentation. Beyond achieving high predictive accuracy, ARIA can rapidly identify optimal feature sets and select suitable models, minimizing redundant tuning and repetitive experimentation. In the Boston Housing case, ARIA discovered 25 key features and identified XGBoost as the best-performing model (R² = 0.93) with minimal overfitting. Evaluations across heterogeneous domains demonstrate ARIA's strong performance, interpretability, and efficiency compared with state-of-the-art systems. By combining AI for research and AI for science principles within a spec-driven architecture, ARIA establishes a new paradigm for transparent, collaborative, and reproducible scientific discovery.
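The abstract summarizes the Boston Housing result as R² = 0.93 with "minimal overfitting". As a reminder of what those figures measure, here is a minimal Python sketch of the coefficient of determination, R² = 1 - SS_res / SS_tot, together with one common overfitting check: the gap between training and test R². The abstract does not say which overfitting criterion ARIA actually uses, so the train/test-gap check is an assumption, and the helper names `r_squared` and `overfitting_gap` are hypothetical.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_y = fmean(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true has zero variance; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def overfitting_gap(train_true, train_pred, test_true, test_pred):
    """Assumed overfitting check: train R^2 minus test R^2 (small gap = little overfitting)."""
    return r_squared(train_true, train_pred) - r_squared(test_true, test_pred)
```

For example, `r_squared([1, 2, 3], [2, 2, 2])` returns `0.0`, because predicting the mean explains none of the variance, while perfect predictions give `1.0`. An R² of 0.93 therefore means the model's squared error is 7% of that of the mean-only baseline on the evaluated data.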
Related papers
- QUASAR: A Universal Autonomous System for Atomistic Simulation and a Benchmark of Its Capabilities [0.7519872646378835]
QUASAR is a universal autonomous system for atomistic simulation designed to facilitate production-grade scientific discovery. We benchmark QUASAR against a three-tiered series of tasks, progressing from routine tasks to frontier research challenges such as photocatalyst screening and novel material assessment. Results suggest that QUASAR can function as a general atomistic reasoning system rather than a task-specific automation framework.
arXiv Detail & Related papers (2026-01-30T05:29:44Z)
- Towards Agentic Intelligence for Materials Science [73.4576385477731]
This survey advances a unique pipeline-centric view that spans corpus curation and pretraining through to goal-conditioned agents interfacing with simulation and experimental platforms. To bridge communities and establish a shared frame of reference, we first present an integrated lens that aligns terminology, evaluation, and workflow stages across AI and materials science.
arXiv Detail & Related papers (2026-01-29T23:48:43Z)
- An Agentic Framework for Autonomous Materials Computation [70.24472585135929]
Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery. Recent advances integrate LLMs into agentic frameworks, enabling retrieval, reasoning, and tool use for complex scientific experiments. Here, we present a domain-specialized agent designed for reliable automation of first-principles materials computations.
arXiv Detail & Related papers (2025-12-22T15:03:57Z)
- Seismology modeling agent: A smart assistant for geophysical researchers [14.28965530601497]
This paper proposes an intelligent, interactive workflow powered by Large Language Models (LLMs). We introduce the first Model Context Protocol (MCP) server suite for SPECFEM. The framework supports both fully automated execution and human-in-the-loop collaboration.
arXiv Detail & Related papers (2025-12-16T14:18:26Z)
- Barbarians at the Gate: How AI is Upending Systems Research [58.95406995634148]
We argue that systems research, long focused on designing and evaluating new performance-oriented algorithms, is particularly well-suited for AI-driven solution discovery. We term this approach AI-Driven Research for Systems (ADRS): it iteratively generates, evaluates, and refines solutions. Our results highlight both the disruptive potential and the urgent need to adapt systems research practices in the age of AI.
arXiv Detail & Related papers (2025-10-07T17:49:24Z)
- AutoLabs: Cognitive Multi-Agent Systems with Self-Correction for Autonomous Chemical Experimentation [0.10999592665107412]
AutoLabs is a self-correcting, multi-agent architecture designed to autonomously translate natural-language instructions into executable protocols. We present a comprehensive evaluation framework featuring five benchmark experiments of increasing complexity. Our results demonstrate that agent reasoning capacity is the most critical factor for success.
arXiv Detail & Related papers (2025-09-30T01:51:46Z)
- TusoAI: Agentic Optimization for Scientific Methods [16.268579802762247]
Large language models (LLMs) have demonstrated strong capabilities in synthesizing literature, reasoning with empirical data, and generating domain-specific code. Here, we introduce TusoAI, an agentic AI system that takes a scientific task description together with an evaluation function. TusoAI integrates domain knowledge into a knowledge tree representation and performs iterative, domain-specific optimization and model diagnosis.
arXiv Detail & Related papers (2025-09-28T17:30:44Z)
- AutoMind: Adaptive Knowledgeable Agent for Automated Data Science [70.33796196103499]
Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems. Existing frameworks depend on rigid, predefined coding strategies. We introduce AutoMind, an adaptive, knowledgeable LLM-agent framework.
arXiv Detail & Related papers (2025-06-12T17:59:32Z)
- AIRepr: An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science [5.064778712920176]
Large language models (LLMs) are increasingly used to automate data analysis through executable code generation. We present AIRepr, an Analyst-Inspector framework for automatically evaluating and improving the reproducibility of LLM-generated data analysis.
arXiv Detail & Related papers (2025-02-23T01:15:50Z)
- Autonomous Microscopy Experiments through Large Language Model Agents [4.241267255764773]
Large language models (LLMs) are revolutionizing self-driving laboratories (SDLs) for materials research. We introduce Artificially Intelligent Lab Assistant (AILA), a framework that automates atomic force microscopy through LLM-driven agents. We find that state-of-the-art models struggle with basic tasks and coordination scenarios.
arXiv Detail & Related papers (2024-12-18T09:35:28Z)
- Enhancing Feature Selection and Interpretability in AI Regression Tasks Through Feature Attribution [38.53065398127086]
This study investigates the potential of feature attribution methods to filter out uninformative features in input data for regression problems.
We introduce a feature selection pipeline that combines Integrated Gradients with k-means clustering to select an optimal set of variables from the initial data space.
To validate the effectiveness of this approach, we apply it to a real-world industrial problem: blade vibration analysis in the development process of turbomachinery.
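The entry above pairs Integrated Gradients (IG) with k-means clustering to select variables. A minimal Python sketch of that idea follows, under assumptions the summary does not state: the model's gradient is available as a function, attributions are taken against a fixed baseline and approximated with a midpoint Riemann sum, each feature is scored by its mean absolute IG over the samples, and a two-cluster 1-D k-means keeps the high-scoring cluster. The helper names (`integrated_gradients`, `kmeans_1d_two_clusters`, `select_features`) are hypothetical, and this is not the cited paper's exact pipeline.

```python
from statistics import fmean


def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients for one input with a midpoint Riemann sum.

    grad_fn(point) must return the model's gradient at `point` (one value per feature).
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            totals[i] += g
    # Scale the averaged gradient by each feature's distance from the baseline.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


def kmeans_1d_two_clusters(values, iters=100):
    """Label each score True (high cluster) or False (low cluster) via 1-D k-means, k = 2."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [True] * len(values)  # no spread: nothing to separate, keep everything
    for _ in range(iters):
        threshold = (lo + hi) / 2  # in 1-D the nearest-centroid boundary is the midpoint
        high = [v >= threshold for v in values]
        new_lo = fmean(v for v, h in zip(values, high) if not h)
        new_hi = fmean(v for v, h in zip(values, high) if h)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids (and hence assignments) are stable
        lo, hi = new_lo, new_hi
    return [v >= (lo + hi) / 2 for v in values]


def select_features(grad_fn, samples, baseline, steps=50):
    """Keep the features whose mean |IG| falls in the high-attribution cluster."""
    attributions = [integrated_gradients(grad_fn, x, baseline, steps) for x in samples]
    scores = [fmean(abs(a[i]) for a in attributions) for i in range(len(baseline))]
    keep = kmeans_1d_two_clusters(scores)
    return [i for i, k in enumerate(keep) if k], scores
```

For a linear model f(x) = w·x the gradient is the constant vector w, so IG against a zero baseline reduces to x_i * w_i. With hypothetical weights `w = [3.0, 0.01, 2.5, 0.0]`, calling `select_features(lambda p: w, [[1, 1, 1, 1], [2, -1, 2, 3]], [0, 0, 0, 0])` keeps features `[0, 2]` and drops the two near-zero-attribution features. Clustering the scores, rather than applying a fixed cutoff, lets the split adapt to the spread of attributions in each dataset.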
arXiv Detail & Related papers (2024-09-25T09:50:51Z)
- TagLab: A human-centric AI system for interactive semantic segmentation [63.84619323110687]
TagLab is an open-source AI-assisted software for annotating large orthoimages.
It speeds up image annotation from scratch through assisted tools, creates custom fully automatic semantic segmentation models, and allows quick editing of automatic predictions.
We report our results in two scenarios: marine ecology and architectural heritage.
arXiv Detail & Related papers (2021-12-23T16:50:06Z)
- Integrated Benchmarking and Design for Reproducible and Accessible Evaluation of Robotic Agents [61.36681529571202]
We describe a new concept for reproducible robotics research that integrates development and benchmarking.
One of the central components of this setup is the Duckietown Autolab, a standardized setup that is itself relatively low-cost and reproducible.
We validate the system by analyzing the repeatability of experiments conducted using the infrastructure and show that there is low variance across different robot hardware and across different remote labs.
arXiv Detail & Related papers (2020-09-09T15:31:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.