DatawiseAgent: A Notebook-Centric LLM Agent Framework for Adaptive and Robust Data Science Automation
- URL: http://arxiv.org/abs/2503.07044v2
- Date: Fri, 03 Oct 2025 13:29:21 GMT
- Title: DatawiseAgent: A Notebook-Centric LLM Agent Framework for Adaptive and Robust Data Science Automation
- Authors: Ziming You, Yumiao Zhang, Dexuan Xu, Yiwei Lou, Yandong Yan, Wei Wang, Huaming Zhang, Yu Huang,
- Abstract summary: We introduce DatawiseAgent, a notebook-centric large language model (LLM) agent framework for adaptive and robust data science automation.<n>Inspired by how human data scientists work in computational notebooks, DatawiseAgent introduces a unified interaction representation and a multi-stage architecture.
- Score: 10.390461679868197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing large language model (LLM) agents for automating data science show promise, but they remain constrained by narrow task scopes, limited generalization across tasks and models, and over-reliance on state-of-the-art (SOTA) LLMs. We introduce DatawiseAgent, a notebook-centric LLM agent framework for adaptive and robust data science automation. Inspired by how human data scientists work in computational notebooks, DatawiseAgent introduces a unified interaction representation and a multi-stage architecture based on finite-state transducers (FSTs). This design enables flexible long-horizon planning, progressive solution development, and robust recovery from execution failures. Extensive experiments across diverse data science scenarios and models show that DatawiseAgent consistently achieves SOTA performance by surpassing strong baselines such as AutoGen and TaskWeaver, demonstrating superior effectiveness and adaptability. Further evaluations reveal graceful performance degradation under weaker or smaller models, underscoring the robustness and scalability.
Related papers
- FELA: A Multi-Agent Evolutionary System for Feature Engineering of Industrial Event Log Data [7.129004248608012]
Event log data represents one of the most valuable assets for modern digital services.<n>Existing automatic feature engineering approaches, such as AutoML or genetic methods, often suffer from limited explainability.<n>We propose FELA, a multi-agent evolutionary system that autonomously extracts meaningful and high-performing features from complex industrial event log data.
arXiv Detail & Related papers (2025-10-29T06:57:32Z) - VDSAgents: A PCS-Guided Multi-Agent System for Veridical Data Science Automation [11.521235834823301]
Large language models (LLMs) become increasingly integrated into data science for automated system design.<n>This paper provides VDSAgents, a multi-agent system grounded in the Predictability-Computability-Stability (PCS) principles.<n>We evaluate VDSAgents on nine datasets with diverse characteristics, comparing it with state-of-the-art end-to-end data science systems.
arXiv Detail & Related papers (2025-10-28T12:07:50Z) - Automating Data-Driven Modeling and Analysis for Engineering Applications using Large Language Model Agents [3.344730946122235]
We propose an innovative pipeline utilizing Large Language Model (LLM) agents to automate data-driven modeling and analysis.<n>We evaluate two LLM-agent frameworks: a multi-agent system featuring specialized collaborative agents, and a single-agent system based on the Reasoning and Acting (ReAct) paradigm.
arXiv Detail & Related papers (2025-10-01T19:28:35Z) - Autonomous Data Agents: A New Opportunity for Smart Data [50.02229219403014]
Report argues that DataAgents represent a paradigm shift toward autonomous data-to-knowledge systems.<n>DataAgents transform complex and unstructured data into coherent and actionable knowledge.<n>We first examine why the convergence of agentic AI and data-to-knowledge systems has emerged as a critical trend.
arXiv Detail & Related papers (2025-09-23T06:46:41Z) - Structured Agentic Workflows for Financial Time-Series Modeling with LLMs and Reflective Feedback [16.04516547661581]
Time-series data is central to decision-making in financial markets, yet building high-performing, interpretable, and auditable models remains a major challenge.<n>textsfTSAgent is a modular agentic framework designed to automate and enhance time-series modeling for financial applications.
arXiv Detail & Related papers (2025-08-19T15:14:49Z) - AgenticData: An Agentic Data Analytics System for Heterogeneous Data [12.67277567222908]
AgenticData is an agentic data analytics system that allows users to pose natural language (NL) questions while autonomously analyzing data sources across multiple domains.<n>We propose a multi-agent collaboration strategy by utilizing a data profiling agent for discovering relevant data, a semantic cross-validation agent for iterative optimization based on feedback, and a smart memory agent for maintaining short-term context.
arXiv Detail & Related papers (2025-08-07T03:33:59Z) - LAM SIMULATOR: Advancing Data Generation for Large Action Model Training via Online Exploration and Trajectory Feedback [121.78866929908871]
Large Action Models (LAMs) for AI Agents offer incredible potential but face challenges due to the need for high-quality training data.<n>We present LAM SIMULATOR, a comprehensive framework designed for online exploration of agentic tasks with high-quality feedback.<n>Our framework features a dynamic task query generator, an extensive collection of tools, and an interactive environment where Large Language Model (LLM) Agents can call tools and receive real-time feedback.
arXiv Detail & Related papers (2025-06-02T22:36:02Z) - IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis [60.32962597618861]
IDA-Bench is a novel benchmark evaluating large language models in multi-round interactive scenarios.<n>Agent performance is judged by comparing its final numerical output to the human-derived baseline.<n>Even state-of-the-art coding agents (like Claude-3.7-thinking) succeed on 50% of the tasks, highlighting limitations not evident in single-turn tests.
arXiv Detail & Related papers (2025-05-23T09:37:52Z) - R&D-Agent: An LLM-Agent Framework Towards Autonomous Data Science [70.1638335489284]
High-level machine learning engineering tasks remain labor-intensive and iterative.<n>We introduce R&D-Agent, a comprehensive, decoupled, and framework that formalizes the machine learning process.<n>R&D-Agent defines the MLE into two phases and six components, turning agent design for MLE into a principled, testable process.
arXiv Detail & Related papers (2025-05-20T06:07:00Z) - TAIJI: MCP-based Multi-Modal Data Analytics on Data Lakes [25.05627023905607]
We envision a new multi-modal data analytics system based on the Model Context Protocol (MCP)<n>First, we define a semantic operator hierarchy tailored for querying multi-modal data in data lakes.<n>Next, we introduce an MCP-based execution framework, in which each MCP server hosts specialized foundation models optimized for specific data modalities.
arXiv Detail & Related papers (2025-05-16T14:03:30Z) - DataMosaic: Explainable and Verifiable Multi-Modal Data Analytics through Extract-Reason-Verify [11.10351765834947]
Large Language Models (LLMs) are transforming data analytics, but their widespread adoption is hindered by two critical limitations.
They are not explainable (opaque reasoning processes) and not verifiable (prone to hallucinations and unchecked errors)
We propose DataMosaic, a framework designed to make LLM-powered analytics both explainable and verifiable.
arXiv Detail & Related papers (2025-04-14T09:38:23Z) - Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.<n>However, they still struggle with problems requiring multi-step decision-making and environmental feedback.<n>We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - A Collaborative Multi-Agent Approach to Retrieval-Augmented Generation Across Diverse Data [0.0]
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs)<n>Traditional RAG systems typically use a single-agent architecture to handle query generation, data retrieval, and response synthesis.<n>This paper proposes a multi-agent RAG system to address these limitations.
arXiv Detail & Related papers (2024-12-08T07:18:19Z) - Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z) - Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z) - AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline.
Recent works have started exploiting large language models (LLM) to lessen such burden.
This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z) - Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporating 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z) - AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning [93.96463520716759]
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and hallucinations.
Here, we introduce AvaTaR, a novel and automated framework that optimize an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv Detail & Related papers (2024-06-17T04:20:02Z) - DCA-Bench: A Benchmark for Dataset Curation Agents [9.60250892491588]
We propose a dataset curation agent benchmark, DCA-Bench, to measure large language models' capability of detecting hidden dataset quality issues.
Specifically, we collect diverse real-world dataset quality issues from eight open dataset platforms as a testbed.
The proposed benchmark can also serve as a testbed for measuring the capability of LLMs in problem discovery rather than just problem-solving.
arXiv Detail & Related papers (2024-06-11T14:02:23Z) - Data Interpreter: An LLM Agent For Data Science [43.13678782387546]
Large Language Model (LLM)-based agents have shown effectiveness across many applications.
However, their use in data science scenarios requiring solving long-term interconnected tasks, dynamic data adjustments and domain expertise remains challenging.
We present Data Interpreter, an LLM-based agent designed to automatically solve various data science problems end-to-end.
arXiv Detail & Related papers (2024-02-28T19:49:55Z) - DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning [56.887047551101574]
We present DS-Agent, a novel framework that harnesses large language models (LLMs) agent and case-based reasoning (CBR)
In the development stage, DS-Agent follows the CBR framework to structure an automatic iteration pipeline, which can flexibly capitalize on the expert knowledge from Kaggle.
In the deployment stage, DS-Agent implements a low-resource deployment stage with a simplified CBR paradigm, significantly reducing the demand on foundational capabilities of LLMs.
arXiv Detail & Related papers (2024-02-27T12:26:07Z) - AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning [98.26836657967162]
textbfAgentOhana aggregates agent trajectories from distinct environments, spanning a wide array of scenarios.
textbfxLAM-v0.1, a large action model tailored for AI agents, demonstrates exceptional performance across various benchmarks.
arXiv Detail & Related papers (2024-02-23T18:56:26Z) - MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [86.61052121715689]
MatPlotAgent is a model-agnostic framework designed to automate scientific data visualization tasks.
MatPlotBench is a high-quality benchmark consisting of 100 human-verified test cases.
arXiv Detail & Related papers (2024-02-18T04:28:28Z) - MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation [96.71370747681078]
We introduce MLAgentBench, a suite of 13 tasks ranging from improving model performance on CIFAR-10 to recent research problems like BabyLM.
For each task, an agent can perform actions like reading/writing files, executing code, and inspecting outputs.
We benchmark agents based on Claude v1.0, Claude v2.1, Claude v3 Opus, GPT-4, GPT-4-turbo, Gemini-Pro, and Mixtral and find that a Claude v3 Opus agent is the best in terms of success rate.
arXiv Detail & Related papers (2023-10-05T04:06:12Z) - SEED: Domain-Specific Data Curation With Large Language Models [22.54280367957015]
We present SEED, an LLM-as-compiler approach that automatically generates domain-specific data curation solutions via Large Language Models (LLMs)
SEED features an that automatically selects from the four LLM-assisted modules and forms a hybrid execution pipeline that best fits the task at hand.
arXiv Detail & Related papers (2023-10-01T17:59:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.