Data Interpreter: An LLM Agent For Data Science
- URL: http://arxiv.org/abs/2402.18679v3
- Date: Tue, 12 Mar 2024 17:26:53 GMT
- Title: Data Interpreter: An LLM Agent For Data Science
- Authors: Sirui Hong, Yizhang Lin, Bang Liu, Bangbang Liu, Binhao Wu, Danyang
Li, Jiaqi Chen, Jiayi Zhang, Jinlin Wang, Li Zhang, Lingyao Zhang, Min Yang,
Mingchen Zhuge, Taicheng Guo, Tuo Zhou, Wei Tao, Wenyi Wang, Xiangru Tang,
Xiangtao Lu, Xiawu Zheng, Xinbing Liang, Yaying Fei, Yuheng Cheng, Zongze Xu,
Chenglin Wu
- Abstract summary: The Data Interpreter is a solution designed to solve data science problems with code.
It emphasizes three pivotal techniques to augment problem-solving in data science.
It showed a 26% improvement on the MATH dataset and a remarkable 112% improvement on open-ended tasks.
- Score: 43.99482533437711
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Model (LLM)-based agents have demonstrated remarkable
effectiveness. However, their performance can be compromised in data science
scenarios that require real-time data adjustment, expertise in optimization due
to complex dependencies among various tasks, and the ability to identify
logical errors for precise reasoning. In this study, we introduce the Data
Interpreter, a solution designed to solve data science problems with code, which
emphasizes three pivotal techniques to augment problem-solving: 1) dynamic
planning with hierarchical graph structures for real-time data adaptability;
2) dynamic tool integration to enhance code proficiency during execution,
enriching the requisite expertise; and 3) identification of logical
inconsistencies in feedback, together with efficiency enhancement through
experience recording. We evaluate the Data Interpreter on various data science
and real-world tasks. Compared to open-source baselines, it demonstrated
superior performance, with significant improvements on machine learning tasks,
raising the score from 0.86 to 0.95. It also showed a 26% improvement on the
MATH dataset and a remarkable 112% improvement on open-ended tasks. The
solution will be released at https://github.com/geekan/MetaGPT.
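To make the first technique concrete, here is a minimal sketch of dynamic planning over a hierarchical task graph: tasks form a DAG, run in dependency order, and a failing node triggers re-planning of its subgraph rather than a full restart. This is an illustration of the idea only, not MetaGPT's actual API; `Task`, `generate_code`, `execute`, and `replan` are assumed names.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    # One node in the hierarchical task graph (illustrative structure).
    name: str
    instruction: str
    deps: list = field(default_factory=list)  # names of upstream tasks
    code: str = ""
    done: bool = False

def topological_order(tasks):
    """Order tasks so every task runs after its dependencies."""
    order, seen = [], set()
    def visit(t):
        if t.name in seen:
            return
        seen.add(t.name)
        for d in t.deps:
            visit(tasks[d])
        order.append(t)
    for t in tasks.values():
        visit(t)
    return order

def run_plan(tasks, generate_code, execute, replan):
    """Execute the graph; on failure, re-plan the failing node and its
    downstream subgraph instead of restarting the whole pipeline."""
    for task in topological_order(tasks):
        if task.done:
            continue
        task.code = generate_code(task)    # LLM writes code for this node
        ok, feedback = execute(task.code)  # run in a sandboxed interpreter
        if ok:
            task.done = True
        else:
            # Dynamic planning: revise the plan using execution feedback,
            # keeping completed upstream tasks intact.
            tasks = replan(tasks, task, feedback)
            return run_plan(tasks, generate_code, execute, replan)
    return tasks
```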
Related papers
- InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities [27.09178257629886]
InfiAlign is a scalable and sample-efficient post-training framework for aligning large language models (LLMs).
At the core of InfiAlign is a robust data selection pipeline that automatically curates high-quality alignment data from open-source reasoning datasets.
Our results highlight the effectiveness of combining principled data selection with full-stage post-training.
arXiv Detail & Related papers (2025-08-07T15:34:06Z) - DatawiseAgent: A Notebook-Centric LLM Agent Framework for Automated Data Science [4.1431677219677185]
DatawiseAgent is a notebook-centric agent framework that unifies interactions among user, agent and the computational environment.
It orchestrates four stages, including DSF-like planning, incremental execution, self-debugging, and post-filtering.
It consistently outperforms or matches state-of-the-art methods across multiple model settings.
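As a rough illustration of such a staged, notebook-centric loop (all names below are assumptions, not DatawiseAgent's actual interface):

```python
def datawise_style_loop(task, llm, notebook, max_debug_rounds=5):
    """Illustrative staged loop: plan, execute cells incrementally,
    self-debug failures, then post-filter the transcript. A sketch of
    the described workflow, not DatawiseAgent's real implementation."""
    plan = llm.plan(task)                         # stage 1: planning
    for step in plan:
        code = llm.write_cell(step, notebook)     # stage 2: incremental execution
        result = notebook.run(code)
        for _ in range(max_debug_rounds):
            if not result.error:
                break
            code = llm.debug(code, result.error)  # stage 3: self-debugging
            result = notebook.run(code)
    return llm.post_filter(notebook.cells())      # stage 4: post-filtering
```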
arXiv Detail & Related papers (2025-03-10T08:32:33Z) - Exploring LLM Agents for Cleaning Tabular Machine Learning Datasets [19.844836459291546]
High-quality, error-free datasets are a key ingredient in building reliable, accurate, and unbiased machine learning (ML) models.
However, real-world datasets often suffer from errors due to sensor malfunctions, data entry mistakes, or improper data integration across multiple sources.
In this study, we investigate whether Large Language Models (LLMs) can help alleviate the burden of manual data cleaning.
arXiv Detail & Related papers (2025-03-09T15:29:46Z) - Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models [64.28420991770382]
Data-Juicer 2.0 is a data processing system backed by data processing operators spanning text, image, video, and audio modalities.
It supports more critical tasks including data analysis, annotation, and foundation model post-training.
It has been widely adopted in diverse research fields and real-world products such as Alibaba Cloud PAI.
arXiv Detail & Related papers (2024-12-23T08:29:57Z) - Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
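A minimal sketch of that dual-model evaluation idea, under the assumption that a strong judge scores quality while the strong/weak performance gap proxies difficulty (illustrative names, not the paper's code):

```python
def dual_model_filter(samples, strong_llm, weak_llm, quality_threshold=0.7):
    """Keep instruction samples that a strong model judges high quality
    and that show a strong/weak performance gap (a difficulty proxy).
    A sketch of the dual-model idea, not Star-Agents' exact method."""
    kept = []
    for s in samples:
        quality = strong_llm.rate(s)  # assumed judge call, scores 0..1
        # A sample the weak model handles as well as the strong one is
        # easy; a large gap suggests it is difficult and worth keeping.
        difficulty = strong_llm.score_answer(s) - weak_llm.score_answer(s)
        if quality >= quality_threshold and difficulty > 0:
            kept.append(s)
    return kept
```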
arXiv Detail & Related papers (2024-11-21T02:30:53Z) - LLM4DS: Evaluating Large Language Models for Data Science Code Generation [0.0]
This paper empirically assesses the performance of four leading AI assistants: Microsoft Copilot (GPT-4 Turbo), ChatGPT (o1-preview), Claude (3.5 Sonnet), and Perplexity Labs (Llama-3.1-70b-instruct) on data science code generation.
All models exceeded a 50% success rate, confirming their capability beyond random chance.
ChatGPT demonstrated consistent performance across varying difficulty levels, while Claude's success rate fluctuated with task complexity.
arXiv Detail & Related papers (2024-11-16T18:43:26Z) - DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? [58.330879414174476]
We introduce DSBench, a benchmark designed to evaluate data science agents with realistic tasks.
This benchmark includes 466 data analysis tasks and 74 data modeling tasks, sourced from Eloquence and Kaggle competitions.
Our evaluation of state-of-the-art LLMs, LVLMs, and agents shows that they struggle with most tasks, with the best agent solving only 34.12% of data analysis tasks and achieving a 34.74% Relative Performance Gap (RPG).
arXiv Detail & Related papers (2024-09-12T02:08:00Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
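A rough sketch of gradient-similarity selection in this spirit, assuming per-example gradient features have already been computed and projected to low rank (an illustration; the paper's full method is more involved):

```python
import numpy as np

def gradient_similarity_selection(train_grads, target_grads, top_fraction=0.05):
    """Rank training examples by cosine similarity between their low-rank
    gradient features and the target task's gradient features, then keep
    the top slice. Illustrative only, not the LESS codebase."""
    # Normalize rows so dot products become cosine similarities.
    t = train_grads / np.linalg.norm(train_grads, axis=1, keepdims=True)
    v = target_grads / np.linalg.norm(target_grads, axis=1, keepdims=True)
    scores = (t @ v.T).mean(axis=1)  # average similarity to target examples
    k = max(1, int(len(scores) * top_fraction))
    return np.argsort(scores)[-k:]   # indices of the selected examples
```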
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - Efficient Grammatical Error Correction Via Multi-Task Training and
Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
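As a toy illustration of that setup (assumed names, not the paper's code): auxiliary seq2seq datasets are derived from the aligned original/corrected pairs, and the training schedule fixes an explicit dataset order, since ordering can matter.

```python
def build_multitask_schedule(gec_pairs, aux_task_builders, order):
    """Derive auxiliary seq2seq examples from (original, corrected)
    sentence pairs, then concatenate datasets in an explicit order.
    Illustrative sketch of multi-task GEC training data preparation."""
    datasets = {"gec": list(gec_pairs)}
    for name, build in aux_task_builders.items():
        # e.g., build() might label each source token as kept or edited,
        # exploiting the alignment between original and corrected text.
        datasets[name] = [build(src, tgt) for src, tgt in gec_pairs]
    schedule = []
    for name in order:  # train on datasets in this explicit order
        schedule.extend(datasets[name])
    return schedule
```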
arXiv Detail & Related papers (2023-11-20T14:50:12Z) - Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
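One way to picture "adapting the augmentation while solving the learning task" is a constrained formulation with dual variables, sketched below in PyTorch (an illustration of the general idea, not the paper's exact algorithm):

```python
import torch

def invariance_constrained_step(model, loss_fn, x, y, transforms,
                                duals, dual_lr=0.01, eps=0.05):
    """Treat invariance to each transformation as a constraint with its
    own dual variable: penalize transformed-input loss exceeding the
    nominal loss by more than eps, and raise each dual price while its
    constraint is violated. Illustrative sketch only."""
    nominal = loss_fn(model(x), y)
    lagrangian = nominal
    for i, t in enumerate(transforms):
        violation = loss_fn(model(t(x)), y) - nominal - eps
        lagrangian = lagrangian + duals[i] * violation
        # Dual ascent keeps duals nonnegative and grows the penalty for
        # transformations the model is not yet invariant to.
        duals[i] = max(0.0, duals[i] + dual_lr * float(violation))
    return lagrangian, duals
```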
arXiv Detail & Related papers (2022-09-29T18:11:01Z) - HaT5: Hate Language Identification using Text-to-Text Transfer
Transformer [1.2532400738980594]
We investigate the performance of a state-of-the-art (SoTA) architecture, T5, across 5 different tasks from 2 relatively diverse datasets.
To improve performance, we augment the training data by using an autoregressive model.
Using a small set of examples, it also reveals the difficulties caused by poor data annotation.
arXiv Detail & Related papers (2022-02-11T15:21:27Z) - Exploring the Efficacy of Automatically Generated Counterfactuals for
Sentiment Analysis [17.811597734603144]
We propose an approach to automatically generating counterfactual data for data augmentation and explanation.
A comprehensive evaluation on several different datasets, using a variety of state-of-the-art benchmarks, demonstrates how our approach can achieve significant improvements in model performance.
arXiv Detail & Related papers (2021-06-29T10:27:01Z)