AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
- URL: http://arxiv.org/abs/2410.20424v3
- Date: Tue, 05 Nov 2024 19:46:38 GMT
- Title: AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
- Authors: Ziming Li, Qianbo Zang, David Ma, Jiawei Guo, Tuney Zheng, Minghao Liu, Xinyao Niu, Yue Wang, Jian Yang, Jiaheng Liu, Wanjun Zhong, Wangchunshu Zhou, Wenhao Huang, Ge Zhang
- Abstract summary: AutoKaggle implements an iterative development process that combines code execution and unit testing to ensure code correctness and logic consistency.
Our universal data science toolkit, comprising validated functions for data cleaning, feature engineering, and modeling, forms the foundation of this solution.
AutoKaggle achieves a valid submission rate of 0.85 and a comprehensive score of 0.82 in typical data science pipelines.
- Score: 45.0447118979891
- License:
- Abstract: Data science tasks involving tabular data present complex challenges that require sophisticated problem-solving approaches. We propose AutoKaggle, a powerful and user-centric framework that assists data scientists in completing daily data pipelines through a collaborative multi-agent system. AutoKaggle implements an iterative development process that combines code execution, debugging, and comprehensive unit testing to ensure code correctness and logic consistency. The framework offers highly customizable workflows, allowing users to intervene at each phase, thus integrating automated intelligence with human expertise. Our universal data science toolkit, comprising validated functions for data cleaning, feature engineering, and modeling, forms the foundation of this solution, enhancing productivity by streamlining common tasks. We selected 8 Kaggle competitions to simulate data processing workflows in real-world application scenarios. Evaluation results demonstrate that AutoKaggle achieves a valid submission rate of 0.85 and a comprehensive score of 0.82 in typical data science pipelines, demonstrating its effectiveness and practicality in handling complex data science tasks.
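The abstract describes an iterative loop that combines code execution and unit testing, built on a toolkit of validated functions for cleaning, feature engineering, and modeling. The following is a minimal illustrative sketch of such a phase loop; the function names and retry logic here are assumptions for illustration, not AutoKaggle's actual API.

```python
# Illustrative sketch of an AutoKaggle-style iterative phase: run a
# toolkit function, check it with a unit test, and retry on failure.
# All names here are hypothetical, not taken from the paper's code.

def clean_data(rows):
    """Toolkit-style validated function: drop rows with missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def unit_test_clean(rows):
    """Per-phase unit test in the spirit of the paper's consistency checks."""
    return all(None not in r.values() for r in rows)

def iterative_phase(rows, max_attempts=3):
    """Execute a pipeline phase, re-attempting until its unit test passes."""
    for attempt in range(1, max_attempts + 1):
        result = clean_data(rows)
        if unit_test_clean(result):
            return result, attempt
        # In the real framework, a debugging step would revise the code here.
    raise RuntimeError("phase failed after all debugging attempts")

raw = [{"age": 34, "fare": 7.25}, {"age": None, "fare": 8.05}]
cleaned, attempts = iterative_phase(raw)
print(len(cleaned), attempts)
```

A full pipeline would chain several such phases (cleaning, feature engineering, modeling), each gated by its own unit tests, with optional human intervention between phases.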
Related papers
- Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level [73.14232472724758]
We introduce Agent K v1.0, an end-to-end autonomous data science agent.
It manages the entire data science life cycle by learning from experience.
It optimises long- and short-term memory by selectively storing and retrieving key information.
arXiv Detail & Related papers (2024-11-05T23:55:23Z)
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
- Data Interpreter: An LLM Agent For Data Science [43.13678782387546]
Large Language Model (LLM)-based agents have shown effectiveness across many applications.
However, their use in data science scenarios that require solving long-term interconnected tasks, dynamic data adjustments, and domain expertise remains challenging.
We present Data Interpreter, an LLM-based agent designed to automatically solve various data science problems end-to-end.
arXiv Detail & Related papers (2024-02-28T19:49:55Z)
- AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA.
It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z)
- Uncertainty in Automated Ontology Matching: Lessons Learned from an Empirical Experimentation [6.491645162078057]
Ontologies play a critical role in linking and semantically integrating datasets via interoperability.
This paper approaches data integration from an application perspective, examining techniques based on ontology matching.
arXiv Detail & Related papers (2023-10-18T05:42:51Z)
- Towards Lightweight Data Integration using Multi-workflow Provenance and Data Observability [0.2517763905487249]
Integrated data analysis plays a crucial role in scientific discovery, especially in the current AI era.
We propose MIDA: an approach for lightweight runtime Multi-workflow Integrated Data Analysis.
We show near-zero overhead running up to 100,000 tasks on 1,680 CPU cores on the Summit supercomputer.
arXiv Detail & Related papers (2023-08-17T14:20:29Z)
- ChatGPT as your Personal Data Scientist [0.9689893038619583]
This paper introduces a ChatGPT-based conversational data-science framework that acts as a "personal data scientist".
Our model pivots around four dialogue states: Data Visualization, Task Formulation, Prediction Engineering, and Result Summary and Recommendation.
In summary, we developed an end-to-end system that not only proves the viability of the novel concept of conversational data science but also underscores the potency of LLMs in solving complex tasks.
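The four dialogue states above can be sketched as a simple state machine. The linear transition order below is an assumption for illustration; the paper's actual control flow between states may differ.

```python
from enum import Enum, auto

class DialogueState(Enum):
    """The four dialogue states named in the abstract summary."""
    DATA_VISUALIZATION = auto()
    TASK_FORMULATION = auto()
    PREDICTION_ENGINEERING = auto()
    RESULT_SUMMARY_AND_RECOMMENDATION = auto()

# Hypothetical linear ordering; a real conversational system would also
# allow loops back to earlier states based on user input.
ORDER = list(DialogueState)

def next_state(state):
    """Advance to the next dialogue state, or None when the dialogue ends."""
    i = ORDER.index(state)
    return ORDER[i + 1] if i + 1 < len(ORDER) else None
```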
arXiv Detail & Related papers (2023-05-23T04:00:16Z)
- Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming [77.38174112525168]
We present Nemo, an end-to-end interactive weak supervision (WS) system that improves the overall productivity of the WS learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences arising from its use.