AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
- URL: http://arxiv.org/abs/2410.20424v3
- Date: Tue, 05 Nov 2024 19:46:38 GMT
- Title: AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
- Authors: Ziming Li, Qianbo Zang, David Ma, Jiawei Guo, Tuney Zheng, Minghao Liu, Xinyao Niu, Yue Wang, Jian Yang, Jiaheng Liu, Wanjun Zhong, Wangchunshu Zhou, Wenhao Huang, Ge Zhang
- Abstract summary: AutoKaggle implements an iterative development process that combines code execution and unit testing to ensure code correctness and logic consistency.
Our universal data science toolkit, comprising validated functions for data cleaning, feature engineering, and modeling, forms the foundation of this solution.
AutoKaggle achieves a valid submission rate of 0.85 and a comprehensive score of 0.82 in typical data science pipelines.
- Score: 45.0447118979891
- License:
- Abstract: Data science tasks involving tabular data present complex challenges that require sophisticated problem-solving approaches. We propose AutoKaggle, a powerful and user-centric framework that assists data scientists in completing daily data pipelines through a collaborative multi-agent system. AutoKaggle implements an iterative development process that combines code execution, debugging, and comprehensive unit testing to ensure code correctness and logic consistency. The framework offers highly customizable workflows, allowing users to intervene at each phase, thus integrating automated intelligence with human expertise. Our universal data science toolkit, comprising validated functions for data cleaning, feature engineering, and modeling, forms the foundation of this solution, enhancing productivity by streamlining common tasks. We selected 8 Kaggle competitions to simulate data processing workflows in real-world application scenarios. Evaluation results demonstrate that AutoKaggle achieves a valid submission rate of 0.85 and a comprehensive score of 0.82 in typical data science pipelines, demonstrating its effectiveness and practicality in handling complex data science tasks.
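The abstract describes an iterative loop that combines code execution and unit testing, built on a toolkit of validated functions for cleaning, feature engineering, and modeling. The following is a minimal illustrative sketch of such a phase loop; the function names and retry logic here are assumptions for illustration, not AutoKaggle's actual API.

```python
# Illustrative sketch of an AutoKaggle-style iterative phase: run a
# toolkit function, check it with a unit test, and retry on failure.
# All names here are hypothetical, not taken from the paper's code.

def clean_data(rows):
    """Toolkit-style validated function: drop rows with missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def unit_test_clean(rows):
    """Per-phase unit test in the spirit of the paper's consistency checks."""
    return all(None not in r.values() for r in rows)

def iterative_phase(rows, max_attempts=3):
    """Execute a pipeline phase, re-attempting until its unit test passes."""
    for attempt in range(1, max_attempts + 1):
        result = clean_data(rows)
        if unit_test_clean(result):
            return result, attempt
        # In the real framework, a debugging step would revise the code here.
    raise RuntimeError("phase failed after all debugging attempts")

raw = [{"age": 34, "fare": 7.25}, {"age": None, "fare": 8.05}]
cleaned, attempts = iterative_phase(raw)
print(len(cleaned), attempts)
```

A full pipeline would chain several such phases (cleaning, feature engineering, modeling), each gated by its own unit tests, with optional human intervention between phases.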
Related papers
- Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level [73.14232472724758]
We introduce Agent K v1.0, an end-to-end autonomous data science agent.
It manages the entire data science life cycle by learning from experience.
It optimises long- and short-term memory by selectively storing and retrieving key information.
arXiv Detail & Related papers (2024-11-05T23:55:23Z)
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
- Data Interpreter: An LLM Agent For Data Science [43.13678782387546]
Large Language Model (LLM)-based agents have shown effectiveness across many applications.
However, their use in data science scenarios that require solving long-term interconnected tasks, dynamic data adjustments, and domain expertise remains challenging.
We present Data Interpreter, an LLM-based agent designed to automatically solve various data science problems end-to-end.
arXiv Detail & Related papers (2024-02-28T19:49:55Z)
- AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA.
It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z)
- Uncertainty in Automated Ontology Matching: Lessons Learned from an Empirical Experimentation [6.491645162078057]
Ontologies play a critical role in linking and semantically integrating datasets via interoperability.
This paper approaches data integration from an application perspective, examining techniques based on ontology matching.
arXiv Detail & Related papers (2023-10-18T05:42:51Z)
- Towards Lightweight Data Integration using Multi-workflow Provenance and Data Observability [0.2517763905487249]
Integrated data analysis plays a crucial role in scientific discovery, especially in the current AI era.
We propose MIDA: an approach for lightweight runtime Multi-workflow Integrated Data Analysis.
We show near-zero overhead running up to 100,000 tasks on 1,680 CPU cores on the Summit supercomputer.
arXiv Detail & Related papers (2023-08-17T14:20:29Z)
- ChatGPT as your Personal Data Scientist [0.9689893038619583]
This paper introduces a ChatGPT-based conversational data-science framework that acts as a "personal data scientist".
Our model pivots around four dialogue states: Data Visualization, Task Formulation, Prediction Engineering, and Result Summary and Recommendation.
In summary, we developed an end-to-end system that not only proves the viability of the novel concept of conversational data science but also underscores the potency of LLMs in solving complex tasks.
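The four dialogue states above can be sketched as a simple state machine. The linear transition order below is an assumption for illustration; the paper's actual control flow between states may differ.

```python
from enum import Enum, auto

class DialogueState(Enum):
    """The four dialogue states named in the abstract summary."""
    DATA_VISUALIZATION = auto()
    TASK_FORMULATION = auto()
    PREDICTION_ENGINEERING = auto()
    RESULT_SUMMARY_AND_RECOMMENDATION = auto()

# Hypothetical linear ordering; a real conversational system would also
# allow loops back to earlier states based on user input.
ORDER = list(DialogueState)

def next_state(state):
    """Advance to the next dialogue state, or None when the dialogue ends."""
    i = ORDER.index(state)
    return ORDER[i + 1] if i + 1 < len(ORDER) else None
```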
arXiv Detail & Related papers (2023-05-23T04:00:16Z)
- Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming [77.38174112525168]
We present Nemo, an end-to-end interactive weak supervision (WS) system that improves the overall productivity of the WS learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences arising from its use.