A Multi-Agent Framework for Code-Guided, Modular, and Verifiable Automated Machine Learning
- URL: http://arxiv.org/abs/2602.13937v1
- Date: Sun, 15 Feb 2026 00:20:58 GMT
- Title: A Multi-Agent Framework for Code-Guided, Modular, and Verifiable Automated Machine Learning
- Authors: Dat Le, Duc-Cuong Le, Anh-Son Nguyen, Tuan-Dung Bui, Thu-Trang Nguyen, Son Nguyen, Hieu Dinh Vo
- Abstract summary: iML is a novel multi-agent framework designed to shift AutoML from black-box prompting to a code-guided, modular, and verifiable architectural paradigm. We evaluate iML across MLE-BENCH and the newly introduced iML-BENCH, comprising a diverse range of real-world Kaggle competitions.
- Score: 3.6317933453723232
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Automated Machine Learning (AutoML) has revolutionized the development of data-driven solutions; however, traditional frameworks often function as "black boxes", lacking the flexibility and transparency required for complex, real-world engineering tasks. Recent Large Language Model (LLM)-based agents have shifted toward code-driven approaches. However, they frequently suffer from hallucinated logic and logic entanglement, where monolithic code generation leads to unrecoverable runtime failures. In this paper, we present iML, a novel multi-agent framework designed to shift AutoML from black-box prompting to a code-guided, modular, and verifiable architectural paradigm. iML introduces three main ideas: (1) Code-Guided Planning, which synthesizes a strategic blueprint grounded in autonomous empirical profiling to eliminate hallucination; (2) Code-Modular Implementation, which decouples preprocessing and modeling into specialized components governed by strict interface contracts; and (3) Code-Verifiable Integration, which enforces physical feasibility through dynamic contract verification and iterative self-correction. We evaluate iML across MLE-BENCH and the newly introduced iML-BENCH, comprising a diverse range of real-world Kaggle competitions. The experimental results show iML's superiority over state-of-the-art agents, achieving a valid submission rate of 85% and a competitive medal rate of 45% on MLE-BENCH, with an average standardized performance score (APS) of 0.77. On iML-BENCH, iML significantly outperforms the other approaches by 38%-163% in APS. Furthermore, iML maintains a robust 70% success rate even under stripped task descriptions, effectively filling information gaps through empirical profiling. These results highlight iML's potential to bridge the gap between stochastic generation and reliable engineering, marking a meaningful step toward truly automated machine learning.
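The abstract's ideas of interface contracts, dynamic contract verification, and iterative self-correction can be illustrated with a minimal sketch. This is not iML's implementation: the `Contract` class, the `verify` and `integrate` functions, and the two candidate preprocessors are hypothetical names invented for illustration, and a fixed list of candidates stands in for the LLM-driven repair step the paper describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]
Table = List[Row]


@dataclass(frozen=True)
class Contract:
    """Hypothetical interface contract between preprocessing and modeling."""
    required_columns: Tuple[str, ...]
    allow_missing: bool = False


def verify(table: Table, contract: Contract) -> List[str]:
    """Return the list of contract violations (empty means the table passes)."""
    errors = []
    for i, row in enumerate(table):
        for col in contract.required_columns:
            if col not in row:
                errors.append(f"row {i}: missing column '{col}'")
            elif row[col] is None and not contract.allow_missing:
                errors.append(f"row {i}: null value in '{col}'")
    return errors


def passthrough(table: Table) -> Table:
    """Naive candidate: copies the data unchanged."""
    return [dict(row) for row in table]


def impute_zero(table: Table) -> Table:
    """Repaired candidate: replaces null values with 0.0."""
    return [{k: (0.0 if v is None else v) for k, v in row.items()} for row in table]


def integrate(
    table: Table,
    candidates: List[Callable[[Table], Table]],
    contract: Contract,
) -> Tuple[str, Table]:
    """Run candidates in order and keep the first output that satisfies the contract.

    Assumption: in an agent system the next candidate would be generated from
    the reported violations; here the candidates are fixed in advance.
    """
    for candidate in candidates:
        output = candidate(table)
        if not verify(output, contract):
            return candidate.__name__, output
    raise RuntimeError("no candidate satisfied the contract")


raw = [{"age": 31.0, "income": None}, {"age": 45.0, "income": 52000.0}]
contract = Contract(required_columns=("age", "income"))
chosen, clean = integrate(raw, [passthrough, impute_zero], contract)
print(chosen)  # impute_zero
```

The point of the design is that the modeling stage only ever receives data that has passed `verify`, so a preprocessing failure surfaces as a list of named violations rather than as a runtime error deep inside training code.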
Related papers
- A Lightweight Modular Framework for Constructing Autonomous Agents Driven by Large Language Models: Design, Implementation, and Applications in AgentForge [1.932555230783329]
AgentForge is a lightweight, open-source Python framework designed to democratize the construction of LLM-driven autonomous agents. It introduces three key innovations: (1) a composable skill abstraction that enables fine-grained task decomposition with formally defined input-output contracts, (2) a unified backend interface supporting seamless switching between cloud-based APIs and local inference engines, and (3) a declarative YAML-based configuration system that separates agent logic from implementation details.
arXiv Detail & Related papers (2026-01-19T20:33:26Z)
- MeltRTL: Multi-Expert LLMs with Inference-time Intervention for RTL Code Generation [0.0]
MeltRTL is a novel framework that integrates multi-expert attention with inference-time intervention. MeltRTL significantly improves the accuracy of large language models (LLMs) without retraining the base model. We evaluate MeltRTL on the VerilogEval benchmark, achieving 96% synthesizability and 60% functional correctness.
arXiv Detail & Related papers (2026-01-19T12:49:39Z)
- An Agentic Framework with LLMs for Solving Complex Vehicle Routing Problems [66.60904891478687]
We propose an Agentic Framework with LLMs (AFL) for solving complex vehicle routing problems. AFL directly extracts knowledge from raw inputs and enables self-contained code generation. We show that AFL substantially outperforms existing LLM-based baselines in both code reliability and solution feasibility.
arXiv Detail & Related papers (2025-10-19T03:59:25Z)
- Towards Adaptive ML Benchmarks: Web-Agent-Driven Construction, Domain Expansion, and Metric Optimization [8.356074728041202]
TAM Bench is a benchmark for evaluating large language models (LLMs) on end-to-end machine learning tasks. Three key innovations include a browser automation and LLM-based task acquisition system. Based on 150 curated AutoML tasks, we construct three benchmark subsets of different sizes.
arXiv Detail & Related papers (2025-09-11T10:10:48Z)
- SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality [52.948791050405525]
We propose SimMLM, a simple yet powerful framework for multimodal learning with missing modalities. SimMLM consists of a generic Dynamic Mixture of Modality Experts (DMoME) architecture, featuring a dynamic, learnable gating mechanism. A key innovation of SimMLM is the proposed More vs. Fewer (MoFe) ranking loss, which ensures that task accuracy improves or remains stable as more modalities are made available.
arXiv Detail & Related papers (2025-07-25T13:39:34Z)
- CodeAgents: A Token-Efficient Framework for Codified Multi-Agent Reasoning in LLMs [16.234259194402163]
We introduce CodeAgents, a prompting framework that codifies multi-agent reasoning and enables structured, token-efficient planning in multi-agent systems. Results show consistent improvements in planning performance, with absolute gains of 3-36 percentage points over natural language prompting baselines.
arXiv Detail & Related papers (2025-07-04T02:20:19Z)
- MLZero: A Multi-Agent System for End-to-end Machine Learning Automation [48.716299953336346]
We introduce MLZero, a novel multi-agent framework powered by Large Language Models (LLMs). A cognitive perception module is first employed, transforming raw multimodal inputs into perceptual context. MLZero demonstrates superior performance on MLE-Bench Lite, outperforming all competitors in both success rate and solution quality.
arXiv Detail & Related papers (2025-05-20T05:20:53Z)
- MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering [57.156093929365255]
MLE-Dojo is a Gym-style framework for systematically training (with reinforcement learning), evaluating, and improving autonomous large language model (LLM) agents. MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios. Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-05-12T17:35:43Z)
- Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align large language models. Controlled Decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture of agent-based decoding strategies leveraging existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z)
- LLaVA-KD: A Framework of Distilling Multimodal Large Language Models [72.68665884790002]
We propose a novel framework to transfer knowledge from l-MLLMs to s-MLLMs. We introduce Multimodal Distillation (MDist) to transfer the teacher model's robust representations across both visual and linguistic modalities. We also propose a three-stage training scheme to fully exploit the potential of the proposed distillation strategy.
arXiv Detail & Related papers (2024-10-21T17:41:28Z)
- AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline. Recent works have started exploiting large language models (LLMs) to lessen this burden. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.