Related papers: ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning

ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning

URL: http://arxiv.org/abs/2506.16499v1
Date: Thu, 19 Jun 2025 17:53:28 GMT
Title: ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning
Authors: Zexi Liu, Yuzhu Cai, Xinyu Zhu, Yujie Zheng, Runkun Chen, Ying Wen, Yanfeng Wang, Weinan E, Siheng Chen,
Abstract summary: We propose ML-Master, a novel AI4AI agent that seamlessly integrates exploration and reasoning by employing a selectively scoped memory mechanism.<n>We evaluate ML-Master on the MLE-Bench, where it achieves a 29.3% average medal rate, significantly surpassing existing methods.
Score: 49.25518866694287
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As AI capabilities advance toward and potentially beyond human-level performance, a natural transition emerges where AI-driven development becomes more efficient than human-centric approaches. A promising pathway toward this transition lies in AI-for-AI (AI4AI), which leverages AI techniques to automate and optimize the design, training, and deployment of AI systems themselves. While LLM-based agents have shown the potential to realize AI4AI, they are often unable to fully leverage the experience accumulated by agents during the exploration of solutions in the reasoning process, leading to inefficiencies and suboptimal performance. To address this limitation, we propose ML-Master, a novel AI4AI agent that seamlessly integrates exploration and reasoning by employing a selectively scoped memory mechanism. This approach allows ML-Master to efficiently combine diverse insights from parallel solution trajectories with analytical reasoning, guiding further exploration without overwhelming the agent with excessive context. We evaluate ML-Master on the MLE-Bench, where it achieves a 29.3% average medal rate, significantly surpassing existing methods, particularly in medium-complexity tasks, while accomplishing this superior performance within a strict 12-hour time constraint-half the 24-hour limit used by previous baselines. These results demonstrate ML-Master's potential as a powerful tool for advancing AI4AI.

Related papers

SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam? [51.112225746095746]
We introduce X-Master, a tool-augmented reasoning agent designed to emulate human researchers.<n>X-Masters sets a new state-of-the-art record on Humanity's Last Exam with a score of 32.1%.
arXiv Detail & Related papers (2025-07-07T17:50:52Z)
AI Agents and Agentic AI-Navigating a Plethora of Concepts for Future Manufacturing [8.195356684218691]
AI agents are autonomous systems designed to perceive, reason, and act within dynamic environments.<n>LLMs, MLLMs, and Agentic AI contribute to expanding AI's capabilities in information processing, environmental perception, and autonomous decision-making.<n>This study systematically reviews the evolution of AI and AI agent technologies.
arXiv Detail & Related papers (2025-07-02T05:31:17Z)
Scalable, Symbiotic, AI and Non-AI Agent Based Parallel Discrete Event Simulations [0.0]
This paper introduces a novel parallel discrete event simulation (PDES) based methodology to combine multiple AI and non-AI agents.<n>We evaluate our approach by solving four problems from four different domains and comparing the results with those from AI models alone.<n>Results show that overall accuracy of our approach is 68% where as the accuracy of vanilla models is less than 23%.
arXiv Detail & Related papers (2025-05-28T17:50:01Z)
Direct Advantage Regression: Aligning LLMs with Online AI Reward [59.78549819431632]
Online AI Feedback (OAIF) presents a promising alternative to Reinforcement Learning from Human Feedback (RLHF)<n>We propose Direct Advantage Regression (DAR) to optimize policy improvement through weighted supervised fine-tuning.<n>Our empirical results underscore that AI reward is a better form of AI supervision consistently achieving higher human-AI agreement as opposed to AI preference.
arXiv Detail & Related papers (2025-04-19T04:44:32Z)
General Scales Unlock AI Evaluation with Explanatory and Predictive Power [57.7995945974989]
benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems.<n>We introduce general scales for AI evaluation that can explain what common AI benchmarks really measure.<n>Our fully-automated methodology builds on 18 newly-crafted rubrics that place instance demands on general scales that do not saturate.
arXiv Detail & Related papers (2025-03-09T01:13:56Z)
ML Research Benchmark [0.0]
We present the ML Research Benchmark (MLRB), comprising 7 competition-level tasks derived from recent machine learning conference tracks. This paper introduces a novel benchmark and evaluates it using agent scaffolds powered by frontier models, including Claude-3 and GPT-4o. The results indicate that the Claude-3.5 Sonnet agent performs best across our benchmark, excelling in planning and developing machine learning models.
arXiv Detail & Related papers (2024-10-29T21:38:42Z)
Exploration with Principles for Diverse AI Supervision [88.61687950039662]
Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI. While this generative AI approach has produced impressive results, it heavily leans on human supervision. This strong reliance on human oversight poses a significant hurdle to the advancement of AI innovation. We propose a novel paradigm termed Exploratory AI (EAI) aimed at autonomously generating high-quality training data.
arXiv Detail & Related papers (2023-10-13T07:03:39Z)
GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models [20.844806635710526]
GPT4AIGChip is a framework intended to democratize AI accelerator design by leveraging human natural languages.<n>This work is the first to demonstrate an effective pipeline for LLM-powered automated AI accelerator generation.
arXiv Detail & Related papers (2023-09-19T16:14:57Z)
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision [84.31474052176343]
Recent AI-assistant agents, such as ChatGPT, rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback to align the output with human intentions. This dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision. We propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision.
arXiv Detail & Related papers (2023-05-04T17:59:28Z)
Leveraging Rationales to Improve Human Task Performance [15.785125079811902]
Given a computational system's performance exceeds that of its human user, can explainable AI capabilities be leveraged to improve the performance of the human? We introduce the Rationale-Generating Algorithm, an automated technique for generating rationales for utility-based computational methods. Results show that our approach produces rationales that lead to statistically significant improvement in human task performance.
arXiv Detail & Related papers (2020-02-11T04:51:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.