Towards Community-Driven Agents for Machine Learning Engineering
- URL: http://arxiv.org/abs/2506.20640v1
- Date: Wed, 25 Jun 2025 17:36:02 GMT
- Title: Towards Community-Driven Agents for Machine Learning Engineering
- Authors: Sijie Li, Weiwei Sun, Shanda Li, Ameet Talwalkar, Yiming Yang,
- Abstract summary: CoMind is a novel agent that excels at exchanging insights and developing novel solutions within a community context.<n>MLE-Live is a live evaluation framework designed to assess an agent's ability to communicate with and leverage collective knowledge from a simulated Kaggle research community.<n>CoMind achieves state-of-the-art performance on MLE-Live and outperforms 79.2% human competitors on average across four ongoing Kaggle competitions.
- Score: 39.8181056501734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language model-based machine learning (ML) agents have shown great promise in automating ML research. However, existing agents typically operate in isolation on a given research problem, without engaging with the broader research community, where human researchers often gain insights and contribute by sharing knowledge. To bridge this gap, we introduce MLE-Live, a live evaluation framework designed to assess an agent's ability to communicate with and leverage collective knowledge from a simulated Kaggle research community. Building on this framework, we propose CoMind, a novel agent that excels at exchanging insights and developing novel solutions within a community context. CoMind achieves state-of-the-art performance on MLE-Live and outperforms 79.2% human competitors on average across four ongoing Kaggle competitions. Our code is released at https://github.com/comind-ml/CoMind.
Related papers
- AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench [65.21702462691933]
We formalize AI research agents as search policies that navigate a space of candidate solutions, iteratively modifying them using operators.<n>Our best pairing of search strategy and operator set achieves a state-of-the-art result on MLE-bench lite, increasing the success rate of achieving a Kaggle medal from 39.6% to 47.7%.
arXiv Detail & Related papers (2025-07-03T11:59:15Z) - MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges? [64.62421656031128]
MLRC-Bench is a benchmark designed to quantify how effectively language agents can tackle challenging Machine Learning (ML) Research Competitions.<n>Unlike prior work, MLRC-Bench measures the key steps of proposing and implementing novel research methods.<n>Even the best-performing tested agent closes only 9.3% of the gap between baseline and top human participant scores.
arXiv Detail & Related papers (2025-04-13T19:35:43Z) - Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents [16.422314548224147]
Large language models (LLMs) based agent systems have made great strides in real-world applications beyond traditional NLP tasks.<n>This paper proposes a new benchmark, Collab-Overcooked, built on the popular Overcooked-AI game with more applicable and challenging tasks in interactive environments.
arXiv Detail & Related papers (2025-02-27T13:31:13Z) - Leveraging Large Language Models for Effective and Explainable Multi-Agent Credit Assignment [4.406086834602686]
We show how to reformulate credit assignment to the two pattern recognition problems of sequence improvement and attribution.<n>Our approach utilizes a centralized reward-critic which numerically decomposes the environment reward based on the individual contribution of each agent.<n>Both our methods far outperform the state-of-the-art on a variety of benchmarks, including Level-Based Foraging, Robotic Warehouse, and our new Spaceworld benchmark which incorporates collision-related safety constraints.
arXiv Detail & Related papers (2025-02-24T05:56:47Z) - Multi-Agent Collaboration Mechanisms: A Survey of LLMs [6.545098975181273]
Multi-Agent Systems (MASs) enable groups of intelligent agents to coordinate and solve complex tasks collectively.<n>This work provides an extensive survey of the collaborative aspect of MASs and introduces a framework to guide future research.
arXiv Detail & Related papers (2025-01-10T19:56:50Z) - Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents [101.17919953243107]
GovSim is a generative simulation platform designed to study strategic interactions and cooperative decision-making in large language models (LLMs)<n>We find that all but the most powerful LLM agents fail to achieve a sustainable equilibrium in GovSim, with the highest survival rate below 54%.<n>We show that agents that leverage "Universalization"-based reasoning, a theory of moral thinking, are able to achieve significantly better sustainability.
arXiv Detail & Related papers (2024-04-25T15:59:16Z) - Large Language Model-based Human-Agent Collaboration for Complex Task
Solving [94.3914058341565]
We introduce the problem of Large Language Models (LLMs)-based human-agent collaboration for complex task-solving.
We propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC.
This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process.
arXiv Detail & Related papers (2024-02-20T11:03:36Z) - CompeteAI: Understanding the Competition Dynamics in Large Language Model-based Agents [43.46476421809271]
Large language models (LLMs) have been widely used as agents to complete different tasks.
We propose a general framework for studying the competition between agents.
We then implement a practical competitive environment using GPT-4 to simulate a virtual town.
arXiv Detail & Related papers (2023-10-26T16:06:20Z) - Building Cooperative Embodied Agents Modularly with Large Language
Models [104.57849816689559]
We address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments.
We harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework.
Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication.
arXiv Detail & Related papers (2023-07-05T17:59:27Z) - CAMEL: Communicative Agents for "Mind" Exploration of Large Language
Model Society [58.04479313658851]
This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents.
We propose a novel communicative agent framework named role-playing.
Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems.
arXiv Detail & Related papers (2023-03-31T01:09:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.