Related papers: Towards Community-Driven Agents for Machine Learning Engineering

Towards Community-Driven Agents for Machine Learning Engineering

URL: http://arxiv.org/abs/2506.20640v1
Date: Wed, 25 Jun 2025 17:36:02 GMT
Title: Towards Community-Driven Agents for Machine Learning Engineering
Authors: Sijie Li, Weiwei Sun, Shanda Li, Ameet Talwalkar, Yiming Yang,
Abstract summary: CoMind is a novel agent that excels at exchanging insights and developing novel solutions within a community context.<n>MLE-Live is a live evaluation framework designed to assess an agent's ability to communicate with and leverage collective knowledge from a simulated Kaggle research community.<n>CoMind achieves state-of-the-art performance on MLE-Live and outperforms 79.2% human competitors on average across four ongoing Kaggle competitions.
Score: 39.8181056501734
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language model-based machine learning (ML) agents have shown great promise in automating ML research. However, existing agents typically operate in isolation on a given research problem, without engaging with the broader research community, where human researchers often gain insights and contribute by sharing knowledge. To bridge this gap, we introduce MLE-Live, a live evaluation framework designed to assess an agent's ability to communicate with and leverage collective knowledge from a simulated Kaggle research community. Building on this framework, we propose CoMind, a novel agent that excels at exchanging insights and developing novel solutions within a community context. CoMind achieves state-of-the-art performance on MLE-Live and outperforms 79.2% human competitors on average across four ongoing Kaggle competitions. Our code is released at https://github.com/comind-ml/CoMind.

Related papers

AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench [65.21702462691933]
We formalize AI research agents as search policies that navigate a space of candidate solutions, iteratively modifying them using operators.<n>Our best pairing of search strategy and operator set achieves a state-of-the-art result on MLE-bench lite, increasing the success rate of achieving a Kaggle medal from 39.6% to 47.7%.
arXiv Detail & Related papers (2025-07-03T11:59:15Z)
MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges? [64.62421656031128]
MLRC-Bench is a benchmark designed to quantify how effectively language agents can tackle challenging Machine Learning (ML) Research Competitions.<n>Unlike prior work, MLRC-Bench measures the key steps of proposing and implementing novel research methods.<n>Even the best-performing tested agent closes only 9.3% of the gap between baseline and top human participant scores.
arXiv Detail & Related papers (2025-04-13T19:35:43Z)
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents [16.422314548224147]
Large language models (LLMs) based agent systems have made great strides in real-world applications beyond traditional NLP tasks.<n>This paper proposes a new benchmark, Collab-Overcooked, built on the popular Overcooked-AI game with more applicable and challenging tasks in interactive environments.
arXiv Detail & Related papers (2025-02-27T13:31:13Z)
Leveraging Large Language Models for Effective and Explainable Multi-Agent Credit Assignment [4.406086834602686]
We show how to reformulate credit assignment to the two pattern recognition problems of sequence improvement and attribution.<n>Our approach utilizes a centralized reward-critic which numerically decomposes the environment reward based on the individual contribution of each agent.<n>Both our methods far outperform the state-of-the-art on a variety of benchmarks, including Level-Based Foraging, Robotic Warehouse, and our new Spaceworld benchmark which incorporates collision-related safety constraints.
arXiv Detail & Related papers (2025-02-24T05:56:47Z)
Multi-Agent Collaboration Mechanisms: A Survey of LLMs [6.545098975181273]
Multi-Agent Systems (MASs) enable groups of intelligent agents to coordinate and solve complex tasks collectively.<n>This work provides an extensive survey of the collaborative aspect of MASs and introduces a framework to guide future research.
arXiv Detail & Related papers (2025-01-10T19:56:50Z)
Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents [101.17919953243107]
GovSim is a generative simulation platform designed to study strategic interactions and cooperative decision-making in large language models (LLMs)<n>We find that all but the most powerful LLM agents fail to achieve a sustainable equilibrium in GovSim, with the highest survival rate below 54%.<n>We show that agents that leverage "Universalization"-based reasoning, a theory of moral thinking, are able to achieve significantly better sustainability.
arXiv Detail & Related papers (2024-04-25T15:59:16Z)
Large Language Model-based Human-Agent Collaboration for Complex Task Solving [94.3914058341565]
We introduce the problem of Large Language Models (LLMs)-based human-agent collaboration for complex task-solving. We propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC. This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process.
arXiv Detail & Related papers (2024-02-20T11:03:36Z)
CompeteAI: Understanding the Competition Dynamics in Large Language Model-based Agents [43.46476421809271]
Large language models (LLMs) have been widely used as agents to complete different tasks. We propose a general framework for studying the competition between agents. We then implement a practical competitive environment using GPT-4 to simulate a virtual town.
arXiv Detail & Related papers (2023-10-26T16:06:20Z)
Building Cooperative Embodied Agents Modularly with Large Language Models [104.57849816689559]
We address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments. We harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework. Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication.
arXiv Detail & Related papers (2023-07-05T17:59:27Z)
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society [58.04479313658851]
This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents. We propose a novel communicative agent framework named role-playing. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems.
arXiv Detail & Related papers (2023-03-31T01:09:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.