The FM Agent
- URL: http://arxiv.org/abs/2510.26144v1
- Date: Thu, 30 Oct 2025 04:57:57 GMT
- Title: The FM Agent
- Authors: Annan Li, Chufan Wu, Zengle Ge, Yee Hin Chong, Zhinan Hou, Lizhe Cao, Cheng Ju, Jianmin Wu, Huaiming Li, Haobo Zhang, Shenghao Feng, Mo Zhao, Fengzhi Qiu, Rui Yang, Mengmeng Zhang, Wenyi Zhu, Yingying Sun, Quan Sun, Shunhao Yan, Danyu Liu, Dawei Yin, Dou Shen,
- Abstract summary: Large language models (LLMs) are catalyzing the development of autonomous AI research agents.<n>We present FM Agent, a novel and general-purpose multi-agent framework.<n>Our system has been evaluated across diverse domains, including operations research, machine learning, GPU kernel optimization, and classical mathematical problems.
- Score: 36.44000839818829
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are catalyzing the development of autonomous AI research agents for scientific and engineering discovery. We present FM Agent, a novel and general-purpose multi-agent framework that leverages a synergistic combination of LLM-based reasoning and large-scale evolutionary search to address complex real-world challenges. The core of FM Agent integrates several key innovations: 1) a cold-start initialization phase incorporating expert guidance, 2) a novel evolutionary sampling strategy for iterative optimization, 3) domain-specific evaluators that combine correctness, effectiveness, and LLM-supervised feedback, and 4) a distributed, asynchronous execution infrastructure built on Ray. Demonstrating broad applicability, our system has been evaluated across diverse domains, including operations research, machine learning, GPU kernel optimization, and classical mathematical problems. FM Agent reaches state-of-the-art results autonomously, without human interpretation or tuning -- 1976.3 on ALE-Bench (+5.2\%), 43.56\% on MLE-Bench (+4.0pp), up to 20x speedups on KernelBench, and establishes new state-of-the-art(SOTA) results on several classical mathematical problems. Beyond academic benchmarks, FM Agent shows considerable promise for both large-scale enterprise R\&D workflows and fundamental scientific research, where it can accelerate innovation, automate complex discovery processes, and deliver substantial engineering and scientific advances with broader societal impact.
Related papers
- A Cloud-based Multi-Agentic Workflow for Science [0.12314765641075438]
Large Language Models (LLMs) become ubiquitous across various scientific domains.<n>Their lack of ability to perform complex tasks like running simulations or to make complex decisions limits their utility.<n>We present a domain-agnostic, model-independent workflow for an agentic framework that can act as a scientific assistant while being run entirely on cloud.
arXiv Detail & Related papers (2026-01-18T22:37:09Z) - Evaluating Novelty in AI-Generated Research Plans Using Multi-Workflow LLM Pipelines [1.3986052226424095]
This paper investigates whether agentic systems employing iterative reasoning, evolutionary search, and decomposition can generate more novel and feasible research plans.<n>We benchmark five reasoning architectures: Reflection-based iterative refinement, Sakana AI v2 evolutionary algorithms, Google Co-Scientist multi-agent framework, GPT Deep Research, and Gemini3 Pro multimodal long-context pipeline.<n>Results reveal varied performance across research domains, with high-performing maintaining feasibility without sacrificing creativity.
arXiv Detail & Related papers (2025-12-24T12:41:31Z) - An Agentic Framework for Autonomous Materials Computation [70.24472585135929]
Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery.<n>Recent advances integrate LLMs into agentic frameworks, enabling retrieval, reasoning, and tool use for complex scientific experiments.<n>Here, we present a domain-specialized agent designed for reliable automation of first-principles materials computations.
arXiv Detail & Related papers (2025-12-22T15:03:57Z) - Metis-HOME: Hybrid Optimized Mixture-of-Experts for Multimodal Reasoning [16.309345268949745]
multimodal large reasoning models exhibit two key limitations.<n>They tend to employ computationally expensive reasoning even for simple queries, leading to inefficiency.<n>We propose Metis-HOME: a Hybrid Optimized Mixture-of-Experts framework.
arXiv Detail & Related papers (2025-10-23T13:02:49Z) - The Landscape of Agentic Reinforcement Learning for LLMs: A Survey [103.78397717362797]
The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL)<n>This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decision Processes (MDPs) of LLM-RL with the temporally extended, partially observable Markov decision processes (POMDPs) that define Agentic RL.
arXiv Detail & Related papers (2025-09-02T17:46:26Z) - ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows [82.07367406991678]
Large Language Models (LLMs) have extended their impact beyond Natural Language Processing.<n>Among these, computer-using agents are capable of interacting with operating systems as humans do.<n>We introduce ScienceBoard, which encompasses a realistic, multi-domain environment featuring dynamic and visually rich scientific software.
arXiv Detail & Related papers (2025-05-26T12:27:27Z) - R&D-Agent: An LLM-Agent Framework Towards Autonomous Data Science [70.1638335489284]
High-level machine learning engineering tasks remain labor-intensive and iterative.<n>We introduce R&D-Agent, a comprehensive, decoupled, and framework that formalizes the machine learning process.<n>R&D-Agent defines the MLE into two phases and six components, turning agent design for MLE into a principled, testable process.
arXiv Detail & Related papers (2025-05-20T06:07:00Z) - Agentic Mixture-of-Workflows for Multi-Modal Chemical Search [0.0]
Large language models (LLMs) have demonstrated promising reasoning and automation capabilities across various domains.<n>We introduce CRAG-MoW - a novel paradigm that orchestrates multiple agentic employing distinct CRAG strategies.<n>We benchmark CRAG-MoWs across small molecules, polymers, and chemical reactions, as well as multi-modal nuclear magnetic resonance (NMR) spectral retrieval.
arXiv Detail & Related papers (2025-02-26T23:48:02Z) - MLGym: A New Framework and Benchmark for Advancing AI Research Agents [51.9387884953294]
We introduce Meta MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing large language models on AI research tasks.<n>This is the first Gym environment for machine learning (ML) tasks, enabling research on reinforcement learning (RL) algorithms for training such agents.<n>We evaluate a number of frontier large language models (LLMs) on our benchmarks such as Claude-3.5-Sonnet, Llama-3.1 405B, GPT-4o, o1-preview, and Gemini-1.5 Pro.
arXiv Detail & Related papers (2025-02-20T12:28:23Z) - Optimizing Collaboration of LLM based Agents for Finite Element Analysis [1.5039745292757671]
This paper investigates the interactions between multiple agents within Large Language Models (LLMs) in the context of programming and coding tasks.
We utilize the AutoGen framework to facilitate communication among agents, evaluating different configurations based on the success rates from 40 random runs for each setup.
arXiv Detail & Related papers (2024-08-23T23:11:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.