Related papers: MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools

URL: http://arxiv.org/abs/2510.24284v2
Date: Sat, 01 Nov 2025 07:07:32 GMT
Title: MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools
Authors: Wenhao Wang, Peizhi Niu, Zhao Xu, Zhaoyu Chen, Jian Du, Yaxin Du, Xianghe Pang, Keduan Huang, Yanfeng Wang, Qiang Yan, Siheng Chen
Abstract summary: Large Language Models increasingly rely on external tools to perform complex, realistic tasks. Existing MCP research covers few servers, depends on costly manual curation, and lacks training support. We introduce MCP-Flow, an automated web-agent-driven pipeline for large-scale server discovery, data synthesis, and model training.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) increasingly rely on external tools to perform complex, realistic tasks, yet their ability to utilize the rapidly expanding Model Context Protocol (MCP) ecosystem remains limited. Existing MCP research covers few servers, depends on costly manual curation, and lacks training support, hindering progress toward real-world deployment. To overcome these limitations, we introduce MCP-Flow, an automated web-agent-driven pipeline for large-scale server discovery, data synthesis, and model training. MCP-Flow collects and filters data from 1166 servers and 11536 tools, producing 68733 high-quality instruction-function call pairs and 6439 trajectories, far exceeding prior work in scale and diversity. Extensive experiments demonstrate MCP-Flow's effectiveness in driving superior MCP tool selection, function-call generation, and enhanced agentic task performance. MCP-Flow thus provides a scalable foundation for advancing LLM agents' proficiency in real-world MCP environments. MCP-Flow is publicly available at https://github.com/wwh0411/MCP-Flow.
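For readers unfamiliar with what an "instruction-function call pair" targets: in MCP, a client invokes a server tool with a JSON-RPC 2.0 request whose method is `tools/call` and whose `params` carry the tool `name` and an `arguments` object that should match the tool's `inputSchema`. The sketch below pairs a natural-language instruction with such a request. The tool `get_weather`, the helper functions, and the pair field names `instruction` and `function_call` are hypothetical illustrations; MCP-Flow's actual data schema is defined in its repository and may differ.

```python
import json

# Hypothetical tool descriptor, shaped like the entries an MCP server
# advertises via tools/list: a name, a description, and a JSON Schema.
TOOL = {
    "name": "get_weather",
    "description": "Get current weather information for a location",
    "inputSchema": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}


def make_tools_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def missing_required(tool, arguments):
    """Return required argument names absent from `arguments`.

    This only checks the schema's `required` list; it is not full
    JSON Schema validation (types, formats, etc. are ignored).
    """
    return [k for k in tool["inputSchema"].get("required", []) if k not in arguments]


# Hypothetical instruction-function call pair; the field names are
# illustrative assumptions, not MCP-Flow's published schema.
pair = {
    "instruction": "What's the weather like in New York right now?",
    "function_call": make_tools_call(1, TOOL["name"], {"location": "New York"}),
}

assert missing_required(TOOL, pair["function_call"]["params"]["arguments"]) == []
print(json.dumps(pair, indent=2))
```

Under this framing, checking a model's generated call against such a pair amounts to comparing the predicted tool `name` and `arguments` with the reference; a production pipeline would validate arguments against the full JSON Schema rather than only the required keys.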

Related papers

MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era
MegaFlow is a large-scale distributed orchestration system that enables efficient scheduling, resource allocation, and fine-grained task management for agent-environment workloads. In our agent training deployments, MegaFlow successfully orchestrates tens of thousands of concurrent agent tasks while maintaining high system stability and achieving efficient resource utilization.
arXiv (2026-01-12T13:25:33Z)
Network and Systems Performance Characterization of MCP-Enabled LLM Agents
Model Context Protocol (MCP) has recently gained increased attention within the AI community for providing a standardized way for large language models (LLMs) to interact with external tools and services. This paper presents a measurement-based analysis of MCP-enabled interactions with LLMs, revealing trade-offs between capability, performance, and cost.
arXiv (2025-10-20T05:13:47Z)
RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use
Large language models excel at basic reasoning but struggle with tasks that require interaction with external tools. We present RLFactory, a plug-and-play reinforcement learning framework for multi-round tool use.
arXiv (2025-08-31T16:47:31Z)
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
MCP-Bench is a benchmark for evaluating large language models (LLMs) on realistic, multi-step tasks. Built on the Model Context Protocol (MCP), MCP-Bench connects LLMs to 28 representative live MCP servers spanning 250 tools across domains such as finance, travel, scientific computing, and academic search.
arXiv (2025-08-28T05:58:57Z)
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
We present LiveMCP-101, a benchmark of 101 carefully curated real-world queries. Experiments show that even frontier LLMs achieve a success rate below 60%. LiveMCP-101 sets a rigorous standard for evaluating real-world agent capabilities.
arXiv (2025-08-21T17:55:54Z)
LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?
We present LiveMCPBench, the first comprehensive benchmark for evaluating Model Context Protocol (MCP) agents. LiveMCPBench consists of 95 real-world tasks grounded in the MCP ecosystem. Our evaluation covers 10 leading models, with the best-performing model reaching a 78.95% success rate.
arXiv (2025-08-03T14:36:42Z)
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models
We introduce MCPEval, an open-source framework that automates end-to-end task generation and deep evaluation of intelligent agents. MCPEval standardizes metrics, seamlessly integrates with native agent tools, and eliminates manual effort in building evaluation pipelines. Empirical results across five real-world domains show its effectiveness in revealing nuanced, domain-specific performance.
arXiv (2025-07-17T05:46:27Z)
Beyond Formal Semantics for Capabilities and Skills: Model Context Protocol in Manufacturing
We present an alternative approach based on the recently introduced Model Context Protocol (MCP). MCP allows systems to expose functionality through a standardized interface that is directly consumable by LLM-based agents.
arXiv (2025-06-12T13:02:16Z)
Benchmarking Agentic Workflow Generation
We introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. We also present WorfEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms. We observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less inference time.
arXiv (2024-10-10T12:41:19Z)
MLOps: A Step Forward to Enterprise Machine Learning
This research presents a detailed review of MLOps, its benefits, difficulties, evolutions, and important underlying technologies. The MLOps workflow is explained in detail along with the various tools necessary for both model and data exploration and deployment. This article also sheds light on the end-to-end production of ML projects using various maturity levels of automated pipelines.
arXiv (2023-05-27T20:44:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.