Fugu-MT Paper Translation (Abstract): TheMCPCompany: Creating General-purpose Agents with Task-specific Tools

Paper summary: TheMCPCompany: Creating General-purpose Agents with Task-specific Tools

arxiv url: http://arxiv.org/abs/2510.19286v1
Date: Wed, 22 Oct 2025 06:42:01 GMT
Status: Translation complete
Last updated in system: 2025-10-25 03:08:15.221207
Title: TheMCPCompany: Creating General-purpose Agents with Task-specific Tools
Title (reference translation): TheMCPCompany: Creating General-purpose Agents with Task-specific Tools
Authors: Reza Esfandiarpoor, Vishwas Suryanarayanan, Stephen H. Bach, Vishal Chowdhary, Anthony Aue
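Abstract summary: TheMCPCompany is a benchmark for evaluating tool-calling agents on tasks that involve interacting with various real-world services. It also provides manually annotated ground-truth tools for each task. Overall, the work shows that the most advanced reasoning models are effective at discovering tools in simpler environments but seriously struggle with navigating complex enterprise environments.
Reference score (independently calculated attention level): 12.249551019598442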
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Since the introduction of the Model Context Protocol (MCP), the number of available tools for Large Language Models (LLMs) has increased significantly. These task-specific tool sets offer an alternative to general-purpose tools such as web browsers, while being easier to develop and maintain than GUIs. However, current general-purpose agents predominantly rely on web browsers for interacting with the environment. Here, we introduce TheMCPCompany, a benchmark for evaluating tool-calling agents on tasks that involve interacting with various real-world services. We use the REST APIs of these services to create MCP servers, which include over 18,000 tools. We also provide manually annotated ground-truth tools for each task. In our experiments, we use the ground-truth tools to show the potential of tool-calling agents for both improving performance and reducing costs, assuming perfect tool retrieval. Next, we explore agent performance using tool retrieval to study the real-world practicality of tool-based agents. While all models with tool retrieval perform similarly to or better than browser-based agents, smaller models cannot take full advantage of the available tools through retrieval. On the other hand, GPT-5's performance with tool retrieval is very close to its performance with ground-truth tools. Overall, our work shows that the most advanced reasoning models are effective at discovering tools in simpler environments, but seriously struggle with navigating complex enterprise environments. TheMCPCompany reveals that navigating tens of thousands of tools and combining them in non-trivial ways to solve complex problems is still a challenging task for current models and requires both better reasoning and better retrieval models.
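Abstract (reference translation): Since the Model Context Protocol (MCP) was introduced, the number of tools available to Large Language Models (LLMs) has grown significantly. These task-specific tool sets are an alternative to general-purpose tools such as web browsers, and they are easier to develop and maintain than GUIs. However, current general-purpose agents rely mainly on web browsers to interact with the environment. This paper introduces TheMCPCompany, a benchmark for evaluating tool-calling agents on tasks that involve interacting with various real-world services. The authors use the REST APIs of these services to create MCP servers containing over 18,000 tools. They also provide manually annotated ground-truth tools for each task. In the experiments, the ground-truth tools are used to show that, assuming perfect tool retrieval, tool-calling agents can both improve performance and reduce costs. Next, agent performance with tool retrieval is examined to study the real-world practicality of tool-based agents. All models with tool retrieval perform as well as or better than browser-based agents, but smaller models cannot take full advantage of the available tools through retrieval. In contrast, GPT-5's performance with tool retrieval is very close to its performance with ground-truth tools. Overall, the work shows that the most advanced reasoning models are effective at discovering tools in simpler environments but seriously struggle with navigating complex enterprise environments. TheMCPCompany reveals that navigating tens of thousands of tools and combining them in non-trivial ways to solve complex problems remains a challenging task for current models, and that it requires both better reasoning and better retrieval models.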

Related papers