Making REST APIs Agent-Ready: From OpenAPI to Model Context Protocol Servers for Tool-Augmented LLMs
- URL: http://arxiv.org/abs/2507.16044v2
- Date: Wed, 23 Jul 2025 16:37:47 GMT
- Title: Making REST APIs Agent-Ready: From OpenAPI to Model Context Protocol Servers for Tool-Augmented LLMs
- Authors: Meriem Mastouri, Emna Ksontini, Wael Kessentini
- Abstract summary: We present AutoMCP, a compiler that generates MCP servers from OpenAPI 2.0/3.0 specifications. We evaluate AutoMCP on 50 real-world APIs spanning 5,066 endpoints across over 10 domains.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are evolving from passive text generators into active agents that invoke external tools. To support this shift, scalable protocols for tool integration are essential. The Model Context Protocol (MCP), introduced by Anthropic in 2024, offers a schema-driven standard for dynamic tool discovery and invocation. Yet, building MCP servers remains manual and repetitive, requiring developers to write glue code, handle authentication, and configure schemas by hand, replicating much of the integration effort MCP aims to eliminate. This paper investigates whether MCP server construction can be meaningfully automated. We begin by analyzing adoption trends: among 22,000+ MCP-tagged GitHub repositories created within six months of release, fewer than 5% include servers, typically small, single-maintainer projects dominated by repetitive scaffolding. To address this gap, we present AutoMCP, a compiler that generates MCP servers from OpenAPI 2.0/3.0 specifications. AutoMCP parses REST API definitions and produces complete server implementations, including schema registration and authentication handling. We evaluate AutoMCP on 50 real-world APIs spanning 5,066 endpoints across over 10 domains. From a stratified sample of 1,023 tool calls, 76.5% succeeded out of the box. Manual failure analysis revealed five recurring issues, all attributable to inconsistencies or omissions in the OpenAPI contracts. After minor fixes, averaging 19 lines of spec changes per API, AutoMCP achieved 99.9% success. Our findings (i) analyze MCP adoption and quantify the cost of manual server development, (ii) demonstrate that OpenAPI specifications, despite quality issues, enable near-complete MCP server automation, and (iii) contribute a corpus of 5,066 callable tools along with insights on repairing common specification flaws.
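The core transformation the abstract describes, lowering each OpenAPI operation to a tool schema plus invocation glue, can be illustrated with a rough sketch. The code below is not AutoMCP's implementation: the function name `operation_to_tool`, the use of `requests`, and the bearer-token handling are assumptions made for illustration, and a real MCP server would register the resulting tools through an MCP SDK rather than return plain dicts.

```python
# Hypothetical sketch (not AutoMCP's code): lowering one OpenAPI operation to an
# MCP-style tool definition plus a handler that performs the actual HTTP call.
import requests


def operation_to_tool(base_url, path, method, operation):
    """Map a single OpenAPI operation to (tool schema, callable handler)."""
    # Tool name and description are taken directly from the specification.
    name = operation.get("operationId") or f"{method}_{path.strip('/').replace('/', '_')}"
    description = operation.get("summary") or operation.get("description", "")

    # Build a JSON Schema for the tool input from the declared parameters.
    properties, required = {}, []
    for param in operation.get("parameters", []):
        properties[param["name"]] = param.get("schema", {"type": "string"})
        if param.get("required"):
            required.append(param["name"])
    schema = {"type": "object", "properties": properties, "required": required}

    def handler(arguments, token=None):
        # Substitute path parameters; everything else becomes a query parameter.
        path_args = {k: v for k, v in arguments.items() if "{" + k + "}" in path}
        query_args = {k: v for k, v in arguments.items() if k not in path_args}
        headers = {"Authorization": f"Bearer {token}"} if token else {}
        resp = requests.request(method.upper(), base_url + path.format(**path_args),
                                params=query_args, headers=headers, timeout=30)
        resp.raise_for_status()
        return resp.json()

    return {"name": name, "description": description, "inputSchema": schema}, handler
```

A generator in this spirit would loop over every path/method pair in the spec, emit one tool per operation, and register the handlers with an MCP server runtime; the paper's 76.5% out-of-the-box success rate suggests most specifications are already complete enough for such a mapping to work without manual repair.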
Related papers
- LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools? [50.60770039016318]
We present LiveMCPBench, the first comprehensive benchmark for evaluating Model Context Protocol (MCP) agents. LiveMCPBench consists of 95 real-world tasks grounded in the MCP ecosystem. Our evaluation covers 10 leading models, with the best-performing model reaching a 78.95% success rate.
arXiv Detail & Related papers (2025-08-03T14:36:42Z)
- MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models [76.72220653705679]
Existing methods rely on static benchmarks and labor-intensive data collection, limiting practical assessment. We introduce MCPEval, an open-source Model Context Protocol (MCP)-based framework. MCPEval standardizes metrics, seamlessly integrates with native agent tools, and eliminates manual effort in building evaluation pipelines.
arXiv Detail & Related papers (2025-07-17T05:46:27Z)
- A Large-Scale Evolvable Dataset for Model Context Protocol Ecosystem and Security Analysis [8.943261888363622]
We introduce MCPCorpus, a large-scale dataset containing around 14K MCP servers and 300 MCP clients. Each artifact is annotated with 20+ normalized attributes capturing its identity, interface configuration, GitHub activity, and metadata. MCPCorpus provides a reproducible snapshot of the real-world MCP ecosystem, enabling studies of adoption trends, ecosystem health, and implementation diversity.
arXiv Detail & Related papers (2025-06-30T02:37:27Z)
- MCPWorld: A Unified Benchmarking Testbed for API, GUI, and Hybrid Computer Use Agents [14.736516215309768]
We propose MCPWorld, the first automatic CUA testbed for API, GUI, and API-GUI hybrid agents. A key principle of MCPWorld is the use of "white-box apps", i.e., those whose source code is available and can be revised and recompiled as needed. MCPWorld includes 201 well-curated and annotated user tasks, covering diversified use cases and difficulty levels.
arXiv Detail & Related papers (2025-06-09T11:50:33Z)
- MCP-Zero: Active Tool Discovery for Autonomous LLM Agents [13.005899769943442]
We introduce MCP-Zero, an active agent framework that restores tool discovery autonomy to LLMs themselves. Instead of overwhelming models with all available tools, MCP-Zero enables agents to actively identify capability gaps and request specific tools on demand. We construct MCP-tools, a comprehensive dataset of 308 MCP servers and 2,797 tools from the official Model-Context-Protocol repository.
arXiv Detail & Related papers (2025-06-01T15:48:53Z)
- ScaleMCP: Dynamic and Auto-Synchronizing Model Context Protocol Tools for LLM Agents [1.7217813564531652]
ScaleMCP is a novel tool selection approach that dynamically equips agents with an MCP tool retriever. It gives agents the autonomy to add tools into their memory and pairs this with an auto-synchronizing tool storage pipeline. Comprehensive evaluations on a dataset of 5,000 financial-metric MCP servers demonstrate substantial improvements in tool retrieval and agent invocation performance.
arXiv Detail & Related papers (2025-05-09T20:30:37Z)
- A Framework for Testing and Adapting REST APIs as LLM Tools [5.758488787763118]
We present a novel testing framework aimed at evaluating and enhancing the readiness of REST APIs to function as tools for agents. Our framework transforms APIs into tools, generates comprehensive test cases for them, translates the test cases into natural language instructions, and evaluates the agent's ability to correctly invoke the API and process its inputs and responses.
arXiv Detail & Related papers (2025-04-22T02:52:08Z)
- SOPBench: Evaluating Language Agents at Following Standard Operating Procedures and Constraints [59.645885492637845]
SOPBench is an evaluation pipeline that transforms each service-specific SOP code program into a directed graph of executable functions and requires agents to call these functions based on natural language SOP descriptions. We evaluate 18 leading models, and results show the task is challenging even for top-tier models.
arXiv Detail & Related papers (2025-03-11T17:53:02Z)
- A Multi-Agent Approach for REST API Testing with Semantic Graphs and LLM-Driven Inputs [46.65963514391019]
We present AutoRestTest, the first black-box tool to adopt a dependency-embedded multi-agent approach for REST API testing. Our approach treats REST API testing as a separable problem, where four agents collaborate to optimize API exploration. Our evaluation of AutoRestTest on 12 real-world REST services shows that it outperforms the four leading black-box REST API testing tools.
arXiv Detail & Related papers (2024-11-11T16:20:27Z)
- Model-driven realization of IDTA submodel specifications: The good, the bad, the incompatible? [49.60138105915326]
Asset Administration Shells are trending in Industry 4.0.
In February 2024, the Industrial Digital Twin Association announced 84 and released 18 AAS submodel specifications.
We present a model-driven approach, which transforms extracted information from IDTA specifications into an intermediary meta-model and, from there, generates API code and tests.
arXiv Detail & Related papers (2024-06-20T16:33:46Z)
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs [104.37772295581088]
Open-source large language models (LLMs), e.g., LLaMA, remain significantly limited in tool-use capabilities.
We introduce ToolLLM, a general tool-use framework encompassing data construction, model training, and evaluation.
We first present ToolBench, an instruction-tuning dataset for tool use, which is constructed automatically using ChatGPT.
arXiv Detail & Related papers (2023-07-31T15:56:53Z)