OASBuilder: Generating OpenAPI Specifications from Online API Documentation with Large Language Models
- URL: http://arxiv.org/abs/2507.05316v1
- Date: Mon, 07 Jul 2025 14:36:13 GMT
- Title: OASBuilder: Generating OpenAPI Specifications from Online API Documentation with Large Language Models
- Authors: Koren Lazar, Matan Vetzler, Kiran Kate, Jason Tsay, David Boaz, Himanshu Gupta, Avraham Shinnar, Rohith D Vallam, David Amid, Esther Goldbraich, Guy Uziel, Jim Laredo, Ateret Anaby Tavor
- Abstract summary: OASBuilder is a framework that transforms long and diverse API documentation pages into consistent, machine-readable API specifications. OASBuilder has been successfully implemented in an enterprise environment, saving thousands of hours of manual effort.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: AI agents and business automation tools interacting with external web services require standardized, machine-readable information about their APIs in the form of API specifications. However, the information about APIs available online is often presented as unstructured, free-form HTML documentation, requiring external users to spend significant time manually converting it into a structured format. To address this, we introduce OASBuilder, a novel framework that transforms long and diverse API documentation pages into consistent, machine-readable API specifications. This is achieved through a carefully crafted pipeline that integrates large language models and rule-based algorithms which are guided by domain knowledge of the structure of documentation webpages. Our experiments demonstrate that OASBuilder generalizes well across hundreds of APIs, and produces valid OpenAPI specifications that encapsulate most of the information from the original documentation. OASBuilder has been successfully implemented in an enterprise environment, saving thousands of hours of manual effort and making hundreds of complex enterprise APIs accessible as tools for LLMs.
Related papers
- Doc2Agent: Scalable Generation of Tool-Using Agents from API Documentation [2.4117201298131232]
Doc2Agent is a scalable pipeline to build tool agents that can call Python-based tools generated from API documentation. We evaluate our approach on real-world APIs, WebArena APIs, and research APIs, producing validated tools.
arXiv Detail & Related papers (2025-06-24T20:30:44Z)
- ToolFactory: Automating Tool Generation by Leveraging LLM to Understand REST API Documentations [4.934192277899036]
API documentation often suffers from a lack of standardization, inconsistent schemas, and incomplete information. We developed ToolFactory, an open-source pipeline for automating tool generation from unstructured API documents. We also demonstrated ToolFactory by creating a domain-specific AI agent for glycomaterials research.
arXiv Detail & Related papers (2025-01-28T13:42:33Z)
- ExploraCoder: Advancing code generation for multiple unseen APIs via planning and chained exploration [70.26807758443675]
ExploraCoder is a training-free framework that empowers large language models to invoke unseen APIs in code solution. Experimental results demonstrate that ExploraCoder significantly improves performance for models lacking prior API knowledge.
arXiv Detail & Related papers (2024-12-06T19:00:15Z)
- A Systematic Evaluation of Large Code Models in API Suggestion: When, Which, and How [53.65636914757381]
API suggestion is a critical task in modern software development.
Recent advancements in large code models (LCMs) have shown promise in the API suggestion task.
arXiv Detail & Related papers (2024-09-20T03:12:35Z)
- FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking [57.53742155914176]
API call generation is the cornerstone of large language models' tool-using ability.
Existing supervised and in-context learning approaches suffer from high training costs, poor data efficiency, and generated API calls that can be unfaithful to the API documentation and the user's request.
We propose an output-side optimization approach called FANTASE to address these limitations.
arXiv Detail & Related papers (2024-07-18T23:44:02Z)
- WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment [49.00213183302225]
We propose a framework to induce new APIs by grounding wikiHow instruction to situated agent policies. Inspired by recent successes in large language models (LLMs) for embodied planning, we propose a few-shot prompting to steer GPT-4.
arXiv Detail & Related papers (2024-07-10T15:52:44Z)
- SpeCrawler: Generating OpenAPI Specifications from API Documentation Using Large Language Models [8.372941103284774]
SpeCrawler is a comprehensive system that generates OpenAPI Specifications from diverse API documentation.
The paper explores SpeCrawler's methodology, supported by empirical evidence and case studies.
arXiv Detail & Related papers (2024-02-18T15:33:24Z)
- Enhancing API Documentation through BERTopic Modeling and Summarization [0.0]
This paper focuses on the complexities of interpreting Application Programming Interface (API) documentation.
Official API documentation serves as a primary source of information for developers, but it can often be extensive and lacks user-friendliness.
Our novel approach employs the strengths of BERTopic for topic modeling and Natural Language Processing (NLP) to automatically generate summaries of API documentation.
arXiv Detail & Related papers (2023-08-17T15:57:12Z)
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs [104.37772295581088]
Open-source large language models (LLMs), e.g., LLaMA, remain significantly limited in tool-use capabilities.
We introduce ToolLLM, a general tool-use framework encompassing data construction, model training, and evaluation.
We first present ToolBench, an instruction-tuning dataset for tool use, which is constructed automatically using ChatGPT.
arXiv Detail & Related papers (2023-07-31T15:56:53Z)
- Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z)
- Evaluating Embedding APIs for Information Retrieval [51.24236853841468]
We evaluate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval.
We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English.
For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost.
arXiv Detail & Related papers (2023-05-10T16:40:52Z)
- API-Miner: an API-to-API Specification Recommendation Engine [1.8352113484137629]
API-Miner is an API-to-API specification recommendation engine.
It retrieves relevant specification components written in OpenAPI.
We evaluate API-Miner in both quantitative and qualitative tasks.
arXiv Detail & Related papers (2022-12-14T14:43:51Z)
- OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs [0.26107298043931193]
Ontology engineers, who populate and create knowledge graphs, and web developers need to understand, access and query these knowledge graphs but are not familiar with APIs, RDF or SPARQL.
In this paper we describe the Ontology-Based API framework (OBA) to automatically create REST APIs from ontologies.
We showcase OBA with three examples that illustrate the capabilities of the framework.
arXiv Detail & Related papers (2020-07-17T19:46:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.