Automating API Documentation from Crowdsourced Knowledge
- URL: http://arxiv.org/abs/2601.08036v1
- Date: Mon, 12 Jan 2026 22:06:20 GMT
- Title: Automating API Documentation from Crowdsourced Knowledge
- Authors: Bonan Kou, Zijie Zhou, Muhao Chen, Tianyi Zhang
- Abstract summary: We propose a new approach called AutoDoc that generates API documents with API knowledge extracted from online discussions on Stack Overflow (SO). Our results indicate that the API documents generated by AutoDoc are up to 77.7% more accurate, 9.5% less duplicated, and contain 34.4% knowledge uncovered by the official documents.
- Score: 27.13413474270422
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: API documentation is crucial for developers to learn and use APIs. However, it is known that many official API documents are obsolete and incomplete. To address this challenge, we propose a new approach called AutoDoc that generates API documents with API knowledge extracted from online discussions on Stack Overflow (SO). AutoDoc leverages a fine-tuned dense retrieval model to identify seven types of API knowledge from SO posts. Then, it uses GPT-4o to summarize the API knowledge in these posts into concise text. Meanwhile, we designed two specific components to handle LLM hallucination and redundancy in generated content. We evaluated AutoDoc against five comparison baselines on 48 APIs of different popularity levels. Our results indicate that the API documents generated by AutoDoc are up to 77.7% more accurate, 9.5% less duplicated, and contain 34.4% knowledge uncovered by the official documents. We also measured the sensitivity of AutoDoc to the choice of different LLMs. We found that while larger LLMs produce higher-quality API documents, AutoDoc enables smaller open-source models (e.g., Mistral-7B-v0.3) to achieve comparable results. Finally, we conducted a user study to evaluate the usefulness of the API documents generated by AutoDoc. All participants found API documents generated by AutoDoc to be more comprehensive, concise, and helpful than the comparison baselines. This highlights the feasibility of utilizing LLMs for API documentation with careful design to counter LLM hallucination and information redundancy.
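The abstract mentions a component that handles redundancy in generated content but does not describe how it works. As a purely illustrative sketch, and not AutoDoc's actual method, one simple way to drop near-duplicate statements is to compare token sets with Jaccard similarity. The function names and the 0.8 threshold below are assumptions chosen for the example.

```python
import re


def tokens(sentence: str) -> set:
    """Return the set of lowercase word tokens in a sentence."""
    return set(re.findall(r"[a-z0-9_]+", sentence.lower()))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(sentences: list, threshold: float = 0.8) -> list:
    """Keep a sentence only if it is not too similar to any kept earlier.

    `threshold` is a hypothetical cut-off; a real system would tune it.
    """
    kept = []
    kept_tokens = []
    for sentence in sentences:
        current = tokens(sentence)
        if all(jaccard(current, previous) < threshold for previous in kept_tokens):
            kept.append(sentence)
            kept_tokens.append(current)
    return kept


if __name__ == "__main__":
    statements = [
        "json.loads raises ValueError on invalid input.",
        "json.loads raises ValueError on invalid input!",
        "Use json.dumps to serialize a dict.",
    ]
    for line in drop_near_duplicates(statements):
        print(line)
```

Token overlap only catches statements that share most of their wording; paraphrased duplicates would need a semantic similarity measure such as embeddings.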
Related papers
- LAPIS: Lightweight API Specification for Intelligent Systems [0.0]
Large Language Models (LLMs) increasingly serve as consumers of API specifications, whether for code generation, autonomous agent interaction, or API-assisted reasoning. We present LAPIS, a domain-specific format optimized for LLM consumption that preserves the semantic information necessary for API token usage.
arXiv Detail & Related papers (2026-02-20T15:22:13Z)
- Lightweight Model Editing for LLMs to Correct Deprecated API Recommendations [15.586818028794942]
Pre-trained Large Language Models (LLMs) have demonstrated strong performance in code completion tasks. LLMs frequently generate deprecated APIs that will no longer be supported in future versions of third-party libraries. We propose AdaLoRA-L, which defines "Common API Layers" (layers with high importance across all APIs, storing general knowledge and excluded from editing) and restricts edits exclusively to "Specific API Layers". Experimental results demonstrate that AdaLoRA-L significantly improves Specificity while maintaining comparable performance across other evaluation metrics.
arXiv Detail & Related papers (2025-11-26T03:36:34Z)
- OASBuilder: Generating OpenAPI Specifications from Online API Documentation with Large Language Models [10.54692787937075]
OASBuilder is a framework that transforms long and diverse API documentation pages into consistent, machine-readable API specifications. OASBuilder has been successfully implemented in an enterprise environment, saving thousands of hours of manual effort.
arXiv Detail & Related papers (2025-07-07T14:36:13Z)
- Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models [49.214291813478695]
Deep learning (DL) libraries, widely used in AI applications, often contain vulnerabilities like buffer overflows and use-after-free errors. Traditional fuzzing struggles with the complexity and API diversity of DL libraries. We propose DFUZZ, an LLM-driven fuzzing approach for DL libraries.
arXiv Detail & Related papers (2025-01-08T07:07:22Z)
- ExploraCoder: Advancing code generation for multiple unseen APIs via planning and chained exploration [70.26807758443675]
ExploraCoder is a training-free framework that empowers large language models to invoke unseen APIs in code solution. Experimental results demonstrate that ExploraCoder significantly improves performance for models lacking prior API knowledge.
arXiv Detail & Related papers (2024-12-06T19:00:15Z)
- A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models [14.665460257371164]
Large language models (LLMs) like GitHub Copilot and ChatGPT have emerged as powerful tools for code generation.
We propose AutoAPIEval, a framework designed to evaluate the capabilities of LLMs in API-oriented code generation.
arXiv Detail & Related papers (2024-09-23T17:22:09Z)
- A Systematic Evaluation of Large Code Models in API Suggestion: When, Which, and How [53.65636914757381]
API suggestion is a critical task in modern software development.
Recent advancements in large code models (LCMs) have shown promise in the API suggestion task.
arXiv Detail & Related papers (2024-09-20T03:12:35Z)
- WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment [49.00213183302225]
We propose a framework to induce new APIs by grounding wikiHow instruction to situated agent policies. Inspired by recent successes in large language models (LLMs) for embodied planning, we propose a few-shot prompting to steer GPT-4.
arXiv Detail & Related papers (2024-07-10T15:52:44Z)
- SoAy: A Solution-based LLM API-using Methodology for Academic Information Seeking [59.59923482238048]
SoAy is a solution-based LLM API-using methodology for academic information seeking. It uses code with a solution as the reasoning method, where a solution is a pre-constructed API calling sequence. Results show a 34.58-75.99% performance improvement compared to state-of-the-art LLM API-based baselines.
arXiv Detail & Related papers (2024-05-24T02:44:14Z)
- SpeCrawler: Generating OpenAPI Specifications from API Documentation Using Large Language Models [8.372941103284774]
SpeCrawler is a comprehensive system that generates OpenAPI Specifications from diverse API documentation.
The paper explores SpeCrawler's methodology, supported by empirical evidence and case studies.
arXiv Detail & Related papers (2024-02-18T15:33:24Z)
- You Can REST Now: Automated REST API Documentation and Testing via LLM-Assisted Request Mutations [8.158964648211002]
We present RESTSpecIT, the first automated approach that infers documentation and performs black-box testing of REST APIs. Our approach requires minimal user input compared to state-of-the-art tools. We evaluate the quality of our tool with three state-of-the-art LLMs: DeepSeek V3, GPT-4.1, and GPT-3.5.
arXiv Detail & Related papers (2024-02-07T18:55:41Z)
- Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z)
- Evaluating Embedding APIs for Information Retrieval [51.24236853841468]
We evaluate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval.
We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English.
For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost.
arXiv Detail & Related papers (2023-05-10T16:40:52Z)
- API2Com: On the Improvement of Automatically Generated Code Comments Using API Documentations [0.0]
We propose API2Com, a model that leverages the Application Programming Interface Documentations (API Docs) as a knowledge resource for comment generation.
We apply the model on a large Java dataset of over 130,000 methods and evaluate it using both Transformer and RNN-based architectures.
arXiv Detail & Related papers (2021-03-19T07:29:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.