Automating API Documentation with LLMs: A BERTopic Approach
- URL: http://arxiv.org/abs/2509.05749v1
- Date: Sat, 06 Sep 2025 15:41:46 GMT
- Title: Automating API Documentation with LLMs: A BERTopic Approach
- Authors: AmirHossein Naghshzan,
- Abstract summary: Using BERTopic, we extracted prevalent topics from 3.6 million Stack Overflow posts and applied extractive summarization techniques to generate concise summaries.<n>A user study with 30 Android developers assessed the summaries for coherence, relevance, informativeness, and satisfaction, showing improved productivity.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developers rely on API documentation, but official sources are often lengthy, complex, or incomplete. Many turn to community-driven forums like Stack Overflow for practical insights. We propose automating the summarization of informal sources, focusing on Android APIs. Using BERTopic, we extracted prevalent topics from 3.6 million Stack Overflow posts and applied extractive summarization techniques to generate concise summaries, including code snippets. A user study with 30 Android developers assessed the summaries for coherence, relevance, informativeness, and satisfaction, showing improved productivity. Integrating formal API knowledge with community-generated content enhances documentation, making API resources more accessible and actionable work.
Related papers
- SocialED: A Python Library for Social Event Detection [53.928241775629566]
SocialED is a comprehensive, open-source Python library designed to support social event detection (SED) tasks.<n>It provides a unified API with detailed documentation, offering researchers and practitioners a complete solution for event detection in social media.<n>SocialED supports a wide range of preprocessing techniques, such as graph construction and tokenization, and includes standardized interfaces for training models and making predictions.
arXiv Detail & Related papers (2024-12-18T03:37:47Z) - ExploraCoder: Advancing code generation for multiple unseen APIs via planning and chained exploration [70.26807758443675]
ExploraCoder is a training-free framework that empowers large language models to invoke unseen APIs in code solution.<n> Experimental results demonstrate that ExploraCoder significantly improves performance for models lacking prior API knowledge.
arXiv Detail & Related papers (2024-12-06T19:00:15Z) - A Systematic Evaluation of Large Code Models in API Suggestion: When, Which, and How [53.65636914757381]
API suggestion is a critical task in modern software development.
Recent advancements in large code models (LCMs) have shown promise in the API suggestion task.
arXiv Detail & Related papers (2024-09-20T03:12:35Z) - SoAy: A Solution-based LLM API-using Methodology for Academic Information Seeking [59.59923482238048]
SoAy is a solution-based LLM API-using methodology for academic information seeking.<n>It uses code with a solution as the reasoning method, where a solution is a pre-constructed API calling sequence.<n>Results show a 34.58-75.99% performance improvement compared to state-of-the-art LLM API-based baselines.
arXiv Detail & Related papers (2024-05-24T02:44:14Z) - Revolutionizing API Documentation through Summarization [0.0]
API documentation can be lengthy and challenging to navigate, prompting developers to seek unofficial sources such as Stack Overflow.
We employ BERTopic and extractive summarization to automatically generate concise and informative API summaries.
These summaries encompass key insights like general usage, common developer issues, and potential solutions, sourced from the wealth of knowledge on Stack Overflow.
arXiv Detail & Related papers (2024-01-21T01:18:08Z) - Leveraging Deep Learning for Abstractive Code Summarization of
Unofficial Documentation [1.1816942730023887]
This paper proposes an automatic approach using the BART algorithm to generate summaries for APIs discussed in StackOverflow.
We built an oracle of human-generated summaries to evaluate our approach against it using ROUGE and BLEU metrics.
Our findings demonstrate that using deep learning algorithms can improve summaries' quality and outperform the previous work by an average of %57 for Precision.
arXiv Detail & Related papers (2023-10-23T15:10:37Z) - Enhancing API Documentation through BERTopic Modeling and Summarization [0.0]
This paper focuses on the complexities of interpreting Application Programming Interface (API) documentation.
Official API documentation serves as a primary source of information for developers, but it can often be extensive and lacks user-friendliness.
Our novel approach employs the strengths of BERTopic for topic modeling and Natural Language Processing (NLP) to automatically generate summaries of API documentation.
arXiv Detail & Related papers (2023-08-17T15:57:12Z) - Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z) - Evaluating Embedding APIs for Information Retrieval [51.24236853841468]
We evaluate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval.
We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English.
For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost.
arXiv Detail & Related papers (2023-05-10T16:40:52Z) - Towards Code Summarization of APIs Based on Unofficial Documentation
Using NLP Techniques [0.0]
In some cases, official documentation is not an efficient way to get the needed information.
We propose an automatic approach to generate summaries for APIs and methods by leveraging unofficial documentation using NLP techniques.
arXiv Detail & Related papers (2022-08-12T15:07:30Z) - Leveraging Unsupervised Learning to Summarize APIs Discussed in Stack
Overflow [1.8047694351309207]
This paper proposes an automatic and novel approach for summarizing Android API methods discussed in Stack Overflow.
Our approach takes the API method's name as an input and generates a natural language summary based on Stack Overflow discussions of that API method.
We have conducted a survey that involves 16 Android developers to evaluate the quality of our automatically generated summaries and compare them with the official Android documentation.
arXiv Detail & Related papers (2021-11-27T18:49:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.