Leveraging Deep Learning for Abstractive Code Summarization of
Unofficial Documentation
- URL: http://arxiv.org/abs/2310.15015v4
- Date: Sun, 3 Dec 2023 20:11:11 GMT
- Title: Leveraging Deep Learning for Abstractive Code Summarization of
Unofficial Documentation
- Authors: AmirHossein Naghshzan, Latifa Guerrouj, Olga Baysal
- Abstract summary: This paper proposes an automatic approach using the BART algorithm to generate summaries for APIs discussed in StackOverflow.
We built an oracle of human-generated summaries and evaluated our approach against it using ROUGE and BLEU metrics.
Our findings demonstrate that using deep learning algorithms can improve the quality of summaries, outperforming the previous work by an average of 57% in Precision.
- Score: 1.1816942730023887
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Usually, programming languages have official documentation to guide
developers with APIs, methods, and classes. However, researchers have
identified insufficient or inadequate documentation examples, as well as flaws
in the API's complex structure, as barriers to learning an API. As a result,
developers may
consult other sources (StackOverflow, GitHub, etc.) to learn more about an API.
Recent research studies have shown that unofficial documentation is a valuable
source of information for generating code summaries. We are therefore
motivated to leverage this type of documentation, along with deep learning
techniques, to generate high-quality summaries for APIs discussed in informal
documentation. This paper proposes an automatic approach using the
BART algorithm, a state-of-the-art transformer model, to generate summaries for
APIs discussed in StackOverflow. We built an oracle of human-generated
summaries and evaluated our approach against it using ROUGE and BLEU, the most
widely used evaluation metrics in text summarization.
Furthermore, we empirically evaluated the quality of our summaries against a
previous work. Our findings demonstrate that using deep learning algorithms can
improve the quality of summaries, outperforming the previous work by an average
of 57% in Precision, 66% in Recall, and 61% in F-measure, while running 4.4
times faster.
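As a rough illustration of the pipeline described above, the following is a minimal sketch that generates an abstractive summary for an API from StackOverflow discussion text with a pre-trained BART checkpoint and scores it against a human-written oracle summary using ROUGE and BLEU. The model checkpoint, input sentences, reference summary, and generation settings are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: abstractive API summarization with BART, scored with ROUGE and BLEU.
# All inputs below are hypothetical examples, not the paper's dataset.
from transformers import BartForConditionalGeneration, BartTokenizer
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Pre-trained BART checkpoint (assumed; the paper's fine-tuning setup may differ).
model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Hypothetical StackOverflow sentences discussing an Android API method.
so_text = (
    "getSystemService returns a handle to a system-level service by name. "
    "You should cast the result to the concrete manager class, e.g. AudioManager. "
    "Calling it before the Activity is attached to a Context throws an exception."
)

# Generate an abstractive summary with beam search.
inputs = tokenizer(so_text, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(
    inputs["input_ids"], num_beams=4, max_length=60, early_stopping=True
)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# Human-written reference from the oracle (hypothetical example).
reference = (
    "getSystemService retrieves a system service handle that must be cast "
    "to its manager class."
)

# ROUGE: recall-oriented n-gram and longest-common-subsequence overlap.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, summary)

# BLEU: precision-oriented n-gram overlap, smoothed for short texts.
bleu = sentence_bleu(
    [reference.split()], summary.split(),
    smoothing_function=SmoothingFunction().method1,
)

print(rouge["rouge1"].fmeasure, rouge["rougeL"].fmeasure, bleu)
```

An off-the-shelf checkpoint is used here only to keep the sketch self-contained; reproducing the reported results would require the paper's own StackOverflow corpus and training details.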
Related papers
- Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning [14.351476383642016]
We propose a novel approach, named Code2API, to automatically perform APIzation for Stack Overflow code snippets.
Code2API does not require additional model training or any manual crafting rules.
It can be easily deployed on personal computers without relying on other external tools.
arXiv Detail & Related papers (2024-05-06T14:22:17Z)
- Enhancing API Documentation through BERTopic Modeling and Summarization [0.0]
This paper focuses on the complexities of interpreting Application Programming Interface (API) documentation.
Official API documentation serves as a primary source of information for developers, but it can often be extensive and lack user-friendliness.
Our novel approach employs the strengths of BERTopic for topic modeling and Natural Language Processing (NLP) to automatically generate summaries of API documentation.
arXiv Detail & Related papers (2023-08-17T15:57:12Z)
- Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z)
- Evaluating Embedding APIs for Information Retrieval [51.24236853841468]
We evaluate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval.
We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English.
For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost.
arXiv Detail & Related papers (2023-05-10T16:40:52Z)
- A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking [56.80065604034095]
We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant.
To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
arXiv Detail & Related papers (2022-10-19T16:19:37Z)
- Towards Code Summarization of APIs Based on Unofficial Documentation Using NLP Techniques [0.0]
In some cases, official documentation is not an efficient way to get the needed information.
We propose an automatic approach to generate summaries for APIs and methods by leveraging unofficial documentation using NLP techniques.
arXiv Detail & Related papers (2022-08-12T15:07:30Z)
- Leveraging Unsupervised Learning to Summarize APIs Discussed in Stack Overflow [1.8047694351309207]
This paper proposes an automatic and novel approach for summarizing Android API methods discussed in Stack Overflow.
Our approach takes the API method's name as an input and generates a natural language summary based on Stack Overflow discussions of that API method.
We have conducted a survey that involves 16 Android developers to evaluate the quality of our automatically generated summaries and compare them with the official Android documentation.
arXiv Detail & Related papers (2021-11-27T18:49:51Z)
- Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: a sentence encoder (level one), an intra-review encoder (level two), and an inter-review encoder (level three).
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
arXiv Detail & Related papers (2020-11-02T08:07:50Z)
- Holistic Combination of Structural and Textual Code Information for Context based API Recommendation [28.74546332681778]
We propose a novel API recommendation approach called APIRec-CST (API Recommendation by Combining Structural and Textual code information).
APIRec-CST is a deep learning model that combines the API usage with the text information in source code based on an API Graph Network and a Code Token Network.
We show that our approach achieves top-1, top-5, and top-10 accuracy and an MRR of 60.3%, 81.5%, 87.7%, and 69.4%, respectively, and significantly outperforms an existing graph-based statistical approach.
arXiv Detail & Related papers (2020-10-15T04:40:42Z)