Leveraging Deep Learning for Abstractive Code Summarization of
Unofficial Documentation
- URL: http://arxiv.org/abs/2310.15015v4
- Date: Sun, 3 Dec 2023 20:11:11 GMT
- Title: Leveraging Deep Learning for Abstractive Code Summarization of
Unofficial Documentation
- Authors: AmirHossein Naghshzan, Latifa Guerrouj, Olga Baysal
- Abstract summary: This paper proposes an automatic approach using the BART algorithm to generate summaries for APIs discussed in StackOverflow.
We built an oracle of human-generated summaries and evaluated our approach against it using ROUGE and BLEU metrics.
Our findings demonstrate that using deep learning algorithms can improve the quality of summaries, outperforming the previous work by an average of 57% in Precision.
- Score: 1.1816942730023887
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Usually, programming languages have official documentation to guide
developers with APIs, methods, and classes. However, researchers have
identified insufficient or inadequate documentation examples, as well as flaws
in the API's complex structure, as barriers to learning an API. As a result,
developers may
consult other sources (StackOverflow, GitHub, etc.) to learn more about an API.
Recent research studies have shown that unofficial documentation is a valuable
source of information for generating code summaries. We are therefore
motivated to leverage this type of documentation, along with deep learning
techniques, to generate high-quality summaries for APIs discussed in informal
documentation. This paper proposes an automatic approach using the
BART algorithm, a state-of-the-art transformer model, to generate summaries for
APIs discussed in StackOverflow. We built an oracle of human-generated
summaries and evaluated our approach against it using ROUGE and BLEU, the most
widely used evaluation metrics in text summarization.
Furthermore, we empirically evaluated the quality of our summaries against a
previous work. Our findings demonstrate that using deep learning algorithms can
improve the quality of summaries, outperforming the previous work by an average
of 57% in Precision, 66% in Recall, and 61% in F-measure, while running 4.4
times faster.
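As a rough illustration of the pipeline described above, the following is a minimal sketch that generates an abstractive summary for an API from StackOverflow discussion text with a pre-trained BART checkpoint and scores it against a human-written oracle summary using ROUGE and BLEU. The model checkpoint, input sentences, reference summary, and generation settings are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: abstractive API summarization with BART, scored with ROUGE and BLEU.
# All inputs below are hypothetical examples, not the paper's dataset.
from transformers import BartForConditionalGeneration, BartTokenizer
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Pre-trained BART checkpoint (assumed; the paper's fine-tuning setup may differ).
model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Hypothetical StackOverflow sentences discussing an Android API method.
so_text = (
    "getSystemService returns a handle to a system-level service by name. "
    "You should cast the result to the concrete manager class, e.g. AudioManager. "
    "Calling it before the Activity is attached to a Context throws an exception."
)

# Generate an abstractive summary with beam search.
inputs = tokenizer(so_text, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(
    inputs["input_ids"], num_beams=4, max_length=60, early_stopping=True
)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# Human-written reference from the oracle (hypothetical example).
reference = (
    "getSystemService retrieves a system service handle that must be cast "
    "to its manager class."
)

# ROUGE: recall-oriented n-gram and longest-common-subsequence overlap.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, summary)

# BLEU: precision-oriented n-gram overlap, smoothed for short texts.
bleu = sentence_bleu(
    [reference.split()], summary.split(),
    smoothing_function=SmoothingFunction().method1,
)

print(rouge["rouge1"].fmeasure, rouge["rougeL"].fmeasure, bleu)
```

An off-the-shelf checkpoint is used here only to keep the sketch self-contained; reproducing the reported results would require the paper's own StackOverflow corpus and training details.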
Related papers
- Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning [14.351476383642016]
We propose a novel approach, named Code2API, to automatically perform APIzation for Stack Overflow code snippets.
Code2API does not require additional model training or any manual crafting rules.
It can be easily deployed on personal computers without relying on other external tools.
arXiv Detail & Related papers (2024-05-06T14:22:17Z)
- Enhancing API Documentation through BERTopic Modeling and Summarization [0.0]
This paper focuses on the complexities of interpreting Application Programming Interface (API) documentation.
Official API documentation serves as a primary source of information for developers, but it can often be extensive and lack user-friendliness.
Our novel approach employs the strengths of BERTopic for topic modeling and Natural Language Processing (NLP) to automatically generate summaries of API documentation.
arXiv Detail & Related papers (2023-08-17T15:57:12Z)
- Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z)
- Evaluating Embedding APIs for Information Retrieval [51.24236853841468]
We evaluate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval.
We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English.
For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost.
arXiv Detail & Related papers (2023-05-10T16:40:52Z)
- A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking [56.80065604034095]
We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant.
To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
arXiv Detail & Related papers (2022-10-19T16:19:37Z)
- Towards Code Summarization of APIs Based on Unofficial Documentation Using NLP Techniques [0.0]
In some cases, official documentation is not an efficient way to get the needed information.
We propose an automatic approach to generate summaries for APIs and methods by leveraging unofficial documentation using NLP techniques.
arXiv Detail & Related papers (2022-08-12T15:07:30Z)
- Leveraging Unsupervised Learning to Summarize APIs Discussed in Stack Overflow [1.8047694351309207]
This paper proposes an automatic and novel approach for summarizing Android API methods discussed in Stack Overflow.
Our approach takes the API method's name as an input and generates a natural language summary based on Stack Overflow discussions of that API method.
We have conducted a survey that involves 16 Android developers to evaluate the quality of our automatically generated summaries and compare them with the official Android documentation.
arXiv Detail & Related papers (2021-11-27T18:49:51Z)
- Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: a sentence encoder (level one), an intra-review encoder (level two), and an inter-review encoder (level three).
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
arXiv Detail & Related papers (2020-11-02T08:07:50Z)
- Holistic Combination of Structural and Textual Code Information for Context based API Recommendation [28.74546332681778]
We propose a novel API recommendation approach called APIRec-CST (API Recommendation by Combining Structural and Textual code information).
APIRec-CST is a deep learning model that combines the API usage with the text information in source code based on an API Graph Network and a Code Token Network.
We show that our approach achieves top-1, top-5, and top-10 accuracy and an MRR of 60.3%, 81.5%, 87.7%, and 69.4%, respectively, and significantly outperforms an existing graph-based statistical approach.
arXiv Detail & Related papers (2020-10-15T04:40:42Z)