An Extractive-and-Abstractive Framework for Source Code Summarization
- URL: http://arxiv.org/abs/2206.07245v2
- Date: Sat, 4 Nov 2023 07:43:38 GMT
- Title: An Extractive-and-Abstractive Framework for Source Code Summarization
- Authors: Weisong Sun and Chunrong Fang and Yuchen Chen and Quanjun Zhang and
Guanhong Tao and Tingxu Han and Yifei Ge and Yudu You and Bin Luo
- Abstract summary: Code summarization aims to automatically generate summaries/comments for a given code snippet in the form of natural language.
We propose a novel extractive-and-abstractive framework to generate human-written-like summaries with preserved factual details.
- Score: 28.553366270065656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: (Source) Code summarization aims to automatically generate summaries/comments
for a given code snippet in the form of natural language. Such summaries play a
key role in helping developers understand and maintain source code. Existing
code summarization techniques can be categorized into extractive methods and
abstractive methods. The extractive methods extract a subset of important
statements and keywords from the code snippet using retrieval techniques, and
generate a summary that preserves factual details in important statements and
keywords. However, such a subset may miss identifier or entity naming, and
consequently, the naturalness of the generated summary is usually poor. The
abstractive methods can generate human-written-like summaries by leveraging
encoder-decoder models from the neural machine translation domain. The
generated summaries, however, often miss important factual details.
To generate human-written-like summaries with preserved factual details, we
propose a novel extractive-and-abstractive framework. The extractive module in
the framework performs a task of extractive code summarization, which takes in
the code snippet and predicts important statements containing key factual
details. The abstractive module in the framework performs a task of abstractive
code summarization, which takes in the entire code snippet and important
statements in parallel and generates a succinct and human-written-like natural
language summary. We evaluate the effectiveness of our technique, called EACS,
by conducting extensive experiments on three datasets involving six programming
languages. Experimental results show that EACS significantly outperforms
state-of-the-art techniques in terms of all three widely used metrics,
including BLEU, METEOR, and ROUGE-L.
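The abstract describes EACS only at the architecture level; the sketch below (Python) illustrates one way an extract-then-abstract pipeline of this shape can be wired together. The lexical statement scorer is a simple stand-in for EACS's learned extractive module, and the Salesforce/codet5-base-multi-sum checkpoint is an assumed off-the-shelf code-summarization model, not the model trained in the paper.

    # Minimal sketch of an extract-then-abstract pipeline in the spirit of EACS.
    # The paper does not publish this exact code: the statement scorer below is a
    # lexical heuristic standing in for the learned extractive module, and the
    # pretrained checkpoint name is an assumption.
    import re
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    CHECKPOINT = "Salesforce/codet5-base-multi-sum"  # assumed summarization checkpoint

    def extract_statements(code: str, top_k: int = 3) -> list[str]:
        """Placeholder extractive module: score each statement by how many
        identifiers it shares with the rest of the snippet, keep the top_k."""
        statements = [s.strip() for s in code.splitlines() if s.strip()]
        vocab = re.findall(r"[A-Za-z_]\w*", code)
        def score(stmt: str) -> int:
            tokens = set(re.findall(r"[A-Za-z_]\w*", stmt))
            return sum(vocab.count(t) for t in tokens)
        return sorted(statements, key=score, reverse=True)[:top_k]

    def summarize(code: str) -> str:
        """Abstractive module: feed the whole snippet plus the extracted
        statements (as extra context) to a seq2seq model."""
        important = extract_statements(code)
        tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
        model = AutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)
        source = code + "\n# important statements:\n" + "\n".join(important)
        inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
        output_ids = model.generate(**inputs, max_length=48, num_beams=4)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    if __name__ == "__main__":
        snippet = """
        def binary_search(arr, target):
            lo, hi = 0, len(arr) - 1
            while lo <= hi:
                mid = (lo + hi) // 2
                if arr[mid] == target:
                    return mid
                if arr[mid] < target:
                    lo = mid + 1
                else:
                    hi = mid - 1
            return -1
        """
        print(summarize(snippet))

In EACS itself both modules are trained, and the predicted important statements are encoded in parallel with the full snippet; the plain concatenation above is only a rough approximation of that interface.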
Related papers
- Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs [70.15262704746378]
We propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback.
Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (10% Rouge-L) in terms of producing coherent summaries.
arXiv Detail & Related papers (2024-07-05T20:25:04Z)
- ESALE: Enhancing Code-Summary Alignment Learning for Source Code Summarization [21.886950861445122]
Code summarization aims to automatically generate succinct natural language summaries for given code snippets.
This paper proposes a novel approach to improve code summarization based on summary-focused tasks.
arXiv Detail & Related papers (2024-07-01T03:06:51Z)
- Source Identification in Abstractive Summarization [0.8883733362171033]
We define input sentences that contain essential information in the generated summary as "source sentences" and study how abstractive summaries are made by analyzing the source sentences.
We formulate automatic source sentence detection and compare multiple methods to establish a strong baseline for the task.
Experimental results show that the perplexity-based method performs well in highly abstractive settings, while similarity-based methods perform robustly in relatively extractive settings.
arXiv Detail & Related papers (2024-02-07T09:09:09Z)
- EditSum: A Retrieve-and-Edit Framework for Source Code Summarization [46.84628094508991]
Existing studies show that code summaries help developers understand and maintain source code.
Code summarization aims to generate natural language descriptions automatically for source code.
This paper proposes a novel retrieve-and-edit approach named EditSum for code summarization.
arXiv Detail & Related papers (2023-08-26T05:48:57Z)
- Attributable and Scalable Opinion Summarization [79.87892048285819]
We generate abstractive summaries by decoding frequent encodings, and extractive summaries by selecting the sentences assigned to the same frequent encodings.
Our method is attributable, because the model identifies sentences used to generate the summary as part of the summarization process.
It scales easily to many hundreds of input reviews, because aggregation is performed in the latent space rather than over long sequences of tokens.
arXiv Detail & Related papers (2023-05-19T11:30:37Z)
- Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization [76.57699934689468]
We propose a fine-grained Token-level retrieval-augmented mechanism (Tram) on the decoder side to enhance the performance of neural models.
To overcome the challenge of token-level retrieval in capturing contextual code semantics, we also propose integrating code semantics into individual summary tokens.
arXiv Detail & Related papers (2023-05-18T16:02:04Z)
- Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees [89.60269205320431]
Current abstractive summarization models either suffer from a lack of clear interpretability or provide incomplete rationales.
We propose the Summarization Program (SP), an interpretable modular framework consisting of an (ordered) list of binary trees.
A Summarization Program contains one root node per summary sentence, and a distinct tree connects each summary sentence to the document sentences.
arXiv Detail & Related papers (2022-09-21T16:50:22Z)
- A Survey on Neural Abstractive Summarization Methods and Factual Consistency of Summarization [18.763290930749235]
Summarization is the process of computationally shortening a set of textual data to create a subset (a summary).
Existing summarization methods can be roughly divided into two types: extractive and abstractive.
An extractive summarizer explicitly selects text snippets from the source document, while an abstractive summarizer generates novel text snippets to convey the most salient concepts prevalent in the source.
arXiv Detail & Related papers (2022-04-20T14:56:36Z)
- Exploiting Method Names to Improve Code Summarization: A Deliberation Multi-Task Learning Approach [5.577102440028882]
We design a novel multi-task learning (MTL) approach for code summarization.
We first introduce two auxiliary tasks: method name generation and method name informativeness prediction.
A novel two-pass deliberation mechanism is then incorporated into our MTL architecture to generate more consistent intermediate states.
arXiv Detail & Related papers (2021-03-21T17:52:21Z)
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that despite its simplicity, the approach outperforms state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)