An Extractive-and-Abstractive Framework for Source Code Summarization
- URL: http://arxiv.org/abs/2206.07245v2
- Date: Sat, 4 Nov 2023 07:43:38 GMT
- Title: An Extractive-and-Abstractive Framework for Source Code Summarization
- Authors: Weisong Sun and Chunrong Fang and Yuchen Chen and Quanjun Zhang and
Guanhong Tao and Tingxu Han and Yifei Ge and Yudu You and Bin Luo
- Abstract summary: Code summarization aims to automatically generate summaries/comments for a given code snippet in the form of natural language.
We propose a novel extractive-and-abstractive framework to generate human-written-like summaries with preserved factual details.
- Score: 28.553366270065656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: (Source) Code summarization aims to automatically generate summaries/comments
for a given code snippet in the form of natural language. Such summaries play a
key role in helping developers understand and maintain source code. Existing
code summarization techniques can be categorized into extractive methods and
abstractive methods. The extractive methods extract a subset of important
statements and keywords from the code snippet using retrieval techniques, and
generate a summary that preserves factual details in important statements and
keywords. However, such a subset may miss identifier or entity naming, and
consequently, the naturalness of the generated summary is usually poor. The
abstractive methods can generate human-written-like summaries by leveraging
encoder-decoder models from the neural machine translation domain. The
generated summaries, however, often miss important factual details.
To generate human-written-like summaries with preserved factual details, we
propose a novel extractive-and-abstractive framework. The extractive module in
the framework performs a task of extractive code summarization, which takes in
the code snippet and predicts important statements containing key factual
details. The abstractive module in the framework performs a task of abstractive
code summarization, which takes in the entire code snippet and important
statements in parallel and generates a succinct and human-written-like natural
language summary. We evaluate the effectiveness of our technique, called EACS,
by conducting extensive experiments on three datasets involving six programming
languages. Experimental results show that EACS significantly outperforms
state-of-the-art techniques in terms of all three widely used metrics,
including BLEU, METEOR, and ROUGE-L.
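The abstract describes EACS only at the architecture level; the sketch below (Python) illustrates one way an extract-then-abstract pipeline of this shape can be wired together. The lexical statement scorer is a simple stand-in for EACS's learned extractive module, and the Salesforce/codet5-base-multi-sum checkpoint is an assumed off-the-shelf code-summarization model, not the model trained in the paper.

    # Minimal sketch of an extract-then-abstract pipeline in the spirit of EACS.
    # The paper does not publish this exact code: the statement scorer below is a
    # lexical heuristic standing in for the learned extractive module, and the
    # pretrained checkpoint name is an assumption.
    import re
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    CHECKPOINT = "Salesforce/codet5-base-multi-sum"  # assumed summarization checkpoint

    def extract_statements(code: str, top_k: int = 3) -> list[str]:
        """Placeholder extractive module: score each statement by how many
        identifiers it shares with the rest of the snippet, keep the top_k."""
        statements = [s.strip() for s in code.splitlines() if s.strip()]
        vocab = re.findall(r"[A-Za-z_]\w*", code)
        def score(stmt: str) -> int:
            tokens = set(re.findall(r"[A-Za-z_]\w*", stmt))
            return sum(vocab.count(t) for t in tokens)
        return sorted(statements, key=score, reverse=True)[:top_k]

    def summarize(code: str) -> str:
        """Abstractive module: feed the whole snippet plus the extracted
        statements (as extra context) to a seq2seq model."""
        important = extract_statements(code)
        tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
        model = AutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)
        source = code + "\n# important statements:\n" + "\n".join(important)
        inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
        output_ids = model.generate(**inputs, max_length=48, num_beams=4)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    if __name__ == "__main__":
        snippet = """
        def binary_search(arr, target):
            lo, hi = 0, len(arr) - 1
            while lo <= hi:
                mid = (lo + hi) // 2
                if arr[mid] == target:
                    return mid
                if arr[mid] < target:
                    lo = mid + 1
                else:
                    hi = mid - 1
            return -1
        """
        print(summarize(snippet))

In EACS itself both modules are trained, and the predicted important statements are encoded in parallel with the full snippet; the plain concatenation above is only a rough approximation of that interface.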
Related papers
- Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs [70.15262704746378]
We propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback.
Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (10% Rouge-L) in terms of producing coherent summaries.
arXiv Detail & Related papers (2024-07-05T20:25:04Z)
- ESALE: Enhancing Code-Summary Alignment Learning for Source Code Summarization [21.886950861445122]
Code summarization aims to automatically generate succinct natural language summaries for given code snippets.
This paper proposes a novel approach to improve code summarization based on summary-focused tasks.
arXiv Detail & Related papers (2024-07-01T03:06:51Z)
- Source Identification in Abstractive Summarization [0.8883733362171033]
We define input sentences that contain essential information in the generated summary as "source sentences" and study how abstractive summaries are made by analyzing the source sentences.
We formulate automatic source sentence detection and compare multiple methods to establish a strong baseline for the task.
Experimental results show that the perplexity-based method performs well in highly abstractive settings, while similarity-based methods perform robustly in relatively extractive settings.
arXiv Detail & Related papers (2024-02-07T09:09:09Z)
- EditSum: A Retrieve-and-Edit Framework for Source Code Summarization [46.84628094508991]
Existing studies show that code summaries help developers understand and maintain source code.
Code summarization aims to generate natural language descriptions automatically for source code.
This paper proposes a novel retrieve-and-edit approach named EditSum for code summarization.
arXiv Detail & Related papers (2023-08-26T05:48:57Z)
- Attributable and Scalable Opinion Summarization [79.87892048285819]
We generate abstractive summaries by decoding frequent encodings, and extractive summaries by selecting the sentences assigned to the same frequent encodings.
Our method is attributable, because the model identifies sentences used to generate the summary as part of the summarization process.
It scales easily to many hundreds of input reviews, because aggregation is performed in the latent space rather than over long sequences of tokens.
arXiv Detail & Related papers (2023-05-19T11:30:37Z)
- Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization [76.57699934689468]
We propose a fine-grained Token-level retrieval-augmented mechanism (Tram) on the decoder side to enhance the performance of neural models.
To overcome the challenge of token-level retrieval in capturing contextual code semantics, we also propose integrating code semantics into individual summary tokens.
arXiv Detail & Related papers (2023-05-18T16:02:04Z)
- Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees [89.60269205320431]
Current abstractive summarization models either suffer from a lack of clear interpretability or provide incomplete rationales.
We propose the Summarization Program (SP), an interpretable modular framework consisting of an (ordered) list of binary trees.
A Summarization Program contains one root node per summary sentence, and a distinct tree connects each summary sentence to the document sentences.
arXiv Detail & Related papers (2022-09-21T16:50:22Z)
- A Survey on Neural Abstractive Summarization Methods and Factual Consistency of Summarization [18.763290930749235]
Summarization is the process of computationally shortening a set of textual data to create a subset (a summary).
Existing summarization methods can be roughly divided into two types: extractive and abstractive.
An extractive summarizer explicitly selects text snippets from the source document, while an abstractive summarizer generates novel text snippets to convey the most salient concepts prevalent in the source.
arXiv Detail & Related papers (2022-04-20T14:56:36Z)
- Exploiting Method Names to Improve Code Summarization: A Deliberation Multi-Task Learning Approach [5.577102440028882]
We design a novel multi-task learning (MTL) approach for code summarization.
We first introduce two auxiliary tasks: method name generation and method name informativeness prediction.
A novel two-pass deliberation mechanism is then incorporated into our MTL architecture to generate more consistent intermediate states.
arXiv Detail & Related papers (2021-03-21T17:52:21Z)
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that despite its simplicity, the approach outperforms state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)