EditSum: A Retrieve-and-Edit Framework for Source Code Summarization
- URL: http://arxiv.org/abs/2308.13775v2
- Date: Thu, 7 Sep 2023 11:19:30 GMT
- Title: EditSum: A Retrieve-and-Edit Framework for Source Code Summarization
- Authors: Jia Li, Yongmin Li, Ge Li, Xing Hu, Xin Xia, Zhi Jin
- Abstract summary: Existing studies show that code summaries help developers understand and maintain source code.
Code summarization aims to automatically generate natural language descriptions for source code.
This paper proposes a novel retrieve-and-edit approach named EditSum for code summarization.
- Score: 46.84628094508991
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing studies show that code summaries help developers understand and
maintain source code. Unfortunately, these summaries are often missing or
outdated in software projects. Code summarization aims to automatically
generate natural language descriptions for source code. Code summaries are highly
structured and have repetitive patterns. Besides the patternized words, a code
summary also contains important keywords, which are the key to reflecting the
functionality of the code. However, state-of-the-art approaches perform
poorly at predicting these keywords, which makes the generated summaries less
informative. To alleviate this problem, this paper
proposes a novel retrieve-and-edit approach named EditSum for code
summarization. Specifically, EditSum first retrieves a similar code snippet
from a pre-defined corpus and treats its summary as a prototype summary to
learn the pattern. Then, EditSum edits the prototype automatically to combine
the pattern in the prototype with the semantic information of the input code.
Our motivation is that the retrieved prototype provides a good starting point
for the subsequent generation because the summaries of similar code snippets
often share the same pattern. The post-editing process then reuses the
patternized words in the prototype and generates keywords based on the
semantic information of the input
code. We conduct experiments on a large-scale Java corpus and experimental
results demonstrate that EditSum outperforms the state-of-the-art approaches by
a substantial margin. A human evaluation further confirms that the summaries
generated by EditSum are more informative and useful. We also verify that
EditSum performs well at predicting both the patternized words and the keywords.
Related papers
- ESALE: Enhancing Code-Summary Alignment Learning for Source Code Summarization [21.886950861445122]
Code summarization aims to automatically generate succinct natural language summaries for given code snippets.
This paper proposes a novel approach to improve code summarization based on summary-focused tasks.
arXiv Detail & Related papers (2024-07-01T03:06:51Z)
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
- AST-MHSA: Code Summarization using Multi-Head Self-Attention [1.588193964339148]
We present a model, AST-MHSA, that uses multi-head attention to extract semantic information from the abstract syntax tree (AST) of the code.
The model is trained on a dataset of code and summaries, and the parameters are optimized to minimize the loss between the generated summaries and the ground-truth summaries.
arXiv Detail & Related papers (2023-08-10T15:43:46Z)
- Attributable and Scalable Opinion Summarization [79.87892048285819]
We generate abstractive summaries by decoding frequent encodings, and extractive summaries by selecting the sentences assigned to the same frequent encodings.
Our method is attributable, because the model identifies sentences used to generate the summary as part of the summarization process.
It scales easily to many hundreds of input reviews, because aggregation is performed in the latent space rather than over long sequences of tokens.
arXiv Detail & Related papers (2023-05-19T11:30:37Z)
- Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation [61.50286000143233]
ChainCoder is a program synthesis language model that generates Python code progressively.
A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
arXiv Detail & Related papers (2023-04-28T01:47:09Z)
- Soft-Labeled Contrastive Pre-training for Function-level Code Representation [127.71430696347174]
We present SCodeR, a Soft-labeled contrastive pre-training framework with two positive sample construction methods.
By considering the relevance between code snippets in a large-scale corpus, soft-labeled contrastive pre-training can obtain fine-grained soft labels.
SCodeR achieves new state-of-the-art performance on four code-related tasks over seven datasets.
arXiv Detail & Related papers (2022-10-18T05:17:37Z)
- An Extractive-and-Abstractive Framework for Source Code Summarization [28.553366270065656]
Code summarization aims to automatically generate summaries/comments for a given code snippet in the form of natural language.
We propose a novel extractive-and-abstractive framework to generate human-written-like summaries with preserved factual details.
arXiv Detail & Related papers (2022-06-15T02:14:24Z)
- Retrieve and Refine: Exemplar-based Neural Comment Generation [27.90756259321855]
Comments of similar code snippets are helpful for comment generation.
We design a novel seq2seq neural network that takes the given code, its AST, its similar code, and its exemplar as input.
We evaluate our approach on a large-scale Java corpus, which contains about 2M samples.
arXiv Detail & Related papers (2020-10-09T09:33:10Z)
- Self-Supervised Contrastive Learning for Code Retrieval and Summarization via Semantic-Preserving Transformations [28.61567319928316]
Corder is a self-supervised contrastive learning framework for source code models.
The key innovation is training the source code model to recognize similar and dissimilar code snippets (a loss sketch follows this list).
We show that code models pretrained by Corder substantially outperform other baselines on code-to-code retrieval, text-to-code retrieval, and code-to-text summarization tasks.
arXiv Detail & Related papers (2020-09-06T13:31:16Z)
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representations for summarization by modeling the pairwise relationships between code tokens (see the attention sketch after this list).
We show that, despite its simplicity, the approach outperforms state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
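For the Corder entry above, the following is a minimal sketch of a contrastive pretraining objective in which two semantic-preserving transformations of the same snippet form a positive pair. This is a generic NT-Xent-style loss, assumed here for illustration; it is not necessarily Corder's exact formulation, and all names and the temperature value are illustrative.

```python
# Minimal sketch of a contrastive (NT-Xent-style) loss for code pretraining:
# embeddings of semantically equivalent transformations are pulled together,
# all other snippets pushed apart. Illustrative only.
import numpy as np


def ntxent_loss(Z1, Z2, temperature=0.1):
    """Z1, Z2: (n, d) L2-normalized embeddings; row i of Z1 and Z2 come from
    two semantic-preserving transformations of the same code snippet."""
    n = Z1.shape[0]
    Z = np.concatenate([Z1, Z2], axis=0)            # (2n, d)
    sim = Z @ Z.T / temperature                     # pairwise similarities
    np.fill_diagonal(sim, -np.inf)                  # exclude self-pairs
    # The positive for row i is its transformed counterpart at i +/- n.
    pos = np.concatenate([np.arange(n) + n, np.arange(n)])
    logits = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()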
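And for the pairwise-relationship idea in the last entry, below is a minimal sketch of self-attention with relative position representations, one standard way to model pairwise relationships between tokens. The single-head form, shapes, and clipping distance are illustrative assumptions, not necessarily the cited paper's formulation.

```python
# Minimal sketch: self-attention augmented with relative position scores,
# one way to model pairwise relationships between code tokens. Illustrative.
import numpy as np


def relative_self_attention(X, Wq, Wk, Wv, rel_keys, max_dist=8):
    """X: (n, d) token encodings.
    rel_keys: (2 * max_dist + 1, d) learned embeddings of clipped
    relative distances between token pairs."""
    n, d = X.shape
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T                                # pairwise content scores
    # Clipped relative distance j - i, shifted into [0, 2 * max_dist].
    idx = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None],
                  -max_dist, max_dist) + max_dist
    # Add a position-dependent score for every token pair (i, j).
    scores = scores + np.einsum("id,ijd->ij", Q, rel_keys[idx])
    scaled = scores / np.sqrt(d)
    scaled -= scaled.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scaled)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V
```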