Retrieve and Refine: Exemplar-based Neural Comment Generation
- URL: http://arxiv.org/abs/2010.04459v1
- Date: Fri, 9 Oct 2020 09:33:10 GMT
- Title: Retrieve and Refine: Exemplar-based Neural Comment Generation
- Authors: Bolin Wei, Yongmin Li, Ge Li, Xin Xia, Zhi Jin
- Abstract summary: Comments of similar code snippets are helpful for comment generation.
We design a novel seq2seq neural network that takes the given code, its AST, its similar code, and its exemplar as input.
We evaluate our approach on a large-scale Java corpus, which contains about 2M samples.
- Score: 27.90756259321855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code comment generation, which aims to automatically generate natural language
descriptions for source code, is a crucial task in the field of automatic
software development. Traditional comment generation methods use
manually-crafted templates or information retrieval (IR) techniques to generate
summaries for source code. In recent years, neural network-based methods, which
leverage the encoder-decoder deep learning framework to learn comment
generation patterns from a large-scale parallel code corpus, have achieved
impressive results. However, these emerging methods only take code-related
information as input. Software reuse is common in the process of software
development, meaning that comments of similar code snippets are helpful for
comment generation. Inspired by the IR-based and template-based approaches, in
this paper, we propose a neural comment generation approach where we use the
existing comments of similar code snippets as exemplars to guide comment
generation. Specifically, given a piece of code, we first use an IR technique
to retrieve a similar code snippet and treat its comment as an exemplar. Then
we design a novel seq2seq neural network that takes the given code, its AST,
its similar code, and its exemplar as input, and leverages the information from
the exemplar to assist in the target comment generation based on the semantic
similarity between the source code and the similar code. We evaluate our
approach on a large-scale Java corpus, which contains about 2M samples, and
experimental results demonstrate that our model outperforms the
state-of-the-art methods by a substantial margin.
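The retrieval step described above can be sketched in a few lines. The paper uses an IR technique to find the most similar code snippet; the Jaccard token-overlap similarity and the toy corpus below are illustrative stand-ins, not the authors' exact method:

```python
# Sketch of the "retrieve" step: given a query code snippet, find the most
# similar snippet in a corpus and return its comment as the exemplar.
# Jaccard overlap over crude lexical tokens stands in for the IR technique.

def tokenize(code: str) -> set:
    """Crude lexical tokenization into a set of tokens."""
    return set(code.replace("(", " ").replace(")", " ").split())

def retrieve_exemplar(query_code: str, corpus: list) -> str:
    """corpus: list of (code, comment) pairs. Returns the comment of the
    most lexically similar code snippet, to be used as the exemplar."""
    q = tokenize(query_code)

    def jaccard(item):
        c = tokenize(item[0])
        return len(q & c) / len(q | c) if q | c else 0.0

    _, best_comment = max(corpus, key=jaccard)
    return best_comment

corpus = [
    ("def add(a, b): return a + b", "Returns the sum of two numbers."),
    ("def read_file(path): return open(path).read()", "Reads a file into a string."),
]
exemplar = retrieve_exemplar("def plus(x, y): return x + y", corpus)
```

The exemplar would then be fed, alongside the code, its AST, and the similar code, into the seq2seq network, which weighs the exemplar by the semantic similarity between the query and retrieved code.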
Related papers
- Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective [85.48043537327258]
We propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy.
Results indicate that MANGO significantly improves the code pass rate based on the strong baselines.
The robustness of the logical comment decoding strategy is notably higher than that of Chain-of-Thought prompting.
arXiv Detail & Related papers (2024-04-11T08:30:46Z)
- Statement-based Memory for Neural Source Code Summarization [4.024850952459758]
Code summarization underpins software documentation for programmers.
Lately, neural source code summarization has emerged as the frontier of research into automated code summarization techniques.
We present a statement-based memory encoder that learns the important elements of flow during training, leading to a statement-based subroutine representation.
arXiv Detail & Related papers (2023-07-21T17:04:39Z)
- Deep Learning Based Code Generation Methods: Literature Review [30.17038624027751]
This paper focuses on the code generation task, which aims at generating relevant code fragments from given natural language descriptions.
In this paper, we systematically review the current work on deep learning-based code generation methods.
arXiv Detail & Related papers (2023-03-02T08:25:42Z)
- Soft-Labeled Contrastive Pre-training for Function-level Code Representation [127.71430696347174]
We present SCodeR, a soft-labeled contrastive pre-training framework with two positive sample construction methods.
Considering the relevance between code snippets in a large-scale code corpus, the soft-labeled contrastive pre-training can obtain fine-grained soft labels.
SCodeR achieves new state-of-the-art performance on four code-related tasks over seven datasets.
arXiv Detail & Related papers (2022-10-18T05:17:37Z)
- GypSum: Learning Hybrid Representations for Code Summarization [21.701127410434914]
GypSum is a new deep learning model that learns hybrid representations using graph attention neural networks and a pre-trained programming and natural language model.
We modify the encoder-decoder sublayer in the Transformer's decoder to fuse the representations and propose a dual-copy mechanism to facilitate summary generation.
arXiv Detail & Related papers (2022-04-26T07:44:49Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- CodeRetriever: Unimodal and Bimodal Contrastive Learning [128.06072658302165]
We propose the CodeRetriever model, which combines unimodal and bimodal contrastive learning to train function-level code semantic representations.
For unimodal contrastive learning, we design a semantic-guided method to build positive code pairs based on the documentation and function name.
For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build text-code pairs.
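The bimodal objective over text-code pairs is typically an in-batch contrastive (InfoNCE-style) loss. A minimal sketch, with random vectors standing in for the embedding model, which is elided here:

```python
import numpy as np

# In-batch contrastive (InfoNCE-style) loss over paired text/code embeddings:
# row i of each matrix is a positive pair, and every other row in the batch
# serves as a negative. The encoder producing the embeddings is elided.

def info_nce_loss(text_emb, code_emb, temperature=0.05):
    """text_emb, code_emb: (batch, dim) arrays of paired embeddings."""
    # L2-normalize so dot products are cosine similarities
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    c = code_emb / np.linalg.norm(code_emb, axis=1, keepdims=True)
    logits = t @ c.T / temperature          # (batch, batch) similarity matrix
    # Cross-entropy with the diagonal (the matched pairs) as the target class
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
loss = info_nce_loss(rng.normal(size=(4, 16)), rng.normal(size=(4, 16)))
```

Training drives the loss down by pulling matched text-code pairs together and pushing mismatched pairs apart in the shared embedding space.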
arXiv Detail & Related papers (2022-01-26T10:54:30Z)
- Project-Level Encoding for Neural Source Code Summarization of Subroutines [6.939768185086755]
We present a project-level encoder to improve models of code summarization.
We use that representation to augment the encoder of state-of-the-art neural code summarization techniques.
arXiv Detail & Related papers (2021-03-22T06:01:07Z)
- Incorporating External Knowledge through Pre-training for Natural Language to Code Generation [97.97049697457425]
Open-domain code generation aims to generate code in a general-purpose programming language from natural language (NL) intents.
We explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation.
Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa.
arXiv Detail & Related papers (2020-04-20T01:45:27Z)
- Improved Code Summarization via a Graph Neural Network [96.03715569092523]
In general, source code summarization techniques take the source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
arXiv Detail & Related papers (2020-04-06T17:36:42Z)
- DeepSumm -- Deep Code Summaries using Neural Transformer Architecture [8.566457170664927]
We employ neural techniques to solve the task of source code summarization.
With more than 2.1M supervised samples of comments and code, we reduce the training time by more than 50%.
arXiv Detail & Related papers (2020-03-31T22:43:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.