Statement-based Memory for Neural Source Code Summarization
- URL: http://arxiv.org/abs/2307.11709v1
- Date: Fri, 21 Jul 2023 17:04:39 GMT
- Title: Statement-based Memory for Neural Source Code Summarization
- Authors: Aakash Bansal, Siyuan Jiang, Sakib Haque, and Collin McMillan
- Abstract summary: Code summarization underpins software documentation for programmers.
Lately, neural source code summarization has emerged as the frontier of research into automated code summarization techniques.
We present a statement-based memory encoder that learns the important elements of flow during training, leading to a statement-based subroutine representation.
- Score: 4.024850952459758
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Source code summarization is the task of writing natural language
descriptions of source code behavior. Code summarization underpins software
documentation for programmers. Short descriptions of code help programmers
understand the program quickly without having to read the code itself. Lately,
neural source code summarization has emerged as the frontier of research into
automated code summarization techniques. By far the most popular targets for
summarization are program subroutines. The idea, in a nutshell, is to train an
encoder-decoder neural architecture using large sets of examples of subroutines
extracted from code repositories. The encoder represents the code and the
decoder represents the summary. However, most current approaches treat the
subroutine as a single unit, for example by taking the entire subroutine as
input to a Transformer- or RNN-based encoder. But code behavior tends to
depend on the flow from statement to statement. Dynamic analysis could
normally shed light on this flow, but running dynamic analysis on hundreds of
thousands of examples in large datasets is not practical. In this paper, we
present a statement-based memory encoder that learns the important elements of
flow during training, leading to a statement-based subroutine representation
without the need for dynamic analysis. We implement our encoder for code
summarization and demonstrate a significant improvement over the
state-of-the-art.
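The abstract does not include implementation details, so the following is a minimal PyTorch sketch of what a statement-based memory encoder could look like: each statement is encoded separately (here with a GRU), and self-attention over the statement vectors serves as a learned memory of statement-to-statement flow. The class name, layer choices, and hyperparameters are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class StatementMemoryEncoder(nn.Module):
    """Illustrative sketch (not the paper's implementation): encode a
    subroutine statement-by-statement, then let attention over statement
    vectors act as a learned memory of statement-to-statement flow."""

    def __init__(self, vocab_size, d_model=256, n_heads=4):
        super().__init__()
        self.tok_embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        # Per-statement encoder: pools the tokens of a single statement.
        self.stmt_encoder = nn.GRU(d_model, d_model, batch_first=True)
        # "Memory" over statements: each statement attends to the others,
        # approximating flow without dynamic analysis.
        self.stmt_memory = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, stmt_tokens):
        # stmt_tokens: (batch, n_statements, tokens_per_statement) integer ids
        b, s, t = stmt_tokens.shape
        x = self.tok_embed(stmt_tokens.view(b * s, t))      # (b*s, t, d)
        _, h = self.stmt_encoder(x)                         # (1, b*s, d)
        stmt_vecs = h.squeeze(0).view(b, s, -1)             # (b, s, d)
        mem, _ = self.stmt_memory(stmt_vecs, stmt_vecs, stmt_vecs)
        return mem                                          # statement-based representation

# Toy usage: a batch of 2 subroutines, 5 statements, 8 tokens each.
if __name__ == "__main__":
    enc = StatementMemoryEncoder(vocab_size=1000)
    ids = torch.randint(1, 1000, (2, 5, 8))
    print(enc(ids).shape)  # torch.Size([2, 5, 256])
```

In a full summarizer, the returned statement-level representation would feed an attention-based decoder that generates the natural-language summary.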
Related papers
- ESALE: Enhancing Code-Summary Alignment Learning for Source Code Summarization [21.886950861445122]
Code summarization aims to automatically generate succinct natural language summaries for given code snippets.
This paper proposes a novel approach to improve code summarization based on summary-focused tasks.
arXiv Detail & Related papers (2024-07-01T03:06:51Z) - SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z) - Revisiting File Context for Source Code Summarization [2.85386288555414]
A typical use case is generating short summaries of subroutines for use in API documentation.
The heart of almost all current research into code summarization is the encoder-decoder neural architecture.
In this paper, we revisit the idea of "file context" for code summarization.
arXiv Detail & Related papers (2023-09-05T15:44:46Z) - Code Execution with Pre-trained Language Models [88.04688617516827]
Most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures.
We develop a mutation-based data augmentation technique to create a large-scale and realistic Python dataset and task for code execution.
We then present CodeExecutor, a Transformer model that leverages code execution pre-training and curriculum learning to enhance its semantic comprehension.
arXiv Detail & Related papers (2023-05-08T10:00:05Z) - Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation [61.50286000143233]
ChainCoder is a program synthesis language model that generates Python code progressively.
A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
arXiv Detail & Related papers (2023-04-28T01:47:09Z) - Soft-Labeled Contrastive Pre-training for Function-level Code Representation [127.71430696347174]
We present SCodeR, a Soft-labeled contrastive pre-training framework with two positive sample construction methods.
Considering the relevance between codes in a large-scale code corpus, the soft-labeled contrastive pre-training can obtain fine-grained soft-labels.
SCodeR achieves new state-of-the-art performance on four code-related tasks over seven datasets.
arXiv Detail & Related papers (2022-10-18T05:17:37Z) - StructCoder: Structure-Aware Transformer for Code Generation [13.797842927671846]
We introduce a structure-aware Transformer decoder that models both syntax and data flow to enhance the quality of generated code.
The proposed StructCoder model achieves state-of-the-art performance on code translation and text-to-code generation tasks.
arXiv Detail & Related papers (2022-06-10T17:26:31Z) - GypSum: Learning Hybrid Representations for Code Summarization [21.701127410434914]
GypSum is a new deep learning model that learns hybrid representations using graph attention neural networks and a pre-trained programming and natural language model.
We modify the encoder-decoder sublayer in the Transformer's decoder to fuse the representations and propose a dual-copy mechanism to facilitate summary generation.
arXiv Detail & Related papers (2022-04-26T07:44:49Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z) - Project-Level Encoding for Neural Source Code Summarization of Subroutines [6.939768185086755]
We present a project-level encoder to improve models of code summarization.
We use that representation to augment the encoder of state-of-the-art neural code summarization techniques.
arXiv Detail & Related papers (2021-03-22T06:01:07Z) - A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that, despite its simplicity, the approach outperforms the state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)