Revisiting File Context for Source Code Summarization
- URL: http://arxiv.org/abs/2309.02326v1
- Date: Tue, 5 Sep 2023 15:44:46 GMT
- Title: Revisiting File Context for Source Code Summarization
- Authors: Aakash Bansal, Chia-Yi Su, and Collin McMillan
- Abstract summary: A typical use case is generating short summaries of subroutines for use in API documentation.
The heart of almost all current research into code summarization is the encoder-decoder neural architecture.
In this paper, we revisit the idea of ``file context'' for code summarization.
- Score: 2.85386288555414
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Source code summarization is the task of writing natural language
descriptions of source code. A typical use case is generating short summaries
of subroutines for use in API documentation. The heart of almost all current
research into code summarization is the encoder-decoder neural architecture,
and the encoder input is almost always a single subroutine or other short code
snippet. The problem with this setup is that the information needed to describe
the code is often not present in the code itself -- that information often
resides in other nearby code. In this paper, we revisit the idea of ``file
context'' for code summarization. File context is the idea of encoding select
information from other subroutines in the same file. We propose a novel
modification of the Transformer architecture that is purpose-built to encode
file context and demonstrate its improvement over several baselines. We find
that file context helps on a subset of challenging examples where traditional
approaches struggle.
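To make the idea concrete, below is a minimal sketch (in PyTorch) of one way an encoder-decoder summarizer can consume file context: the target subroutine and the other subroutines from the same file are encoded separately, and cross-attention lets the target-code representation draw on the file-level representation before decoding the summary. This is an illustration of the general idea under assumed hyperparameters, not the specific Transformer modification proposed in the paper.

```python
import torch
import torch.nn as nn

class FileContextSummarizer(nn.Module):
    """Toy encoder-decoder that fuses a target subroutine with file context."""
    def __init__(self, vocab_size, d_model=256, nhead=8, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.code_encoder = nn.TransformerEncoder(enc_layer, num_layers)
        ctx_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.context_encoder = nn.TransformerEncoder(ctx_layer, num_layers)
        # Cross-attention: target-code tokens attend to tokens from the
        # other subroutines in the same file.
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, code_ids, context_ids, summary_ids):
        code = self.code_encoder(self.embed(code_ids))        # (B, Tc, D)
        ctx = self.context_encoder(self.embed(context_ids))   # (B, Tx, D)
        fused, _ = self.cross_attn(query=code, key=ctx, value=ctx)
        memory = code + fused                                  # residual fusion
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(summary_ids.size(1))
        dec = self.decoder(self.embed(summary_ids), memory, tgt_mask=tgt_mask)
        return self.out(dec)                                   # (B, Ts, vocab)

# Toy usage with random token ids.
model = FileContextSummarizer(vocab_size=1000)
code = torch.randint(0, 1000, (2, 50))      # target subroutine tokens
context = torch.randint(0, 1000, (2, 200))  # tokens from other subroutines in the file
summary = torch.randint(0, 1000, (2, 12))   # summary prefix (teacher forcing)
print(model(code, context, summary).shape)  # torch.Size([2, 12, 1000])
```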
Related papers
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code
Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z) - Statement-based Memory for Neural Source Code Summarization [4.024850952459758]
Code summarization underpins software documentation for programmers.
Lately, neural source code summarization has emerged as the frontier of research into automated code summarization techniques.
We present a statement-based memory encoder that learns the important elements of flow during training, leading to a statement-based subroutine representation.
arXiv Detail & Related papers (2023-07-21T17:04:39Z) - Outline, Then Details: Syntactically Guided Coarse-To-Fine Code
Generation [61.50286000143233]
ChainCoder is a program synthesis language model that generates Python code progressively.
A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
arXiv Detail & Related papers (2023-04-28T01:47:09Z) - Soft-Labeled Contrastive Pre-training for Function-level Code
Representation [127.71430696347174]
We present SCodeR, a soft-labeled contrastive pre-training framework with two positive sample construction methods.
Considering the relevance between codes in a large-scale code corpus, the soft-labeled contrastive pre-training can obtain fine-grained soft-labels.
SCodeR achieves new state-of-the-art performance on four code-related tasks over seven datasets.
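As a rough illustration of the soft-labeling idea (not SCodeR's actual objective), a contrastive loss can replace the hard 0/1 InfoNCE targets with a relevance-weighted target distribution, so related code pairs in a batch are not all treated as equally negative. A minimal sketch, assuming batch-level pairwise relevance scores are available:

```python
import torch
import torch.nn.functional as F

def soft_labeled_contrastive_loss(anchor, positive, relevance, temperature=0.07):
    """anchor, positive: (B, D) function-level embeddings.
    relevance: (B, B) estimated pairwise relevance in [0, 1], with 1.0 on
    the diagonal (each example is its own positive)."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.T / temperature                        # (B, B) similarities
    soft_targets = relevance / relevance.sum(dim=-1, keepdim=True)    # row-normalized soft labels
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

# Toy usage: 4 code snippets with 128-dimensional embeddings.
B, D = 4, 128
anchor, positive = torch.randn(B, D), torch.randn(B, D)
relevance = torch.eye(B) + 0.1 * torch.rand(B, B)  # mostly-diagonal soft labels
print(soft_labeled_contrastive_loss(anchor, positive, relevance).item())
```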
arXiv Detail & Related papers (2022-10-18T05:17:37Z) - DocCoder: Generating Code by Retrieving and Reading Docs [87.88474546826913]
We introduce DocCoder, an approach that explicitly leverages code manuals and documentation.
Our approach is general, can be applied to any programming language, and is agnostic to the underlying neural model.
arXiv Detail & Related papers (2022-07-13T06:47:51Z) - StructCoder: Structure-Aware Transformer for Code Generation [13.797842927671846]
We introduce a structure-aware Transformer decoder that models both syntax and data flow to enhance the quality of generated code.
The proposed StructCoder model achieves state-of-the-art performance on code translation and text-to-code generation tasks.
arXiv Detail & Related papers (2022-06-10T17:26:31Z) - InCoder: A Generative Model for Code Infilling and Synthesis [88.46061996766348]
We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) and editing (via infilling).
InCoder is trained to generate code files from a large corpus of permissively licensed code.
Our model is the first generative model that is able to directly perform zero-shot code infilling.
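To illustrate the fill-in-the-middle idea in general terms: a left-to-right model can edit code by moving the missing span behind sentinel tokens at the end of the sequence, generating the span, and splicing it back. The sentinel names below are illustrative placeholders, not InCoder's actual vocabulary.

```python
MASK = "<MASK:0>"   # placeholder sentinel marking the missing span
EOM = "<EOM>"       # placeholder end-of-mask sentinel

def make_infill_prompt(prefix: str, suffix: str) -> str:
    # The model sees the prefix, a mask marker, the suffix, then the mask
    # marker again, and is asked to continue with the missing span.
    return f"{prefix}{MASK}{suffix}{MASK}"

def splice_infill(prefix: str, suffix: str, generated: str) -> str:
    # Cut the generated continuation at the end-of-mask sentinel and put
    # it back between prefix and suffix.
    middle = generated.split(EOM, 1)[0]
    return prefix + middle + suffix

prefix = "def add(a, b):\n    "
suffix = "\n"
prompt = make_infill_prompt(prefix, suffix)
generated = "return a + b" + EOM   # a real model would produce this
print(splice_infill(prefix, suffix, generated))
# def add(a, b):
#     return a + b
```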
arXiv Detail & Related papers (2022-04-12T16:25:26Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
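A toy sketch of the retrieve-then-complete idea: fetch the most lexically similar snippet from a code database and prepend it to the unfinished code as extra context for a completion model. The TF-IDF retriever and prompt format below are illustrative simplifications, not ReACC's actual retriever or generator.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

database = [
    "def read_json(path):\n    with open(path) as f:\n        return json.load(f)",
    "def write_csv(rows, path):\n    with open(path, 'w') as f:\n        csv.writer(f).writerows(rows)",
]

def retrieve(query: str, corpus: list[str]) -> str:
    # Lexical retrieval: rank corpus snippets by TF-IDF cosine similarity.
    vec = TfidfVectorizer(token_pattern=r"\w+")
    matrix = vec.fit_transform(corpus + [query])
    sims = cosine_similarity(matrix[-1], matrix[:-1])[0]
    return corpus[sims.argmax()]

unfinished = "def load_config(path):\n    with open(path) as f:"
retrieved = retrieve(unfinished, database)

# The retrieved snippet becomes extra context for whatever completion
# model is used downstream (not shown here).
prompt = f"# similar code:\n{retrieved}\n\n{unfinished}"
print(prompt)
```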
arXiv Detail & Related papers (2022-03-15T08:25:08Z) - Project-Level Encoding for Neural Source Code Summarization of
Subroutines [6.939768185086755]
We present a project-level encoder to improve models of code summarization.
We use that representation to augment the encoder of state-of-the-art neural code summarization techniques.
arXiv Detail & Related papers (2021-03-22T06:01:07Z) - Retrieve and Refine: Exemplar-based Neural Comment Generation [27.90756259321855]
Comments of similar code snippets are helpful for comment generation.
We design a novel seq2seq neural network that takes the given code, its AST, its similar code, and its exemplar as input.
We evaluate our approach on a large-scale Java corpus, which contains about 2M samples.
arXiv Detail & Related papers (2020-10-09T09:33:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.