A Transformer-based Approach for Source Code Summarization
- URL: http://arxiv.org/abs/2005.00653v1
- Date: Fri, 1 May 2020 23:29:36 GMT
- Title: A Transformer-based Approach for Source Code Summarization
- Authors: Wasi Uddin Ahmad and Saikat Chakraborty and Baishakhi Ray and Kai-Wei Chang
- Abstract summary: We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that, despite its simplicity, the approach outperforms state-of-the-art techniques by a significant margin.
- Score: 86.08359401867577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating a readable summary that describes the functionality of a program
is known as source code summarization. In this task, learning code
representation by modeling the pairwise relationship between code tokens to
capture their long-range dependencies is crucial. To learn code representation
for summarization, we explore the Transformer model, which uses a self-attention mechanism and has been shown to be effective at capturing long-range dependencies.
In this work, we show that, despite its simplicity, the approach outperforms the state-of-the-art techniques by a significant margin. We perform extensive analysis and ablation studies that reveal several important findings, e.g., that absolute encoding of source code token positions hinders summarization performance, while relative encoding significantly improves it. We have made our code publicly available to facilitate future research.
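To make the central finding concrete, below is a minimal sketch of self-attention with relative position embeddings in the style of Shaw et al. (2018), the scheme this line of work adopts in place of absolute positions. The single-head form, tensor shapes, and clipping distance are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of self-attention with relative position embeddings
# (Shaw et al., 2018 style). Shapes, single-head form, and the
# clipping distance are illustrative, not the paper's implementation.
import torch
import torch.nn.functional as F

def relative_self_attention(x, w_q, w_k, w_v, rel_emb, max_dist=32):
    """x: (seq_len, d_model); w_*: (d_model, d_head);
    rel_emb: (2 * max_dist + 1, d_head)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # (seq_len, d_head)
    seq_len, d_head = q.shape
    pos = torch.arange(seq_len)
    # Pairwise relative distances, clipped to [-max_dist, max_dist],
    # then shifted to index into rel_emb.
    dist = (pos[None, :] - pos[:, None]).clamp(-max_dist, max_dist) + max_dist
    r = rel_emb[dist]                               # (seq_len, seq_len, d_head)
    # Content-content plus content-position attention logits.
    logits = q @ k.T + torch.einsum("id,ijd->ij", q, r)
    attn = F.softmax(logits / d_head ** 0.5, dim=-1)
    return attn @ v
```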
Related papers
- Enhancing Source Code Representations for Deep Learning with Static Analysis [10.222207222039048]
This paper explores the integration of static analysis and additional context such as bug reports and design patterns into source code representations for deep learning models.
We use the Abstract Syntax Tree-based Neural Network (ASTNN) method and augment it with additional context information obtained from bug reports and design patterns.
Our approach improves the representation and processing of source code, thereby boosting task performance.
arXiv Detail & Related papers (2024-02-14T20:17:04Z)
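A loose analogue of the ASTNN-style preprocessing this entry builds on: split a function's AST into statement-level subtrees, the units a downstream encoder would embed. ASTNN itself parses C/Java, so this Python-ast version is only a sketch, not the paper's pipeline.

```python
# Split a function's AST into statement-level subtrees, the units a
# downstream encoder would embed (illustrative Python-ast analogue).
import ast

def statement_subtrees(source: str):
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.stmt):
            # Each statement subtree is one encoder unit; extra context
            # (bug reports, design patterns) would be attached as
            # additional features per the paper's proposal.
            yield ast.dump(node)

for subtree in statement_subtrees("def f(x):\n    y = x + 1\n    return y"):
    print(subtree)
```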
- Encoding Version History Context for Better Code Representation [13.045078976464307]
This paper presents preliminary evidence of the potential benefit of encoding contextual information from the version history to predict code clones and perform code classification.
To ensure the technique performs consistently, we need to conduct a holistic investigation on a larger code base using different combinations of contexts, aggregation, and models.
arXiv Detail & Related papers (2024-02-06T07:35:36Z)
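One of the design dimensions this entry mentions is how to aggregate historical context. Below is a hypothetical sketch of a single point in that design space: mean-pooling embeddings of a file's prior versions and concatenating with the current version's embedding. The aggregation choice and shapes are assumptions for illustration only.

```python
# Hypothetical aggregation: mean-pool prior-version embeddings and
# concatenate with the current version's embedding.
import numpy as np

def version_aware_embedding(current, history):
    """current: (d,) embedding; history: list of (d,) embeddings."""
    if history:
        context = np.mean(history, axis=0)      # simple mean aggregation
    else:
        context = np.zeros_like(current)        # no history available
    return np.concatenate([current, context])   # (2d,) for a classifier
```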
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
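A minimal sketch of what identifier-aware sparse attention could look like: each token attends within a local sliding window, while identifier tokens additionally attend globally. The window size and identifier flags are assumptions; SparseCoder's actual sparsity pattern may differ.

```python
# Illustrative "identifier-aware" sparse attention mask: local window
# for all tokens, global attention for identifier tokens.
import torch

def sparse_mask(is_identifier, window=64):
    """is_identifier: (seq_len,) bool tensor. Returns a (seq_len,
    seq_len) bool mask where True means attention is allowed."""
    n = is_identifier.shape[0]
    pos = torch.arange(n)
    local = (pos[None, :] - pos[:, None]).abs() <= window
    global_ = is_identifier[None, :] | is_identifier[:, None]
    return local | global_
```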
- Understanding Code Semantics: An Evaluation of Transformer Models in Summarization [0.0]
We evaluate the efficacy of code summarization by altering function and variable names.
We introduce adversaries like dead code and commented code across three programming languages.
arXiv Detail & Related papers (2023-10-25T02:41:50Z)
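As an illustration of the identifier-altering adversary described above, here is a toy Python transformation that renames variables and parameters while preserving semantics. A real evaluation would need proper scope analysis; the name mapping here is hypothetical.

```python
# Toy identifier-renaming adversary: rename variables and parameters
# while preserving semantics (no scope analysis; mapping is made up).
import ast

class RenameVars(ast.NodeTransformer):
    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):          # variable uses
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):           # function parameters
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

src = "def f(total):\n    count = total + 1\n    return count"
tree = RenameVars({"total": "v0", "count": "v1"}).visit(ast.parse(src))
print(ast.unparse(tree))  # same semantics, obfuscated names
```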
- Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization [76.57699934689468]
We propose a fine-grained token-level retrieval-augmented mechanism (Tram) on the decoder side to enhance the performance of neural models.
To overcome the challenge of token-level retrieval in capturing contextual code semantics, we also propose integrating code semantics into individual summary tokens.
arXiv Detail & Related papers (2023-05-18T16:02:04Z)
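Token-level retrieval augmentation on the decoder side is commonly realized by mixing the model's next-token distribution with one induced from retrieved neighbors, as in kNN-LM. The sketch below follows that generic recipe; it is an assumption about the mechanics, not Tram's exact formulation.

```python
# Generic kNN-LM-style mixing of the decoder's next-token distribution
# with a distribution induced by retrieved neighbor tokens.
import torch
import torch.nn.functional as F

def mix_distributions(model_logits, neighbor_ids, neighbor_dists,
                      vocab_size, lam=0.5, temp=1.0):
    """model_logits: (vocab_size,); neighbor_ids: (k,) long token ids;
    neighbor_dists: (k,) distances in representation space."""
    p_model = F.softmax(model_logits, dim=-1)
    weights = F.softmax(-neighbor_dists / temp, dim=-1)   # closer = heavier
    p_retrieval = torch.zeros(vocab_size).scatter_add_(0, neighbor_ids, weights)
    return lam * p_model + (1 - lam) * p_retrieval
```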
- Exploring Representation-Level Augmentation for Code Search [50.94201167562845]
We explore augmentation methods that augment data (both code and query) at the representation level, requiring no additional data processing or training.
We experimentally evaluate the proposed representation-level augmentation methods with state-of-the-art code search models on a large-scale public dataset.
arXiv Detail & Related papers (2022-10-21T22:47:37Z)
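One simple representation-level augmentation is linear interpolation between two encoded vectors, which manufactures a new training representation without touching raw code or queries. This is a minimal sketch of the idea; the paper's exact operators may differ.

```python
# Representation-level augmentation by linear interpolation between
# two encoded samples (no raw-data processing required).
import torch

def linear_interpolation(h1, h2, alpha=0.8):
    """h1, h2: (d,) representations of two samples; alpha near 1 keeps
    the augmented vector close to h1."""
    return alpha * h1 + (1 - alpha) * h2
```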
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
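A standard formulation of the multimodal contrastive objective over paired query/code embeddings is the InfoNCE loss sketched below; the paper's exact loss, temperature, and augmentations may differ.

```python
# InfoNCE-style contrastive loss over paired query/code embeddings.
import torch
import torch.nn.functional as F

def info_nce(query_emb, code_emb, temp=0.07):
    """query_emb, code_emb: (batch, d); row i of each is a positive pair."""
    q = F.normalize(query_emb, dim=-1)
    c = F.normalize(code_emb, dim=-1)
    logits = q @ c.T / temp                  # (batch, batch) similarities
    labels = torch.arange(q.shape[0])        # diagonal entries are positives
    return F.cross_entropy(logits, labels)
```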
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework that leverages both lexical copying and reference to code with similar semantics via retrieval.
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
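The retrieve-then-generate flow ReACC describes can be sketched as: fetch a semantically similar snippet, prepend it to the unfinished code, and let a code LM complete it. `retriever` and `generator` below are hypothetical placeholders, not the paper's actual components.

```python
# Retrieve-then-generate sketch: prepend a similar snippet to the
# unfinished code before asking a code LM to complete it.
# `retriever` and `generator` are hypothetical placeholders.
def retrieval_augmented_complete(unfinished_code, retriever, generator,
                                 sep="\n# --- retrieved context ---\n"):
    similar = retriever.search(unfinished_code, top_k=1)[0]
    prompt = similar + sep + unfinished_code
    return generator.complete(prompt)
```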