DeepSumm -- Deep Code Summaries using Neural Transformer Architecture
- URL: http://arxiv.org/abs/2004.00998v1
- Date: Tue, 31 Mar 2020 22:43:29 GMT
- Title: DeepSumm -- Deep Code Summaries using Neural Transformer Architecture
- Authors: Vivek Gupta
- Abstract summary: We employ neural techniques to solve the task of source code summarization. With more than 2.1m supervised code-comment samples, we reduce training time by more than 50%.
- Score: 8.566457170664927
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Source code summarization is the task of writing short, natural language descriptions of source code behavior at run time. Such summaries are extremely useful for software development and maintenance, but they are expensive to author manually; hence summaries are written for only a small fraction of the code that is produced, and are often ignored. Automatic code documentation can potentially solve this at a low cost. It is thus an emerging research field, with further applications to program comprehension and software maintenance. Traditional methods often relied on cognitive models built from templates and heuristics, and saw varying degrees of adoption in the developer community. With recent advancements, however, end-to-end data-driven approaches based on neural techniques have largely overtaken these traditional techniques. Much of the current landscape employs neural-translation-based architectures with recurrence and attention, whose training procedure is both resource- and time-intensive. In this paper, we employ neural techniques for source code summarization and specifically compare NMT-based techniques against the simpler and more appealing Transformer architecture on a dataset of Java methods and comments. We argue for dispensing with recurrence in the training procedure. To the best of our knowledge, transformer-based models have not been used for this task before. With more than 2.1m supervised code-comment samples, we reduce training time by more than 50% and achieve a BLEU score of 17.99 on the test set.
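To make the recurrence-free setup concrete, here is a minimal sketch of a Transformer-based code summarizer. This is an illustrative assumption, not the paper's actual model: the hyperparameters, vocabulary sizes, and random toy batch are placeholders, and PyTorch's nn.Transformer stands in for whatever implementation the authors used.

```python
# Minimal sketch: recurrence-free seq2seq code summarizer.
# All sizes below are illustrative assumptions, not the paper's configuration.
import math
import torch
import torch.nn as nn

class CodeSummarizer(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d_model=256, nhead=8,
                 num_layers=3, max_len=512):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        # Learned positional embeddings supply order information,
        # replacing recurrence entirely.
        self.pos_embed = nn.Embedding(max_len, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)
        self.d_model = d_model

    def _embed(self, ids, table):
        pos = torch.arange(ids.size(1), device=ids.device)
        return table(ids) * math.sqrt(self.d_model) + self.pos_embed(pos)

    def forward(self, src_ids, tgt_ids):
        src = self._embed(src_ids, self.src_embed)
        tgt = self._embed(tgt_ids, self.tgt_embed)
        # Causal mask: each summary position attends only to earlier positions.
        mask = self.transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)).to(tgt_ids.device)
        return self.out(self.transformer(src, tgt, tgt_mask=mask))

# Toy teacher-forced training step on random token ids, standing in for
# tokenized Java methods (source) and their comments (target).
model = CodeSummarizer(src_vocab=50_000, tgt_vocab=30_000)
src = torch.randint(0, 50_000, (4, 120))   # "code token" batch
tgt = torch.randint(0, 30_000, (4, 20))    # "comment token" batch
logits = model(src, tgt[:, :-1])           # predict each next comment token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
loss.backward()
print(f"toy loss: {loss.item():.3f}")
```

The reported metric, corpus-level BLEU (17.99 here), can then be computed over generated versus reference comments, e.g. with nltk.translate.bleu_score.corpus_bleu.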
Related papers
- Zero-Shot Code Representation Learning via Prompt Tuning [6.40875582886359]
We propose Zecoler, a zero-shot approach for learning code representations.
Zecoler is built upon a pre-trained programming language model.
We evaluate Zecoler in five code intelligence tasks including code clone detection, code search, method name prediction, code summarization, and code generation.
arXiv Detail & Related papers (2024-04-13T09:47:07Z)
- TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills [31.75121546422898]
We present TransCoder, a unified Transferable fine-tuning strategy for Code representation learning.
We employ a tunable prefix encoder as the meta-learner to capture cross-task and cross-language transferable knowledge.
Our method can lead to superior performance on various code-related tasks and encourage mutual reinforcement.
arXiv Detail & Related papers (2023-05-23T06:59:22Z)
- Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization [76.57699934689468]
We propose a fine-grained Token-level retrieval-augmented mechanism (Tram) on the decoder side to enhance the performance of neural models.
To overcome the challenge of token-level retrieval in capturing contextual code semantics, we also propose integrating code semantics into individual summary tokens.
arXiv Detail & Related papers (2023-05-18T16:02:04Z)
- CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extend the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
- Using Document Similarity Methods to create Parallel Datasets for Code Translation [60.36392618065203]
Translating source code from one programming language to another is a critical, time-consuming task.
We propose to use document similarity methods to create noisy parallel datasets of code.
We show that these models perform comparably to models trained on ground truth for reasonable levels of noise.
arXiv Detail & Related papers (2021-10-11T17:07:58Z)
- Project-Level Encoding for Neural Source Code Summarization of Subroutines [6.939768185086755]
We present a project-level encoder to improve models of code summarization.
We use that representation to augment the encoder of state-of-the-art neural code summarization techniques.
arXiv Detail & Related papers (2021-03-22T06:01:07Z)
- Retrieve and Refine: Exemplar-based Neural Comment Generation [27.90756259321855]
Comments of similar code snippets are helpful for comment generation.
We design a novel seq2seq neural network that takes the given code, its AST, its similar code, and its exemplar as input.
We evaluate our approach on a large-scale Java corpus, which contains about 2M samples.
arXiv Detail & Related papers (2020-10-09T09:33:10Z)
- Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved.
We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z)
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that although the approach is simple, it outperforms state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
- Improved Code Summarization via a Graph Neural Network [96.03715569092523]
In general, source code summarization techniques take source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries (see the sketch after this list).
arXiv Detail & Related papers (2020-04-06T17:36:42Z)
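As a companion to the last entry above, here is a minimal sketch of the "GNN over the AST" idea: embed AST node types and propagate information along parent-child edges before decoding a summary. The toy dimensions, the GCN-style mean aggregation, and the use of Python's ast module (the cited work targets Java and a full encoder-decoder model) are all assumptions for illustration.

```python
# Minimal sketch: encode AST nodes with message passing along AST edges.
# Dimensions and the Python-AST stand-in are illustrative assumptions.
import ast
import torch
import torch.nn as nn

def ast_graph(code: str):
    """Return node-type labels and parent-child edges of the parsed AST."""
    tree = ast.parse(code)
    nodes, edges, index = [], [], {}
    for node in ast.walk(tree):
        index[id(node)] = len(nodes)
        nodes.append(type(node).__name__)
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            edges.append((index[id(node)], index[id(child)]))
    return nodes, edges

class AstEncoder(nn.Module):
    """Embed node types, then propagate along AST edges (GCN-style hops)."""
    def __init__(self, num_types, dim=64, hops=2):
        super().__init__()
        self.embed = nn.Embedding(num_types, dim)
        self.hops = nn.ModuleList(nn.Linear(dim, dim) for _ in range(hops))

    def forward(self, type_ids, adj):
        h = self.embed(type_ids)
        for lin in self.hops:
            # Mean over neighbors (row-normalized adjacency), then transform.
            h = torch.relu(lin(adj @ h))
        return h  # one vector per AST node

# Build a tiny AST graph and run the encoder on it.
nodes, edges = ast_graph("def add(a, b):\n    return a + b")
types = sorted(set(nodes))
type_ids = torch.tensor([types.index(t) for t in nodes])
n = len(nodes)
adj = torch.eye(n)                       # self-loops
for i, j in edges:                       # undirected parent-child edges
    adj[i, j] = adj[j, i] = 1.0
adj = adj / adj.sum(dim=1, keepdim=True) # row-normalize for mean aggregation
enc = AstEncoder(num_types=len(types))
print(enc(type_ids, adj).shape)          # torch.Size([n, 64])
```

A summary decoder would then attend over these per-node vectors rather than a flat token sequence, which is what lets the architecture match the AST's structure.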
This list is automatically generated from the titles and abstracts of the papers on this site.