Improved Code Summarization via a Graph Neural Network
- URL: http://arxiv.org/abs/2004.02843v2
- Date: Tue, 7 Apr 2020 06:29:30 GMT
- Title: Improved Code Summarization via a Graph Neural Network
- Authors: Alexander LeClair, Sakib Haque, Lingfei Wu, Collin McMillan
- Abstract summary: In general, source code summarization techniques use source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
- Score: 96.03715569092523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic source code summarization is the task of generating natural
language descriptions for source code. Automatic code summarization is a
rapidly expanding research area, especially as the community has taken greater
advantage of advances in neural network and AI technologies. In general, source
code summarization techniques use the source code as input and output a
natural language description. Yet a strong consensus is developing that using
structural information as input leads to improved performance. The first
approaches to use structural information flattened the AST into a sequence.
Recently, more complex approaches based on random AST paths or graph neural
networks have improved on the models using flattened ASTs. However, the
literature still does not describe using a graph neural network together
with source code sequence as separate inputs to a model. Therefore, in this
paper, we present an approach that uses a graph-based neural architecture that
better matches the default structure of the AST to generate these summaries. We
evaluate our technique using a data set of 2.1 million Java method-comment
pairs and show improvement over four baseline techniques, two from the software
engineering literature and two from the machine learning literature.
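The core design pairs a sequence encoder over the code tokens with a graph encoder over the AST, feeding both to an attentional decoder. Below is a minimal PyTorch sketch of such a dual-input encoder; the dimensions, the shared embedding, and the simple convolutional-style propagation are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class CodeGNNEncoder(nn.Module):
    def __init__(self, vocab_size=10000, dim=256, gnn_hops=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)          # shared token/node embedding (an assumption)
        self.seq_rnn = nn.GRU(dim, dim, batch_first=True)   # code-token sequence encoder
        self.gnn_proj = nn.Linear(dim, dim)                 # AST message projection
        self.hops = gnn_hops

    def forward(self, tokens, ast_nodes, adj):
        # tokens:    (B, Ts)     code token ids
        # ast_nodes: (B, Tn)     AST node label ids
        # adj:       (B, Tn, Tn) row-normalized AST adjacency
        seq_out, _ = self.seq_rnn(self.embed(tokens))
        h = self.embed(ast_nodes)
        for _ in range(self.hops):                  # convolutional-style hops over the AST
            h = torch.relu(self.gnn_proj(adj @ h))  # aggregate neighbor states
        return seq_out, h                           # decoder attends over both inputs

enc = CodeGNNEncoder()
seq_out, node_out = enc(torch.randint(0, 10000, (2, 30)),
                        torch.randint(0, 10000, (2, 20)),
                        torch.eye(20).repeat(2, 1, 1))
print(seq_out.shape, node_out.shape)  # (2, 30, 256) and (2, 20, 256)
```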
Related papers
- Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs [5.953617559607503]
Vul-LMGNN is a unified model that combines pre-trained code language models with code property graphs.
Vul-LMGNN constructs a code property graph that integrates various code attributes into a unified graph structure.
To effectively retain dependency information among various attributes, we introduce a gated code Graph Neural Network.
arXiv Detail & Related papers (2024-04-23T03:48:18Z)
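As a rough illustration of the gated code Graph Neural Network mentioned above, here is a GGNN-style gated update in PyTorch; the dimensions, step count, and the assumption that node features come from a pre-trained code language model are illustrative, not Vul-LMGNN's actual design.

```python
import torch
import torch.nn as nn

class GatedGraphLayer(nn.Module):
    def __init__(self, dim=128, steps=3):
        super().__init__()
        self.msg = nn.Linear(dim, dim)   # message transform
        self.gru = nn.GRUCell(dim, dim)  # gated state update
        self.steps = steps

    def forward(self, h, adj):
        # h:   (N, dim) node states, e.g. code-LM embeddings of CPG nodes
        # adj: (N, N)   adjacency over the code property graph
        for _ in range(self.steps):
            m = adj @ self.msg(h)  # aggregate neighbor messages
            h = self.gru(m, h)     # gate how much of each message to keep
        return h

layer = GatedGraphLayer()
print(layer(torch.randn(5, 128), torch.eye(5)).shape)  # torch.Size([5, 128])
```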
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
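The self-learning loop described above can be sketched as follows; every name, method, and threshold here is a hypothetical placeholder rather than LOCCO's actual procedure.

```python
# Hypothetical self-learning loop: the model being trained annotates
# unlabeled text, and its confident annotations become new supervision.
def self_train(model, labeled, unlabeled, rounds=3, threshold=0.9):
    for _ in range(rounds):
        model.fit(labeled)                            # train on current supervision
        for text in unlabeled:
            annotation, score = model.annotate(text)  # placeholder API
            if score >= threshold:                    # keep only confident outputs
                labeled.append((text, annotation))
    return labeled  # these pairs can also train a text generation model
```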
- Boosting Source Code Learning with Data Augmentation: An Empirical Study [16.49710700412084]
We study whether data augmentation methods originally used for text and graphs are effective in improving the training quality of source code learning.
Our results identify the data augmentation methods that can produce more accurate and robust models for source code learning.
arXiv Detail & Related papers (2023-03-13T01:47:05Z)
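As a toy example of one semantics-preserving augmentation such studies consider for source code, here is consistent identifier renaming; the snippet and names are illustrative.

```python
import re

def rename_identifier(code: str, old: str, new: str) -> str:
    # \b word boundaries avoid touching substrings of longer identifiers
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

snippet = "def add(total, x):\n    total += x\n    return total"
print(rename_identifier(snippet, "total", "acc"))  # same program, new surface form
```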
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
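A tiny illustration of that setup: train on input-output pairs of small size and evaluate on larger inputs the model never saw. The task (sorting) and the size ranges are assumptions for illustration only.

```python
import random

def make_split(n_samples, min_len, max_len):
    data = []
    for _ in range(n_samples):
        xs = [random.randint(0, 99) for _ in range(random.randint(min_len, max_len))]
        data.append((xs, sorted(xs)))  # input-output pairs define the algorithm
    return data

train = make_split(1000, min_len=4, max_len=8)      # in-distribution sizes
test_ood = make_split(100, min_len=16, max_len=32)  # out-of-distribution sizes
```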
- Precise Learning of Source Code Contextual Semantics via Hierarchical Dependence Structure and Graph Attention Networks [28.212889828892664]
We propose a novel source code model embedded with hierarchical dependencies.
We introduce the syntactic structure of the basic block, i.e., its corresponding AST, into the source code model to provide sufficient information.
The results show that our model reduces the scale of parameters by 50% and achieves a 4% accuracy improvement on the program classification task.
arXiv Detail & Related papers (2021-11-20T04:03:42Z)
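To make the graph-attention ingredient concrete, here is a minimal single-head GAT-style layer in PyTorch; it is a generic sketch with assumed dimensions, not the paper's hierarchical model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttention(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.a = nn.Linear(2 * dim, 1, bias=False)  # attention scorer

    def forward(self, h, adj):
        # h: (N, dim) node features; adj: (N, N) 0/1 adjacency with self-loops
        z = self.W(h)
        n = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                           z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))
        e = e.masked_fill(adj == 0, float("-inf"))  # attend only along edges
        return torch.softmax(e, dim=-1) @ z

layer = GraphAttention()
print(layer(torch.randn(6, 64), torch.eye(6)).shape)  # torch.Size([6, 64])
```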
- GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph [53.70520466556453]
We propose GraphFormers, where layerwise GNN components are nested alongside the transformer blocks of language models.
With the proposed architecture, the text encoding and the graph aggregation are fused into an iterative workflow.
In addition, a progressive learning strategy is introduced, where the model is successively trained on manipulated data and original data to reinforce its capability of integrating information on the graph.
arXiv Detail & Related papers (2021-05-06T12:20:41Z)
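A rough PyTorch sketch of the nesting idea, with a graph-aggregation step interleaved between transformer layers so text encoding and graph aggregation alternate; all sizes and the first-token node summary are assumptions.

```python
import torch
import torch.nn as nn

class GNNNestedEncoder(nn.Module):
    def __init__(self, dim=128, layers=2, heads=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            for _ in range(layers))
        self.graph_aggs = nn.ModuleList(nn.Linear(dim, dim) for _ in range(layers))

    def forward(self, tokens, adj):
        # tokens: (N, T, dim) token states of N linked texts (graph nodes)
        # adj:    (N, N)      row-normalized adjacency among the texts
        h = tokens
        for block, agg in zip(self.blocks, self.graph_aggs):
            h = block(h)                        # text encoding
            node = h[:, 0]                      # first token as node summary
            node = torch.relu(agg(adj @ node))  # graph aggregation across neighbors
            h = torch.cat([node.unsqueeze(1), h[:, 1:]], dim=1)  # write back
        return h[:, 0]  # final node representations

enc = GNNNestedEncoder()
print(enc(torch.randn(5, 16, 128), torch.eye(5)).shape)  # torch.Size([5, 128])
```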
- Unsupervised Deep Cross-modality Spectral Hashing [65.3842441716661]
The framework is a two-step hashing approach which decouples the optimization into binary optimization and hashing function learning.
We propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations.
We leverage a powerful CNN for images and propose a CNN-based deep architecture to learn the text modality.
arXiv Detail & Related papers (2020-08-01T09:20:11Z)
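As a toy, single-modality illustration of spectral binary codes (the cross-modality machinery above is much richer): embed items with graph-Laplacian eigenvectors, then binarize by sign.

```python
import numpy as np

def spectral_codes(affinity, bits=8):
    lap = np.diag(affinity.sum(axis=1)) - affinity  # unnormalized Laplacian
    vals, vecs = np.linalg.eigh(lap)                # spectral embedding
    emb = vecs[:, 1:bits + 1]                       # skip the trivial eigenvector
    return (emb > 0).astype(np.int8)                # sign gives binary codes

rng = np.random.default_rng(0)
sim = rng.random((20, 20))
sim = (sim + sim.T) / 2                             # symmetric affinity matrix
print(spectral_codes(sim, bits=8).shape)            # (20, 8)
```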
- Learning to map source code to software vulnerability using code-as-a-graph [67.62847721118142]
We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective.
We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
arXiv Detail & Related papers (2020-06-15T16:05:27Z)
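A small demonstration of a code-as-graph encoding using Python's standard ast module, emitting parent-child structural edges instead of a flat token stream; this is a generic sketch, not the paper's pipeline.

```python
import ast

def ast_edges(code: str):
    tree = ast.parse(code)
    nodes, edges = [], []
    for parent in ast.walk(tree):
        nodes.append(type(parent).__name__)        # node label, e.g. FunctionDef
        for child in ast.iter_child_nodes(parent):
            edges.append((id(parent), id(child)))  # structural parent-child edge
    return nodes, edges

nodes, edges = ast_edges("def f(x):\n    return x + 1")
print(len(nodes), "nodes,", len(edges), "edges")
```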
- Retrieval-Augmented Generation for Code Summarization via Hybrid GNN [23.445231228940738]
We propose a novel retrieval-augmented mechanism to combine the benefits of both worlds.
To mitigate the limitation of Graph Neural Networks (GNNs) on capturing global graph structure information of source code, we propose a novel attention-based dynamic graph.
Our method achieves state-of-the-art performance, improving on existing methods by 1.42, 2.44, and 1.29 in BLEU-4, ROUGE-L, and METEOR, respectively.
arXiv Detail & Related papers (2020-06-09T17:09:29Z)
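A bare-bones sketch of the retrieval step: fetch the most similar known snippet (here by token Jaccard overlap, an assumed stand-in) so a generator can condition on it alongside the query code.

```python
def retrieve(query_tokens, corpus):
    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / max(len(a | b), 1)  # token-set similarity
    return max(corpus, key=lambda ex: jaccard(query_tokens, ex["code"]))

corpus = [{"code": ["def", "add", "a", "b"], "summary": "adds two numbers"},
          {"code": ["def", "sort", "xs"], "summary": "sorts a list"}]
best = retrieve(["def", "add", "x", "y"], corpus)
print(best["summary"])  # the retrieved pair augments the generator's input
```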
- DeepSumm -- Deep Code Summaries using Neural Transformer Architecture [8.566457170664927]
We employ neural techniques to solve the task of source code summarization.
With more than 2.1 million supervised code-comment samples, we reduce training time by more than 50%.
arXiv Detail & Related papers (2020-03-31T22:43:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.