Improved Code Summarization via a Graph Neural Network
- URL: http://arxiv.org/abs/2004.02843v2
- Date: Tue, 7 Apr 2020 06:29:30 GMT
- Title: Improved Code Summarization via a Graph Neural Network
- Authors: Alexander LeClair, Sakib Haque, Lingfei Wu, Collin McMillan
- Abstract summary: In general, source code summarization techniques use source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
- Score: 96.03715569092523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic source code summarization is the task of generating natural
language descriptions for source code. Automatic code summarization is a
rapidly expanding research area, especially as the community has taken greater
advantage of advances in neural network and AI technologies. In general, source
code summarization techniques use the source code as input and output a
natural language description. Yet a strong consensus is developing that using
structural information as input leads to improved performance. The first
approaches to use structural information flattened the AST into a sequence.
Recently, more complex approaches based on random AST paths or graph neural
networks have improved on the models using flattened ASTs. However, the
literature still does not describe using a graph neural network together
with source code sequence as separate inputs to a model. Therefore, in this
paper, we present an approach that uses a graph-based neural architecture that
better matches the default structure of the AST to generate these summaries. We
evaluate our technique using a data set of 2.1 million Java method-comment
pairs and show improvement over four baseline techniques, two from the software
engineering literature and two from the machine learning literature.
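The core design pairs a sequence encoder over the code tokens with a graph encoder over the AST, feeding both to an attentional decoder. Below is a minimal PyTorch sketch of such a dual-input encoder; the dimensions, the shared embedding, and the simple convolutional-style propagation are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class CodeGNNEncoder(nn.Module):
    def __init__(self, vocab_size=10000, dim=256, gnn_hops=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)          # shared token/node embedding (an assumption)
        self.seq_rnn = nn.GRU(dim, dim, batch_first=True)   # code-token sequence encoder
        self.gnn_proj = nn.Linear(dim, dim)                 # AST message projection
        self.hops = gnn_hops

    def forward(self, tokens, ast_nodes, adj):
        # tokens:    (B, Ts)     code token ids
        # ast_nodes: (B, Tn)     AST node label ids
        # adj:       (B, Tn, Tn) row-normalized AST adjacency
        seq_out, _ = self.seq_rnn(self.embed(tokens))
        h = self.embed(ast_nodes)
        for _ in range(self.hops):                  # convolutional-style hops over the AST
            h = torch.relu(self.gnn_proj(adj @ h))  # aggregate neighbor states
        return seq_out, h                           # decoder attends over both inputs

enc = CodeGNNEncoder()
seq_out, node_out = enc(torch.randint(0, 10000, (2, 30)),
                        torch.randint(0, 10000, (2, 20)),
                        torch.eye(20).repeat(2, 1, 1))
print(seq_out.shape, node_out.shape)  # (2, 30, 256) and (2, 20, 256)
```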
Related papers
- Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs [5.953617559607503]
Vul-LMGNN is a unified model that combines pre-trained code language models with code property graphs.
Vul-LMGNN constructs a code property graph that integrates various code attributes into a unified graph structure.
To effectively retain dependency information among various attributes, we introduce a gated code Graph Neural Network.
arXiv Detail & Related papers (2024-04-23T03:48:18Z)
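As a rough illustration of the gated code Graph Neural Network mentioned above, here is a GGNN-style gated update in PyTorch; the dimensions, step count, and the assumption that node features come from a pre-trained code language model are illustrative, not Vul-LMGNN's actual design.

```python
import torch
import torch.nn as nn

class GatedGraphLayer(nn.Module):
    def __init__(self, dim=128, steps=3):
        super().__init__()
        self.msg = nn.Linear(dim, dim)   # message transform
        self.gru = nn.GRUCell(dim, dim)  # gated state update
        self.steps = steps

    def forward(self, h, adj):
        # h:   (N, dim) node states, e.g. code-LM embeddings of CPG nodes
        # adj: (N, N)   adjacency over the code property graph
        for _ in range(self.steps):
            m = adj @ self.msg(h)  # aggregate neighbor messages
            h = self.gru(m, h)     # gate how much of each message to keep
        return h

layer = GatedGraphLayer()
print(layer(torch.randn(5, 128), torch.eye(5)).shape)  # torch.Size([5, 128])
```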
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
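The self-learning loop described above can be sketched as follows; every name, method, and threshold here is a hypothetical placeholder rather than LOCCO's actual procedure.

```python
# Hypothetical self-learning loop: the model being trained annotates
# unlabeled text, and its confident annotations become new supervision.
def self_train(model, labeled, unlabeled, rounds=3, threshold=0.9):
    for _ in range(rounds):
        model.fit(labeled)                            # train on current supervision
        for text in unlabeled:
            annotation, score = model.annotate(text)  # placeholder API
            if score >= threshold:                    # keep only confident outputs
                labeled.append((text, annotation))
    return labeled  # these pairs can also train a text generation model
```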
- Boosting Source Code Learning with Data Augmentation: An Empirical Study [16.49710700412084]
We study whether data augmentation methods originally used for text and graphs are effective in improving the training quality of source code learning.
Our results identify the data augmentation methods that can produce more accurate and robust models for source code learning.
arXiv Detail & Related papers (2023-03-13T01:47:05Z)
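As a toy example of one semantics-preserving augmentation such studies consider for source code, here is consistent identifier renaming; the snippet and names are illustrative.

```python
import re

def rename_identifier(code: str, old: str, new: str) -> str:
    # \b word boundaries avoid touching substrings of longer identifiers
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

snippet = "def add(total, x):\n    total += x\n    return total"
print(rename_identifier(snippet, "total", "acc"))  # same program, new surface form
```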
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
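A tiny illustration of that setup: train on input-output pairs of small size and evaluate on larger inputs the model never saw. The task (sorting) and the size ranges are assumptions for illustration only.

```python
import random

def make_split(n_samples, min_len, max_len):
    data = []
    for _ in range(n_samples):
        xs = [random.randint(0, 99) for _ in range(random.randint(min_len, max_len))]
        data.append((xs, sorted(xs)))  # input-output pairs define the algorithm
    return data

train = make_split(1000, min_len=4, max_len=8)      # in-distribution sizes
test_ood = make_split(100, min_len=16, max_len=32)  # out-of-distribution sizes
```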
- Precise Learning of Source Code Contextual Semantics via Hierarchical Dependence Structure and Graph Attention Networks [28.212889828892664]
We propose a novel source code model embedded with hierarchical dependencies.
We introduce the syntactic structure of the basic block, i.e., its corresponding AST, into the source code model to provide sufficient information.
The results show that our model reduces the scale of parameters by 50% and achieves a 4% accuracy improvement on the program classification task.
arXiv Detail & Related papers (2021-11-20T04:03:42Z)
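To make the graph-attention ingredient concrete, here is a minimal single-head GAT-style layer in PyTorch; it is a generic sketch with assumed dimensions, not the paper's hierarchical model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttention(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.a = nn.Linear(2 * dim, 1, bias=False)  # attention scorer

    def forward(self, h, adj):
        # h: (N, dim) node features; adj: (N, N) 0/1 adjacency with self-loops
        z = self.W(h)
        n = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                           z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))
        e = e.masked_fill(adj == 0, float("-inf"))  # attend only along edges
        return torch.softmax(e, dim=-1) @ z

layer = GraphAttention()
print(layer(torch.randn(6, 64), torch.eye(6)).shape)  # torch.Size([6, 64])
```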
- GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph [53.70520466556453]
We propose GraphFormers, where layerwise GNN components are nested alongside the transformer blocks of language models.
With the proposed architecture, the text encoding and the graph aggregation are fused into an iterative workflow.
In addition, a progressive learning strategy is introduced, where the model is successively trained on manipulated data and original data to reinforce its capability of integrating information on the graph.
arXiv Detail & Related papers (2021-05-06T12:20:41Z)
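A rough PyTorch sketch of the nesting idea, with a graph-aggregation step interleaved between transformer layers so text encoding and graph aggregation alternate; all sizes and the first-token node summary are assumptions.

```python
import torch
import torch.nn as nn

class GNNNestedEncoder(nn.Module):
    def __init__(self, dim=128, layers=2, heads=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            for _ in range(layers))
        self.graph_aggs = nn.ModuleList(nn.Linear(dim, dim) for _ in range(layers))

    def forward(self, tokens, adj):
        # tokens: (N, T, dim) token states of N linked texts (graph nodes)
        # adj:    (N, N)      row-normalized adjacency among the texts
        h = tokens
        for block, agg in zip(self.blocks, self.graph_aggs):
            h = block(h)                        # text encoding
            node = h[:, 0]                      # first token as node summary
            node = torch.relu(agg(adj @ node))  # graph aggregation across neighbors
            h = torch.cat([node.unsqueeze(1), h[:, 1:]], dim=1)  # write back
        return h[:, 0]  # final node representations

enc = GNNNestedEncoder()
print(enc(torch.randn(5, 16, 128), torch.eye(5)).shape)  # torch.Size([5, 128])
```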
- Unsupervised Deep Cross-modality Spectral Hashing [65.3842441716661]
The framework is a two-step hashing approach which decouples the optimization into binary optimization and hashing function learning.
We propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations.
We leverage a powerful CNN for images and propose a CNN-based deep architecture to learn the text modality.
arXiv Detail & Related papers (2020-08-01T09:20:11Z)
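As a toy, single-modality illustration of spectral binary codes (the cross-modality machinery above is much richer): embed items with graph-Laplacian eigenvectors, then binarize by sign.

```python
import numpy as np

def spectral_codes(affinity, bits=8):
    lap = np.diag(affinity.sum(axis=1)) - affinity  # unnormalized Laplacian
    vals, vecs = np.linalg.eigh(lap)                # spectral embedding
    emb = vecs[:, 1:bits + 1]                       # skip the trivial eigenvector
    return (emb > 0).astype(np.int8)                # sign gives binary codes

rng = np.random.default_rng(0)
sim = rng.random((20, 20))
sim = (sim + sim.T) / 2                             # symmetric affinity matrix
print(spectral_codes(sim, bits=8).shape)            # (20, 8)
```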
- Learning to map source code to software vulnerability using code-as-a-graph [67.62847721118142]
We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective.
We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
arXiv Detail & Related papers (2020-06-15T16:05:27Z)
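A small demonstration of a code-as-graph encoding using Python's standard ast module, emitting parent-child structural edges instead of a flat token stream; this is a generic sketch, not the paper's pipeline.

```python
import ast

def ast_edges(code: str):
    tree = ast.parse(code)
    nodes, edges = [], []
    for parent in ast.walk(tree):
        nodes.append(type(parent).__name__)        # node label, e.g. FunctionDef
        for child in ast.iter_child_nodes(parent):
            edges.append((id(parent), id(child)))  # structural parent-child edge
    return nodes, edges

nodes, edges = ast_edges("def f(x):\n    return x + 1")
print(len(nodes), "nodes,", len(edges), "edges")
```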
- Retrieval-Augmented Generation for Code Summarization via Hybrid GNN [23.445231228940738]
We propose a novel retrieval-augmented mechanism to combine the benefits of both worlds.
To mitigate the limitation of Graph Neural Networks (GNNs) on capturing global graph structure information of source code, we propose a novel attention-based dynamic graph.
Our method achieves state-of-the-art performance, improving on existing methods by 1.42, 2.44, and 1.29 in BLEU-4, ROUGE-L, and METEOR, respectively.
arXiv Detail & Related papers (2020-06-09T17:09:29Z)
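A bare-bones sketch of the retrieval step: fetch the most similar known snippet (here by token Jaccard overlap, an assumed stand-in) so a generator can condition on it alongside the query code.

```python
def retrieve(query_tokens, corpus):
    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / max(len(a | b), 1)  # token-set similarity
    return max(corpus, key=lambda ex: jaccard(query_tokens, ex["code"]))

corpus = [{"code": ["def", "add", "a", "b"], "summary": "adds two numbers"},
          {"code": ["def", "sort", "xs"], "summary": "sorts a list"}]
best = retrieve(["def", "add", "x", "y"], corpus)
print(best["summary"])  # the retrieved pair augments the generator's input
```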
- DeepSumm -- Deep Code Summaries using Neural Transformer Architecture [8.566457170664927]
We employ neural techniques to solve the task of source code summarization.
With more than 2.1 million supervised code-comment samples, we reduce training time by more than 50%.
arXiv Detail & Related papers (2020-03-31T22:43:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.