A Neural Network Architecture for Program Understanding Inspired by
Human Behaviors
- URL: http://arxiv.org/abs/2206.04730v1
- Date: Tue, 10 May 2022 06:53:45 GMT
- Title: A Neural Network Architecture for Program Understanding Inspired by
Human Behaviors
- Authors: Renyu Zhu, Lei Yuan, Xiang Li, Ming Gao, Wenyuan Cai
- Abstract summary: We present a partitioning-based graph neural network model, PGNN, on an upgraded AST of the code.
We transform raw code with external knowledge and apply pre-training techniques for information extraction.
We conduct extensive experiments to show the superior performance of PGNN-EK on the code summarization and code clone detection tasks.
- Score: 10.745648153049965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Program understanding is a fundamental task in programming language
processing. Despite recent successes, existing works fail to take human
behaviors as a reference for understanding programs. In this paper, we consider
human behaviors and
propose the PGNN-EK model that consists of two main components. On the one
hand, inspired by the "divide-and-conquer" reading behaviors of humans, we
present a partitioning-based graph neural network model, PGNN, on an upgraded
AST of the code. On the other hand, to characterize the human behavior of
resorting to other resources to aid code comprehension, we transform raw code with
external knowledge and apply pre-training techniques for information
extraction. Finally, we combine the two embeddings generated from the two
components to output code embeddings. We conduct extensive experiments to show
the superior performance of PGNN-EK on the code summarization and code clone
detection tasks. In particular, to show the generalization ability of our
model, we release a new dataset that is more challenging for code clone
detection and could advance the development of the community. Our codes and
data are publicly available at https://github.com/RecklessRonan/PGNN-EK.
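As a concrete reading of the two-component design, the following is a minimal PyTorch sketch: a graph branch that message-passes over (partitioned) AST node features, and a sequence branch that stands in for a pre-trained encoder over externally augmented code, fused into a single code embedding. The class name, layer choices, and fusion step are illustrative assumptions, not the authors' implementation.
```python
import torch
import torch.nn as nn

class TwoBranchCodeEncoder(nn.Module):
    """Hypothetical simplification of the PGNN-EK idea: combine a graph
    embedding of the AST with a sequence embedding of augmented code."""

    def __init__(self, node_dim, vocab_size, hidden=256):
        super().__init__()
        self.node_proj = nn.Linear(node_dim, hidden)      # graph branch
        self.tok_emb = nn.Embedding(vocab_size, hidden)   # sequence branch
        self.seq_enc = nn.GRU(hidden, hidden, batch_first=True)
        self.fuse = nn.Linear(2 * hidden, hidden)

    def forward(self, node_feats, adj, tokens):
        # Graph branch: one mean-style message-passing step over the AST.
        h = torch.relu(adj @ self.node_proj(node_feats))  # (N, hidden)
        g = h.mean(dim=0)                                 # graph embedding
        # Sequence branch: last GRU state of the token sequence (1, T).
        _, s = self.seq_enc(self.tok_emb(tokens))
        s = s.squeeze(0).squeeze(0)
        # Combine the two embeddings into the final code embedding.
        return self.fuse(torch.cat([g, s]))
```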
Related papers
- Codebook Features: Sparse and Discrete Interpretability for Neural
Networks [43.06828312515959]
We explore whether we can train neural networks to have hidden states that are sparse, discrete, and more interpretable.
Codebook features are produced by finetuning neural networks with vector quantization bottlenecks at each layer.
We find that neural networks can operate under this extreme bottleneck with only modest degradation in performance.
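The core mechanism here is a standard vector-quantization bottleneck; a minimal sketch, assuming a single code per hidden state and a straight-through gradient (the paper's per-layer, top-k details are omitted):
```python
import torch
import torch.nn as nn

class CodebookBottleneck(nn.Module):
    """Snap each hidden state to its nearest codebook vector, so the
    layer's output is a discrete, interpretable code."""

    def __init__(self, num_codes, dim):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, h):                          # h: (batch, dim)
        d = torch.cdist(h, self.codebook.weight)   # distances to all codes
        idx = d.argmin(dim=-1)                     # nearest code per state
        q = self.codebook(idx)                     # quantized hidden states
        # Straight-through estimator: forward uses q, gradients flow to h.
        return h + (q - h).detach(), idx
```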
arXiv Detail & Related papers (2023-10-26T08:28:48Z)
- Graph Neural Networks Provably Benefit from Structural Information: A
Feature Learning Perspective [53.999128831324576]
Graph neural networks (GNNs) have pioneered advancements in graph representation learning.
This study investigates the role of graph convolution within the context of feature learning theory.
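The object under study is ordinary graph convolution; for reference, one normalized-aggregation layer, in which the adjacency structure directly shapes feature learning, can be sketched as follows (a generic illustration, not the paper's theoretical setting):
```python
import torch

def gcn_layer(X, A, W):
    """One graph-convolution step: row-normalized neighbor averaging
    injects structural information into the learned features."""
    A_hat = A + torch.eye(A.size(0))       # add self-loops
    deg = A_hat.sum(dim=1, keepdim=True)   # node degrees
    return torch.relu((A_hat / deg) @ X @ W)
```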
arXiv Detail & Related papers (2023-06-24T10:21:11Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
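The objective can be pictured as an InfoNCE-style loss with a benign clone as the positive and buggy "deviant" variants as negatives; a minimal sketch (the actual CONCORD objective and clone-mining scheme are more involved):
```python
import torch
import torch.nn.functional as F

def clone_contrastive_loss(anchor, clone, deviants, tau=0.07):
    """Pull a benign clone toward its anchor embedding and push
    deviant variants away. anchor/clone: (d,), deviants: (k, d)."""
    a = F.normalize(anchor, dim=-1)
    pos = F.normalize(clone, dim=-1)
    neg = F.normalize(deviants, dim=-1)
    # Positive similarity first, then the k negative similarities.
    logits = torch.cat([(a * pos).sum().view(1), neg @ a]) / tau
    target = torch.zeros(1, dtype=torch.long)  # index of the positive
    return F.cross_entropy(logits.unsqueeze(0), target)
```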
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills [31.75121546422898]
We present TransCoder, a unified Transferable fine-tuning strategy for Code representation learning.
We employ a tunable prefix encoder as the meta-learner to capture cross-task and cross-language transferable knowledge.
Our method can lead to superior performance on various code-related tasks and encourage mutual reinforcement.
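A tunable prefix encoder can be sketched as standard prefix tuning: a small bank of learnable vectors is prepended to the token embeddings of a frozen backbone, so only the prefix carries the transferable knowledge. This is a generic illustration of the mechanism, not TransCoder's exact design:
```python
import torch
import torch.nn as nn

class PrefixTunedEncoder(nn.Module):
    """Prepend trainable prefix vectors to a frozen encoder's inputs."""

    def __init__(self, encoder, dim, prefix_len=10):
        super().__init__()
        self.encoder = encoder                 # frozen backbone module
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.prefix = nn.Parameter(torch.randn(prefix_len, dim) * 0.02)

    def forward(self, tok_embeds):             # (batch, seq, dim)
        b = tok_embeds.size(0)
        pre = self.prefix.unsqueeze(0).expand(b, -1, -1)
        return self.encoder(torch.cat([pre, tok_embeds], dim=1))
```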
arXiv Detail & Related papers (2023-05-23T06:59:22Z)
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic
Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
- Pointer Value Retrieval: A new benchmark for understanding the limits of
neural network generalization [40.21297628440919]
We introduce a novel benchmark of Pointer Value Retrieval (PVR) tasks that explores the limits of neural network generalization.
PVR tasks can consist of visual as well as symbolic inputs, each with varying levels of difficulty.
We demonstrate that this task structure provides a rich testbed for understanding generalization.
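In its simplest symbolic form, a PVR instance is a pointer digit followed by a window of values, and the label is the value the pointer selects; harder variants aggregate several positions. A small generator for the basic form:
```python
import random

def make_pvr_example(window=10):
    """One symbolic Pointer Value Retrieval example: the first digit
    points into the value window; the label is the value it selects."""
    pointer = random.randint(0, window - 1)
    values = [random.randint(0, 9) for _ in range(window)]
    return [pointer] + values, values[pointer]

x, y = make_pvr_example()
print(x, "->", y)   # label equals the value at the pointed-to position
```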
arXiv Detail & Related papers (2021-07-27T03:50:31Z)
- Node2Seq: Towards Trainable Convolutions in Graph Neural Networks [59.378148590027735]
We propose a graph network layer, known as Node2Seq, to learn node embeddings with explicitly trainable weights for different neighboring nodes.
For a target node, our method sorts its neighboring nodes via an attention mechanism and then employs 1D convolutional neural networks (CNNs) to assign explicit weights for information aggregation.
In addition, we propose to incorporate non-local information for feature learning in an adaptive manner based on the attention scores.
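A minimal sketch of this attention-sort-then-convolve layer, assuming fixed-size neighborhoods (variable-size neighborhoods and the non-local component are omitted):
```python
import torch
import torch.nn as nn

class Node2SeqLayer(nn.Module):
    """Score neighbors against the target node, sort them by the score,
    then apply a 1D CNN so each rank gets its own learned weight."""

    def __init__(self, dim, k):
        super().__init__()
        self.att = nn.Linear(2 * dim, 1)
        self.conv = nn.Conv1d(dim, dim, kernel_size=k)  # spans all k ranks

    def forward(self, target, neighbors):        # (dim,), (k, dim)
        t = target.expand_as(neighbors)
        scores = self.att(torch.cat([t, neighbors], dim=-1)).squeeze(-1)
        order = scores.argsort(descending=True)  # sort neighbors by score
        seq = neighbors[order].t().unsqueeze(0)  # (1, dim, k)
        return self.conv(seq).squeeze()          # aggregated embedding
```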
arXiv Detail & Related papers (2021-01-06T03:05:37Z)
- Learning to Execute Programs with Instruction Pointer Attention Graph
Neural Networks [55.98291376393561]
Graph neural networks (GNNs) have emerged as a powerful tool for learning software engineering tasks.
Recurrent neural networks (RNNs) are well-suited to long sequential chains of reasoning, but they do not naturally incorporate program structure.
We introduce a novel GNN architecture, the Instruction Pointer Attention Graph Neural Networks (IPA-GNN), which improves systematic generalization on the task of learning to execute programs.
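The characteristic mechanism is a soft instruction pointer: a probability distribution over statements whose mass splits between branch targets at each step, keeping execution differentiable. A toy version of one step, assuming each statement i has a branch target succ[i] and a fall-through successor fallthrough[i]:
```python
import torch

def soft_step(pc, branch_prob, succ, fallthrough):
    """pc: (n,) soft instruction pointer; branch_prob: (n,) per-statement
    probability of taking the branch. Returns the next soft pointer."""
    new_pc = torch.zeros_like(pc)
    for i in range(pc.size(0)):
        # Each statement's probability mass splits between its two successors.
        new_pc[succ[i]] += pc[i] * branch_prob[i]
        new_pc[fallthrough[i]] += pc[i] * (1.0 - branch_prob[i])
    return new_pc
```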
arXiv Detail & Related papers (2020-10-23T19:12:30Z)
- Towards Demystifying Dimensions of Source Code Embeddings [5.211235558099913]
We present our preliminary results towards better understanding the contents of code2vec neural source code embeddings.
Our results suggest that the handcrafted features can perform very close to the highly-dimensional code2vec embeddings.
We also find that the code2vec embeddings are more resilient to the removal of dimensions with low information gains than the handcrafted features.
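The dimension-removal analysis can be approximated by ranking embedding dimensions by estimated mutual information with a task label; a sketch using scikit-learn (a stand-in for the paper's exact information-gain computation):
```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_dims_by_information_gain(embeddings, labels):
    """Rank embedding dimensions from most to least informative about
    the labels; low-ranked dimensions are candidates for removal."""
    ig = mutual_info_classif(embeddings, labels, random_state=0)
    return np.argsort(ig)[::-1], ig
```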
arXiv Detail & Related papers (2020-08-29T21:59:11Z)
- Improved Code Summarization via a Graph Neural Network [96.03715569092523]
In general, source code summarization techniques use the source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
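Feeding the AST's own structure to a GNN starts with turning code into a node-labeled graph; a minimal sketch using Python's ast module (the paper itself targets Java, so this is only an analogous illustration):
```python
import ast

def ast_to_graph(source):
    """Return (node_labels, parent->child edges) for a GNN over the AST."""
    tree = ast.parse(source)
    nodes, edges, index = [], [], {}
    for node in ast.walk(tree):          # first pass: index every node
        index[id(node)] = len(nodes)
        nodes.append(type(node).__name__)
    for node in ast.walk(tree):          # second pass: parent->child edges
        for child in ast.iter_child_nodes(node):
            edges.append((index[id(node)], index[id(child)]))
    return nodes, edges

print(ast_to_graph("def add(a, b):\n    return a + b"))
```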
arXiv Detail & Related papers (2020-04-06T17:36:42Z)
- Leveraging Code Generation to Improve Code Retrieval and Summarization
via Dual Learning [18.354352985591305]
Code summarization generates a brief natural language description for a given source code snippet, while code retrieval fetches relevant source code given a natural language query.
Recent studies have combined these two tasks to improve their performance.
We propose a novel end-to-end model for the two tasks by introducing an additional code generation task.
arXiv Detail & Related papers (2020-02-24T12:26:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.