A Neural Network Architecture for Program Understanding Inspired by
Human Behaviors
- URL: http://arxiv.org/abs/2206.04730v1
- Date: Tue, 10 May 2022 06:53:45 GMT
- Title: A Neural Network Architecture for Program Understanding Inspired by
Human Behaviors
- Authors: Renyu Zhu, Lei Yuan, Xiang Li, Ming Gao, Wenyuan Cai
- Abstract summary: We present a partitioning-based graph neural network model, PGNN, on an upgraded AST of the code.
We transform raw code with external knowledge and apply pre-training techniques for information extraction.
We conduct extensive experiments to show the superior performance of PGNN-EK on the code summarization and code clone detection tasks.
- Score: 10.745648153049965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Program understanding is a fundamental task in programming language
processing. Despite recent successes, existing works fail to take human
behaviors as a reference for understanding programs. In this paper, we consider
human behaviors and
propose the PGNN-EK model that consists of two main components. On the one
hand, inspired by the "divide-and-conquer" reading behaviors of humans, we
present a partitioning-based graph neural network model, PGNN, on an upgraded
AST of the code. On the other hand, to characterize the human behavior of
resorting to other resources to aid code comprehension, we transform raw code with
external knowledge and apply pre-training techniques for information
extraction. Finally, we combine the two embeddings generated from the two
components to output code embeddings. We conduct extensive experiments to show
the superior performance of PGNN-EK on the code summarization and code clone
detection tasks. In particular, to show the generalization ability of our
model, we release a new dataset that is more challenging for code clone
detection and could advance the development of the community. Our codes and
data are publicly available at https://github.com/RecklessRonan/PGNN-EK.
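As a concrete reading of the two-component design, the following is a minimal PyTorch sketch: a graph branch that message-passes over (partitioned) AST node features, and a sequence branch that stands in for a pre-trained encoder over externally augmented code, fused into a single code embedding. The class name, layer choices, and fusion step are illustrative assumptions, not the authors' implementation.
```python
import torch
import torch.nn as nn

class TwoBranchCodeEncoder(nn.Module):
    """Hypothetical simplification of the PGNN-EK idea: combine a graph
    embedding of the AST with a sequence embedding of augmented code."""

    def __init__(self, node_dim, vocab_size, hidden=256):
        super().__init__()
        self.node_proj = nn.Linear(node_dim, hidden)      # graph branch
        self.tok_emb = nn.Embedding(vocab_size, hidden)   # sequence branch
        self.seq_enc = nn.GRU(hidden, hidden, batch_first=True)
        self.fuse = nn.Linear(2 * hidden, hidden)

    def forward(self, node_feats, adj, tokens):
        # Graph branch: one mean-style message-passing step over the AST.
        h = torch.relu(adj @ self.node_proj(node_feats))  # (N, hidden)
        g = h.mean(dim=0)                                 # graph embedding
        # Sequence branch: last GRU state of the token sequence (1, T).
        _, s = self.seq_enc(self.tok_emb(tokens))
        s = s.squeeze(0).squeeze(0)
        # Combine the two embeddings into the final code embedding.
        return self.fuse(torch.cat([g, s]))
```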
Related papers
- Codebook Features: Sparse and Discrete Interpretability for Neural
Networks [43.06828312515959]
We explore whether we can train neural networks to have hidden states that are sparse, discrete, and more interpretable.
Codebook features are produced by finetuning neural networks with vector quantization bottlenecks at each layer.
We find that neural networks can operate under this extreme bottleneck with only modest degradation in performance.
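The core mechanism here is a standard vector-quantization bottleneck; a minimal sketch, assuming a single code per hidden state and a straight-through gradient (the paper's per-layer, top-k details are omitted):
```python
import torch
import torch.nn as nn

class CodebookBottleneck(nn.Module):
    """Snap each hidden state to its nearest codebook vector, so the
    layer's output is a discrete, interpretable code."""

    def __init__(self, num_codes, dim):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, h):                          # h: (batch, dim)
        d = torch.cdist(h, self.codebook.weight)   # distances to all codes
        idx = d.argmin(dim=-1)                     # nearest code per state
        q = self.codebook(idx)                     # quantized hidden states
        # Straight-through estimator: forward uses q, gradients flow to h.
        return h + (q - h).detach(), idx
```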
arXiv Detail & Related papers (2023-10-26T08:28:48Z)
- Graph Neural Networks Provably Benefit from Structural Information: A
Feature Learning Perspective [53.999128831324576]
Graph neural networks (GNNs) have pioneered advancements in graph representation learning.
This study investigates the role of graph convolution within the context of feature learning theory.
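The object under study is ordinary graph convolution; for reference, one normalized-aggregation layer, in which the adjacency structure directly shapes feature learning, can be sketched as follows (a generic illustration, not the paper's theoretical setting):
```python
import torch

def gcn_layer(X, A, W):
    """One graph-convolution step: row-normalized neighbor averaging
    injects structural information into the learned features."""
    A_hat = A + torch.eye(A.size(0))       # add self-loops
    deg = A_hat.sum(dim=1, keepdim=True)   # node degrees
    return torch.relu((A_hat / deg) @ X @ W)
```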
arXiv Detail & Related papers (2023-06-24T10:21:11Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
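The objective can be pictured as an InfoNCE-style loss with a benign clone as the positive and buggy "deviant" variants as negatives; a minimal sketch (the actual CONCORD objective and clone-mining scheme are more involved):
```python
import torch
import torch.nn.functional as F

def clone_contrastive_loss(anchor, clone, deviants, tau=0.07):
    """Pull a benign clone toward its anchor embedding and push
    deviant variants away. anchor/clone: (d,), deviants: (k, d)."""
    a = F.normalize(anchor, dim=-1)
    pos = F.normalize(clone, dim=-1)
    neg = F.normalize(deviants, dim=-1)
    # Positive similarity first, then the k negative similarities.
    logits = torch.cat([(a * pos).sum().view(1), neg @ a]) / tau
    target = torch.zeros(1, dtype=torch.long)  # index of the positive
    return F.cross_entropy(logits.unsqueeze(0), target)
```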
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills [31.75121546422898]
We present TransCoder, a unified Transferable fine-tuning strategy for Code representation learning.
We employ a tunable prefix encoder as the meta-learner to capture cross-task and cross-language transferable knowledge.
Our method can lead to superior performance on various code-related tasks and encourage mutual reinforcement.
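A tunable prefix encoder can be sketched as standard prefix tuning: a small bank of learnable vectors is prepended to the token embeddings of a frozen backbone, so only the prefix carries the transferable knowledge. This is a generic illustration of the mechanism, not TransCoder's exact design:
```python
import torch
import torch.nn as nn

class PrefixTunedEncoder(nn.Module):
    """Prepend trainable prefix vectors to a frozen encoder's inputs."""

    def __init__(self, encoder, dim, prefix_len=10):
        super().__init__()
        self.encoder = encoder                 # frozen backbone module
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.prefix = nn.Parameter(torch.randn(prefix_len, dim) * 0.02)

    def forward(self, tok_embeds):             # (batch, seq, dim)
        b = tok_embeds.size(0)
        pre = self.prefix.unsqueeze(0).expand(b, -1, -1)
        return self.encoder(torch.cat([pre, tok_embeds], dim=1))
```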
arXiv Detail & Related papers (2023-05-23T06:59:22Z)
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic
Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
- Pointer Value Retrieval: A new benchmark for understanding the limits of
neural network generalization [40.21297628440919]
We introduce a novel benchmark of Pointer Value Retrieval (PVR) tasks that explores the limits of neural network generalization.
PVR tasks can consist of visual as well as symbolic inputs, each with varying levels of difficulty.
We demonstrate that this task structure provides a rich testbed for understanding generalization.
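In its simplest symbolic form, a PVR instance is a pointer digit followed by a window of values, and the label is the value the pointer selects; harder variants aggregate several positions. A small generator for the basic form:
```python
import random

def make_pvr_example(window=10):
    """One symbolic Pointer Value Retrieval example: the first digit
    points into the value window; the label is the value it selects."""
    pointer = random.randint(0, window - 1)
    values = [random.randint(0, 9) for _ in range(window)]
    return [pointer] + values, values[pointer]

x, y = make_pvr_example()
print(x, "->", y)   # label equals the value at the pointed-to position
```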
arXiv Detail & Related papers (2021-07-27T03:50:31Z)
- Node2Seq: Towards Trainable Convolutions in Graph Neural Networks [59.378148590027735]
We propose a graph network layer, known as Node2Seq, to learn node embeddings with explicitly trainable weights for different neighboring nodes.
For a target node, our method sorts its neighboring nodes via an attention mechanism and then employs 1D convolutional neural networks (CNNs) to assign explicit weights for information aggregation.
In addition, we propose to incorporate non-local information for feature learning in an adaptive manner based on the attention scores.
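A minimal sketch of this attention-sort-then-convolve layer, assuming fixed-size neighborhoods (variable-size neighborhoods and the non-local component are omitted):
```python
import torch
import torch.nn as nn

class Node2SeqLayer(nn.Module):
    """Score neighbors against the target node, sort them by the score,
    then apply a 1D CNN so each rank gets its own learned weight."""

    def __init__(self, dim, k):
        super().__init__()
        self.att = nn.Linear(2 * dim, 1)
        self.conv = nn.Conv1d(dim, dim, kernel_size=k)  # spans all k ranks

    def forward(self, target, neighbors):        # (dim,), (k, dim)
        t = target.expand_as(neighbors)
        scores = self.att(torch.cat([t, neighbors], dim=-1)).squeeze(-1)
        order = scores.argsort(descending=True)  # sort neighbors by score
        seq = neighbors[order].t().unsqueeze(0)  # (1, dim, k)
        return self.conv(seq).squeeze()          # aggregated embedding
```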
arXiv Detail & Related papers (2021-01-06T03:05:37Z)
- Learning to Execute Programs with Instruction Pointer Attention Graph
Neural Networks [55.98291376393561]
Graph neural networks (GNNs) have emerged as a powerful tool for learning software engineering tasks.
Recurrent neural networks (RNNs) are well-suited to long sequential chains of reasoning, but they do not naturally incorporate program structure.
We introduce a novel GNN architecture, the Instruction Pointer Attention Graph Neural Networks (IPA-GNN), which improves systematic generalization on the task of learning to execute programs.
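The characteristic mechanism is a soft instruction pointer: a probability distribution over statements whose mass splits between branch targets at each step, keeping execution differentiable. A toy version of one step, assuming each statement i has a branch target succ[i] and a fall-through successor fallthrough[i]:
```python
import torch

def soft_step(pc, branch_prob, succ, fallthrough):
    """pc: (n,) soft instruction pointer; branch_prob: (n,) per-statement
    probability of taking the branch. Returns the next soft pointer."""
    new_pc = torch.zeros_like(pc)
    for i in range(pc.size(0)):
        # Each statement's probability mass splits between its two successors.
        new_pc[succ[i]] += pc[i] * branch_prob[i]
        new_pc[fallthrough[i]] += pc[i] * (1.0 - branch_prob[i])
    return new_pc
```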
arXiv Detail & Related papers (2020-10-23T19:12:30Z)
- Towards Demystifying Dimensions of Source Code Embeddings [5.211235558099913]
We present our preliminary results towards better understanding the contents of code2vec neural source code embeddings.
Our results suggest that the handcrafted features can perform very close to the highly-dimensional code2vec embeddings.
We also find that the code2vec embeddings are more resilient to the removal of dimensions with low information gains than the handcrafted features.
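The dimension-removal analysis can be approximated by ranking embedding dimensions by estimated mutual information with a task label; a sketch using scikit-learn (a stand-in for the paper's exact information-gain computation):
```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_dims_by_information_gain(embeddings, labels):
    """Rank embedding dimensions from most to least informative about
    the labels; low-ranked dimensions are candidates for removal."""
    ig = mutual_info_classif(embeddings, labels, random_state=0)
    return np.argsort(ig)[::-1], ig
```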
arXiv Detail & Related papers (2020-08-29T21:59:11Z)
- Improved Code Summarization via a Graph Neural Network [96.03715569092523]
In general, source code summarization techniques use the source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
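Feeding the AST's own structure to a GNN starts with turning code into a node-labeled graph; a minimal sketch using Python's ast module (the paper itself targets Java, so this is only an analogous illustration):
```python
import ast

def ast_to_graph(source):
    """Return (node_labels, parent->child edges) for a GNN over the AST."""
    tree = ast.parse(source)
    nodes, edges, index = [], [], {}
    for node in ast.walk(tree):          # first pass: index every node
        index[id(node)] = len(nodes)
        nodes.append(type(node).__name__)
    for node in ast.walk(tree):          # second pass: parent->child edges
        for child in ast.iter_child_nodes(node):
            edges.append((index[id(node)], index[id(child)]))
    return nodes, edges

print(ast_to_graph("def add(a, b):\n    return a + b"))
```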
arXiv Detail & Related papers (2020-04-06T17:36:42Z)
- Leveraging Code Generation to Improve Code Retrieval and Summarization
via Dual Learning [18.354352985591305]
Code summarization generates a brief natural language description for a given source code snippet, while code retrieval fetches relevant source code given a natural language query.
Recent studies have combined these two tasks to improve their performance.
We propose a novel end-to-end model for the two tasks by introducing an additional code generation task.
arXiv Detail & Related papers (2020-02-24T12:26:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.