Multi-View Graph Representation for Programming Language Processing: An
Investigation into Algorithm Detection
- URL: http://arxiv.org/abs/2202.12481v1
- Date: Fri, 25 Feb 2022 03:35:45 GMT
- Title: Multi-View Graph Representation for Programming Language Processing: An
Investigation into Algorithm Detection
- Authors: Ting Long, Yutong Xie, Xianyu Chen, Weinan Zhang, Qinxiang Cao, Yong
Yu
- Abstract summary: This paper proposes a multi-view graph (MVG) program representation method.
MVG pays more attention to code semantics and simultaneously includes both data flow and control flow as multiple views.
In experiments, MVG outperforms previous methods significantly.
- Score: 35.81014952109471
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Program representation, which aims at converting program source code into
vectors with automatically extracted features, is a fundamental problem in
programming language processing (PLP). Recent work tries to represent programs
with neural networks based on source code structures. However, such methods
often focus on syntax and consider only a single perspective of programs,
limiting the representation power of models. This paper proposes a multi-view
graph (MVG) program representation method. MVG pays more attention to code
semantics and simultaneously includes both data flow and control flow as
multiple views. These views are then combined and processed by a graph neural
network (GNN) to obtain a comprehensive program representation that covers
various aspects. We thoroughly evaluate our proposed MVG approach in the
context of algorithm detection, an important and challenging subfield of PLP.
Specifically, we use a public dataset POJ-104 and also construct a new
challenging dataset ALG-109 to test our method. In experiments, MVG outperforms
previous methods significantly, demonstrating our model's strong capability of
representing source code.
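To make the idea concrete, below is a minimal sketch (plain PyTorch, not the authors' released code) of how a program could be encoded from two graph views: a data-flow adjacency and a control-flow adjacency are each processed by a simple message-passing layer, and the per-view node embeddings are combined and pooled into one program vector. All class names, dimensions, and the mean-pooling readout are illustrative assumptions, not details taken from the paper.
```python
# Hedged sketch of a multi-view program encoder: two graph views (data flow,
# control flow) over the same program nodes, combined and pooled by a GNN.
import torch
import torch.nn as nn

class SimpleGraphConv(nn.Module):
    """One round of mean-aggregation message passing over a dense adjacency."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # Row-normalize the adjacency so each node averages its neighbors' features.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.linear((adj / deg) @ x))

class MultiViewProgramEncoder(nn.Module):
    """Encodes program nodes under data-flow and control-flow views,
    then pools node embeddings into a single program vector."""
    def __init__(self, num_tokens, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_tokens, hidden_dim)
        self.data_flow_conv = SimpleGraphConv(hidden_dim, hidden_dim)
        self.control_flow_conv = SimpleGraphConv(hidden_dim, hidden_dim)
        self.combine = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, node_tokens, data_flow_adj, control_flow_adj):
        x = self.embed(node_tokens)                          # (num_nodes, hidden_dim)
        h_df = self.data_flow_conv(x, data_flow_adj)         # data-flow view
        h_cf = self.control_flow_conv(x, control_flow_adj)   # control-flow view
        h = torch.relu(self.combine(torch.cat([h_df, h_cf], dim=-1)))
        return h.mean(dim=0)                                 # pooled program vector

# Toy usage: 4 program nodes, each view given as a dense adjacency matrix.
encoder = MultiViewProgramEncoder(num_tokens=100)
tokens = torch.tensor([3, 17, 42, 8])
df_adj = torch.eye(4)   # placeholder data-flow edges
cf_adj = torch.eye(4)   # placeholder control-flow edges
program_vec = encoder(tokens, df_adj, cf_adj)
print(program_vec.shape)  # torch.Size([64])
```
The sketch omits the part the paper emphasizes most, building the data-flow and control-flow views from source code; in an algorithm-detection setting such as POJ-104 or ALG-109, the pooled program vector would then feed a classifier over algorithm classes.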
Related papers
- Masked Image Modeling: A Survey [73.21154550957898]
Masked image modeling emerged as a powerful self-supervised learning technique in computer vision.
We construct a taxonomy and review the most prominent papers in recent years.
We aggregate the performance results of various masked image modeling methods on the most popular datasets.
arXiv Detail & Related papers (2024-08-13T07:27:02Z) - Deep Graph Reprogramming [112.34663053130073]
"Deep graph reprogramming" is a model reusing task tailored for graph neural networks (GNNs)
We propose an innovative Data Reprogramming paradigm alongside a Model Reprogramming paradigm.
arXiv Detail & Related papers (2023-04-28T02:04:29Z) - Software Vulnerability Detection via Deep Learning over Disaggregated
Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because parsed code naturally admits a graph structure, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z) - On the Impact of Multiple Source Code Representations on Software
Engineering Tasks -- An Empirical Study [4.049850026698639]
We modify an AST path-based approach to accept multiple representations as input to an attention-based model.
We evaluate our approach on three tasks: Method Naming, Program Classification, and Clone Detection.
arXiv Detail & Related papers (2021-06-21T08:36:38Z) - Code2Image: Intelligent Code Analysis by Computer Vision Techniques and
Application to Vulnerability Prediction [0.6091702876917281]
We present a novel method to represent source code as an image while preserving semantic and syntactic properties.
The method makes it possible to feed the resulting image representation of source code directly into deep learning (DL) algorithms as input.
We demonstrate feasibility and effectiveness of our method by realizing a vulnerability prediction use case over a public dataset.
arXiv Detail & Related papers (2021-05-07T09:10:20Z) - How to Design Sample and Computationally Efficient VQA Models [53.65668097847456]
We find that representing the text as probabilistic programs and images as object-level scene graphs best satisfy these desiderata.
We extend existing models to leverage these soft programs and scene graphs to train on question answer pairs in an end-to-end manner.
arXiv Detail & Related papers (2021-03-22T01:48:16Z) - Enhancing Handwritten Text Recognition with N-gram sequence
decomposition and Multitask Learning [36.69114677635806]
Current approaches in the field of Handwritten Text Recognition are predominantly single-task, with unigram, character-level target units.
In our work, we utilize a Multi-task Learning scheme, training the model to perform decompositions of the target sequence with target units of different granularity.
Our proposed model, even though evaluated only on the unigram task, outperforms its single-task counterpart by an absolute 2.52% WER and 1.02% CER.
arXiv Detail & Related papers (2020-12-28T19:35:40Z) - funcGNN: A Graph Neural Network Approach to Program Similarity [0.90238471756546]
FuncGNN is a graph neural network trained on labeled CFG pairs to predict the graph edit distance (GED) between unseen program pairs by utilizing an effective embedding vector.
This is the first time graph neural networks have been applied to labeled CFGs for estimating the similarity between high-level language programs.
arXiv Detail & Related papers (2020-07-26T23:16:24Z) - ProGraML: Graph-based Deep Learning for Program Optimization and
Analysis [16.520971531754018]
We introduce ProGraML, a graph-based program representation for machine learning.
ProGraML achieves an average 94.0 F1 score, significantly outperforming the state-of-the-art approaches.
We then apply our approach to two high-level tasks - heterogeneous device mapping and program classification - setting new state-of-the-art performance in both.
arXiv Detail & Related papers (2020-03-23T20:27:00Z) - Weakly Supervised Visual Semantic Parsing [49.69377653925448]
Scene Graph Generation (SGG) aims to extract entities, predicates and their semantic structure from images.
Existing SGG methods require millions of manually annotated bounding boxes for training.
We propose VSPNet, a graph-based weakly supervised learning framework for visual semantic parsing.
arXiv Detail & Related papers (2020-01-08T03:46:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.