Learning Program Semantics with Code Representations: An Empirical Study
- URL: http://arxiv.org/abs/2203.11790v1
- Date: Tue, 22 Mar 2022 14:51:44 GMT
- Title: Learning Program Semantics with Code Representations: An Empirical Study
- Authors: Jing Kai Siow, Shangqing Liu, Xiaofei Xie, Guozhu Meng, Yang Liu
- Abstract summary: Program semantics learning is core and fundamental to various code intelligence tasks.
We categorize current mainstream code representation techniques into four categories.
We evaluate their performance on three diverse and popular code intelligence tasks.
- Score: 22.953964699210296
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Program semantics learning is core and fundamental to various code
intelligence tasks, e.g., vulnerability detection and clone detection. A
considerable number of existing works propose diverse approaches to learning
program semantics for different tasks, and these works have achieved
state-of-the-art performance. However, a comprehensive and systematic study
evaluating different program representation techniques across diverse tasks is
still missing.
From this starting point, in this paper we conduct an empirical study to
evaluate different program representation techniques. Specifically, we
categorize current mainstream code representation techniques into four
categories, i.e., Feature-based, Sequence-based, Tree-based, and Graph-based
program representation techniques, and evaluate their performance on three
diverse and popular code intelligence tasks, i.e., Code Classification,
Vulnerability Detection, and Clone Detection, on publicly released benchmarks.
We further design three research questions (RQs) and conduct a comprehensive
analysis to investigate the performance. From the extensive experimental
results, we conclude that (1) the graph-based representation is superior to the
other selected techniques across these tasks; (2) compared with the node type
information used in tree-based and graph-based representations, the node
textual information is more critical to learning program semantics; and (3)
different tasks require task-specific semantics to achieve their highest
performance, but combining program semantics from different dimensions, such as
control dependency and data dependency, can still produce promising results.
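To make the graph-based category concrete, below is a minimal, dependency-free Python sketch of a program graph that keeps both node type and node textual information (the paper finds the textual information more critical) and mixes syntactic edges with control- and data-dependency edges. The node and edge names are illustrative assumptions, not the paper's exact schema.

```python
# Minimal sketch: a graph-based program representation combining AST
# structure with control- and data-dependency edges. Node and edge
# names are illustrative only, not the paper's exact schema.

def build_program_graph():
    # Each node keeps both its type and its textual token -- the paper
    # reports the textual information is the more important of the two.
    nodes = {
        0: {"type": "FunctionDef", "text": "max2"},
        1: {"type": "If",          "text": "if"},
        2: {"type": "Compare",     "text": "a > b"},
        3: {"type": "Return",      "text": "return a"},
        4: {"type": "Return",      "text": "return b"},
    }
    edges = [
        (0, 1, "ast"),          # syntactic structure
        (1, 2, "ast"),
        (1, 3, "ast"),
        (1, 4, "ast"),
        (1, 3, "control_dep"),  # control dependency: the branch guards the returns
        (1, 4, "control_dep"),
        (2, 3, "data_dep"),     # data dependency: 'a' flows into 'return a'
        (2, 4, "data_dep"),
    ]
    return nodes, edges

if __name__ == "__main__":
    nodes, edges = build_program_graph()
    for src, dst, kind in edges:
        print(f"{nodes[src]['text']!r} --{kind}--> {nodes[dst]['text']!r}")
```

A sequence-based representation would keep only the token texts in order, and a tree-based one only the `ast` edges; the graph variant layers the dependency edges on top, which is where the combined semantics in finding (3) come from.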
Related papers
- Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective [93.56647950778357]
Blind image quality assessment (BIQA) predicts the human perception of image quality without any reference information.
We develop a general and automated multitask learning scheme for BIQA to exploit auxiliary knowledge from other tasks.
arXiv Detail & Related papers (2023-03-27T07:58:09Z)
- Benchmarking Node Outlier Detection on Graphs [90.29966986023403]
Graph outlier detection is an emerging but crucial machine learning task with numerous applications.
We present the first comprehensive unsupervised node outlier detection benchmark for graphs called UNOD.
arXiv Detail & Related papers (2022-06-21T01:46:38Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
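As a toy illustration of one such sampling strategy (the triples, templates, and budget knob below are invented for illustration, not the paper's setup), one might draw a fixed budget of triples from a knowledge graph and verbalize them into synthetic training sentences:

```python
import random

# Hypothetical sketch of a knowledge sampling strategy: draw a fixed
# budget of (head, relation, tail) triples from a knowledge graph and
# verbalize them into synthetic training sentences for adaptation.
KG = [
    ("fire", "CapableOf", "burn paper"),
    ("knife", "UsedFor", "cutting"),
    ("rain", "Causes", "wet streets"),
    ("bird", "CapableOf", "fly"),
]
TEMPLATES = {
    "CapableOf": "A {h} can {t}.",
    "UsedFor":   "A {h} is used for {t}.",
    "Causes":    "{h} causes {t}.",
}

def sample_synthetic_data(budget, seed=0):
    rng = random.Random(seed)
    triples = rng.sample(KG, k=min(budget, len(KG)))
    return [TEMPLATES[r].format(h=h, t=t) for h, r, t in triples]

print(sample_synthetic_data(budget=2))
```

Varying `budget` and the sampling rule is the kind of knob whose effect on downstream generalization the paper studies.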
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Active Multi-Task Representation Learning [50.13453053304159]
We give the first formal study on resource task sampling by leveraging the techniques from active learning.
We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance.
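A rough sketch of that loop, with a cosine-similarity stand-in for the relevance estimator and proportional allocation as the sampling rule (both are assumptions for illustration, not the published algorithm):

```python
import numpy as np

# Hedged sketch of the iterative idea only: estimate each source task's
# relevance to the target, then allocate the next round's sample budget
# in proportion to that estimate.
rng = np.random.default_rng(0)

def relevance(source_feats, target_feats):
    # Toy relevance proxy: cosine similarity of mean feature vectors.
    s, t = source_feats.mean(axis=0), target_feats.mean(axis=0)
    return float(s @ t / (np.linalg.norm(s) * np.linalg.norm(t) + 1e-9))

def active_sampling(sources, target, rounds=3, budget=100):
    counts = {name: 0 for name in sources}
    for _ in range(rounds):
        rel = {name: max(relevance(X, target), 0.0) + 1e-6
               for name, X in sources.items()}
        total = sum(rel.values())
        for name in sources:                    # proportional allocation
            counts[name] += int(budget * rel[name] / total)
    return counts

sources = {f"task{i}": rng.normal(size=(50, 8)) for i in range(3)}
target = rng.normal(size=(20, 8))
print(active_sampling(sources, target))
```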
arXiv Detail & Related papers (2022-02-02T08:23:24Z)
- A Comprehensive Analytical Survey on Unsupervised and Semi-Supervised Graph Representation Learning Methods [4.486285347896372]
This survey aims to evaluate all major classes of graph embedding methods.
We organized graph embedding techniques using a taxonomy that includes methods from manual feature engineering, matrix factorization, shallow neural networks, and deep graph convolutional networks.
We designed experiments on top of the PyTorch Geometric and DGL libraries and ran them on different multicore CPU and GPU platforms.
arXiv Detail & Related papers (2021-12-20T07:50:26Z)
- On the Impact of Multiple Source Code Representations on Software Engineering Tasks -- An Empirical Study [4.049850026698639]
We modify an AST path-based approach to accept multiple representations as input to an attention-based model.
We evaluate our approach on three tasks: Method Naming, Program Classification, and Clone Detection.
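One plausible way to fuse several representations with attention is sketched below; the representation names, dimensions, and random weights are placeholders, not the paper's trained model.

```python
import numpy as np

# Minimal sketch: fuse embeddings from several code representations with
# an attention distribution over representations. Random vectors stand
# in for learned embeddings.
def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse(representations, query):
    # representations: dict name -> embedding vector; query: same dim.
    names = list(representations)
    embs = np.stack([representations[n] for n in names])   # (k, d)
    weights = softmax(embs @ query)                        # (k,)
    fused = weights @ embs                                 # (d,)
    return fused, dict(zip(names, weights.round(3)))

rng = np.random.default_rng(1)
reps = {"ast_paths": rng.normal(size=16),
        "token_seq": rng.normal(size=16),
        "cfg": rng.normal(size=16)}
fused, attn = fuse(reps, query=rng.normal(size=16))
print(attn)  # attention weight per input representation
```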
arXiv Detail & Related papers (2021-06-21T08:36:38Z)
- Mining Program Properties From Neural Networks Trained on Source Code Embeddings [0.0]
We propose a novel approach for mining different program features by analysing the internal behaviour of a deep neural network trained on source code.
We train an autoencoder for each program embedding and then test whether the internal neurons autonomously build internal representations for different program features.
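A minimal sketch of the probing idea, assuming synthetic embeddings and a synthetic program property; everything below is illustrative, not the paper's pipeline.

```python
import torch
import torch.nn as nn

# Illustrative sketch: train a small autoencoder on (synthetic) program
# embeddings, then check whether any hidden neuron's activation
# correlates with a program feature label.
torch.manual_seed(0)
X = torch.randn(256, 32)                       # stand-in program embeddings
feature = (X[:, 0] > 0).float()                # stand-in program property

model = nn.Sequential(nn.Linear(32, 8), nn.Tanh(), nn.Linear(8, 32))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), X)  # reconstruction loss
    loss.backward()
    opt.step()

hidden = torch.tanh(model[0](X)).detach()      # internal neuron activations
for j in range(hidden.shape[1]):
    corr = torch.corrcoef(torch.stack([hidden[:, j], feature]))[0, 1]
    print(f"neuron {j}: corr with feature = {corr:.2f}")
```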
arXiv Detail & Related papers (2021-03-09T14:25:16Z)
- Comparative Code Structure Analysis using Deep Learning for Performance Prediction [18.226950022938954]
This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure.
Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source code to discover latent representations, achieving up to 84% accuracy on individual problems and 73% on a combined dataset with multiple problems when predicting the change in performance.
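For intuition, here is a compact child-sum Tree-LSTM cell in PyTorch, applied recursively over a toy three-node AST. The dimensions and tree are invented, and this is a generic Tai et al.-style cell, not necessarily the paper's exact model.

```python
import torch
import torch.nn as nn

# Compact child-sum Tree-LSTM cell sketch, applied bottom-up over a
# toy AST; each node combines its input with its children's states.
class TreeLSTMCell(nn.Module):
    def __init__(self, x_dim, h_dim):
        super().__init__()
        self.W = nn.Linear(x_dim, 4 * h_dim)              # i, o, u, f from x
        self.U = nn.Linear(h_dim, 3 * h_dim, bias=False)  # i, o, u from h_sum
        self.Uf = nn.Linear(h_dim, h_dim, bias=False)     # per-child forget
        self.h_dim = h_dim

    def forward(self, x, children):  # children: list of (h, c) pairs
        h_sum = sum((h for h, _ in children), torch.zeros(self.h_dim))
        wi, wo, wu, wf = self.W(x).chunk(4)
        ui, uo, uu = self.U(h_sum).chunk(3)
        i, o = torch.sigmoid(wi + ui), torch.sigmoid(wo + uo)
        u = torch.tanh(wu + uu)
        c = i * u + sum(torch.sigmoid(wf + self.Uf(hk)) * ck
                        for hk, ck in children)
        return o * torch.tanh(c), c

cell = TreeLSTMCell(x_dim=8, h_dim=16)
leaf = lambda: cell(torch.randn(8), [])             # leaves have no children
root_h, _ = cell(torch.randn(8), [leaf(), leaf()])  # tiny 3-node AST
print(root_h.shape)  # torch.Size([16])
```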
arXiv Detail & Related papers (2021-02-12T16:59:12Z)
- BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration [72.88493072196094]
We present a new synthesis approach that leverages learning to guide a bottom-up search over programs.
In particular, we train a model to prioritize compositions of intermediate values during search conditioned on a set of input-output examples.
We show that the combination of learning and bottom-up search is remarkably effective, even with simple supervised learning approaches.
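A toy bottom-up enumeration in that spirit is sketched below, where a hand-written distance to the target outputs stands in for the learned prioritization model; the DSL, scorer, and task are all invented for illustration.

```python
import heapq

# Toy bottom-up synthesis sketch: enumerate expressions over a tiny DSL,
# popping the most promising intermediate value first. A distance to the
# target outputs stands in for the learned prioritizer.
inputs, target = [1, 2, 3], [4, 6, 8]          # want something like x*2 + 2

OPS = {
    "add1": lambda v: [x + 1 for x in v],
    "dbl":  lambda v: [x * 2 for x in v],
}

def score(value):  # stand-in for the learned model's priority
    return sum(abs(a - b) for a, b in zip(value, target))

def bottom_up(max_steps=1000):
    frontier = [(score(inputs), "x", tuple(inputs))]
    seen = {tuple(inputs)}
    while frontier and max_steps:
        max_steps -= 1
        _, expr, value = heapq.heappop(frontier)
        if list(value) == target:
            return expr
        for name, fn in OPS.items():           # grow from intermediate values
            new = tuple(fn(list(value)))
            if new not in seen:
                seen.add(new)
                heapq.heappush(frontier, (score(new), f"{name}({expr})", new))
    return None

print(bottom_up())  # finds e.g. 'add1(add1(dbl(x)))'
```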
arXiv Detail & Related papers (2020-07-28T17:46:18Z)
- Learning Differentiable Programs with Admissible Neural Heuristics [43.54820901841979]
We study the problem of learning differentiable functions expressed as programs in a domain-specific language.
We frame this optimization problem as a search in a weighted graph whose paths encode top-down derivations of program syntax.
Our key innovation is to view various classes of neural networks as continuous relaxations over the space of programs.
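The search framing can be sketched with plain A* over partial derivations, where a trivially admissible heuristic (at least one expansion per remaining nonterminal) stands in for the cost of a neural relaxation; the grammar and heuristic below are illustrative only.

```python
import heapq

# Sketch of the search framing only: paths in a weighted graph encode
# top-down derivations of a tiny grammar, and h() stands in for the
# cost estimate a neural relaxation would provide.
GRAMMAR = {"E": [["x"], ["1"], ["(", "E", "+", "E", ")"]]}

def h(partial):
    # Stand-in admissible heuristic: each unexpanded nonterminal needs
    # at least one more expansion (a real system trains a relaxation).
    return sum(1 for s in partial if s in GRAMMAR)

def a_star(target, limit=10000):
    frontier = [(h(["E"]), 0, ["E"])]
    while frontier and limit:
        limit -= 1
        _, g, partial = heapq.heappop(frontier)
        nts = [i for i, s in enumerate(partial) if s in GRAMMAR]
        if not nts:
            if partial == target:
                return "".join(partial), g
            continue
        i = nts[0]                              # expand leftmost nonterminal
        for rhs in GRAMMAR[partial[i]]:
            child = partial[:i] + rhs + partial[i + 1:]
            heapq.heappush(frontier, (g + 1 + h(child), g + 1, child))
    return None

print(a_star(list("(x+1)")))  # derives '(x+1)' in 3 expansions
```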
arXiv Detail & Related papers (2020-07-23T16:07:39Z)
- Multi-Task Learning for Dense Prediction Tasks: A Survey [87.66280582034838]
Multi-task learning (MTL) techniques have shown promising results with respect to performance, computation, and memory footprint.
We provide a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision.
arXiv Detail & Related papers (2020-04-28T09:15:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.