Learning Program Semantics with Code Representations: An Empirical Study
- URL: http://arxiv.org/abs/2203.11790v1
- Date: Tue, 22 Mar 2022 14:51:44 GMT
- Title: Learning Program Semantics with Code Representations: An Empirical Study
- Authors: Jing Kai Siow, Shangqing Liu, Xiaofei Xie, Guozhu Meng, Yang Liu
- Abstract summary: Program semantics learning is core and fundamental to various code intelligence tasks.
We categorize current mainstream code representation techniques into four categories.
We evaluate their performance on three diverse and popular code intelligence tasks.
- Score: 22.953964699210296
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Program semantics learning is core and fundamental to various code
intelligence tasks, e.g., vulnerability detection and clone detection. A
considerable number of existing works propose diverse approaches to learning
program semantics for different tasks, and these works have achieved
state-of-the-art performance. However, a comprehensive and systematic study
evaluating different program representation techniques across diverse tasks is
still missing.
From this starting point, in this paper we conduct an empirical study to
evaluate different program representation techniques. Specifically, we
categorize current mainstream code representation techniques into four
categories, i.e., Feature-based, Sequence-based, Tree-based, and Graph-based
program representation techniques, and evaluate their performance on three
diverse and popular code intelligence tasks, i.e., Code Classification,
Vulnerability Detection, and Clone Detection, on publicly released benchmarks.
We further design three research questions (RQs) and conduct a comprehensive
analysis to investigate the performance. From the extensive experimental
results, we conclude that (1) the graph-based representation is superior to the
other selected techniques across these tasks; (2) compared with the node type
information used in tree-based and graph-based representations, the node
textual information is more critical to learning program semantics; and (3)
different tasks require task-specific semantics to achieve their highest
performance, but combining program semantics from different dimensions, such as
control dependency and data dependency, can still produce promising results.
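To make the graph-based category concrete, below is a minimal, dependency-free Python sketch of a program graph that keeps both node type and node textual information (the paper finds the textual information more critical) and mixes syntactic edges with control- and data-dependency edges. The node and edge names are illustrative assumptions, not the paper's exact schema.

```python
# Minimal sketch: a graph-based program representation combining AST
# structure with control- and data-dependency edges. Node and edge
# names are illustrative only, not the paper's exact schema.

def build_program_graph():
    # Each node keeps both its type and its textual token -- the paper
    # reports the textual information is the more important of the two.
    nodes = {
        0: {"type": "FunctionDef", "text": "max2"},
        1: {"type": "If",          "text": "if"},
        2: {"type": "Compare",     "text": "a > b"},
        3: {"type": "Return",      "text": "return a"},
        4: {"type": "Return",      "text": "return b"},
    }
    edges = [
        (0, 1, "ast"),          # syntactic structure
        (1, 2, "ast"),
        (1, 3, "ast"),
        (1, 4, "ast"),
        (1, 3, "control_dep"),  # control dependency: the branch guards the returns
        (1, 4, "control_dep"),
        (2, 3, "data_dep"),     # data dependency: 'a' flows into 'return a'
        (2, 4, "data_dep"),
    ]
    return nodes, edges

if __name__ == "__main__":
    nodes, edges = build_program_graph()
    for src, dst, kind in edges:
        print(f"{nodes[src]['text']!r} --{kind}--> {nodes[dst]['text']!r}")
```

A sequence-based representation would keep only the token texts in order, and a tree-based one only the `ast` edges; the graph variant layers the dependency edges on top, which is where the combined semantics in finding (3) come from.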
Related papers
- Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective [93.56647950778357]
Blind image quality assessment (BIQA) predicts the human perception of image quality without any reference information.
We develop a general and automated multitask learning scheme for BIQA to exploit auxiliary knowledge from other tasks.
arXiv Detail & Related papers (2023-03-27T07:58:09Z)
- Benchmarking Node Outlier Detection on Graphs [90.29966986023403]
Graph outlier detection is an emerging but crucial machine learning task with numerous applications.
We present the first comprehensive unsupervised node outlier detection benchmark for graphs called UNOD.
arXiv Detail & Related papers (2022-06-21T01:46:38Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
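As a toy illustration of one such sampling strategy (the triples, templates, and budget knob below are invented for illustration, not the paper's setup), one might draw a fixed budget of triples from a knowledge graph and verbalize them into synthetic training sentences:

```python
import random

# Hypothetical sketch of a knowledge sampling strategy: draw a fixed
# budget of (head, relation, tail) triples from a knowledge graph and
# verbalize them into synthetic training sentences for adaptation.
KG = [
    ("fire", "CapableOf", "burn paper"),
    ("knife", "UsedFor", "cutting"),
    ("rain", "Causes", "wet streets"),
    ("bird", "CapableOf", "fly"),
]
TEMPLATES = {
    "CapableOf": "A {h} can {t}.",
    "UsedFor":   "A {h} is used for {t}.",
    "Causes":    "{h} causes {t}.",
}

def sample_synthetic_data(budget, seed=0):
    rng = random.Random(seed)
    triples = rng.sample(KG, k=min(budget, len(KG)))
    return [TEMPLATES[r].format(h=h, t=t) for h, r, t in triples]

print(sample_synthetic_data(budget=2))
```

Varying `budget` and the sampling rule is the kind of knob whose effect on downstream generalization the paper studies.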
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Active Multi-Task Representation Learning [50.13453053304159]
We give the first formal study on resource task sampling by leveraging the techniques from active learning.
We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance.
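A rough sketch of that loop, with a cosine-similarity stand-in for the relevance estimator and proportional allocation as the sampling rule (both are assumptions for illustration, not the published algorithm):

```python
import numpy as np

# Hedged sketch of the iterative idea only: estimate each source task's
# relevance to the target, then allocate the next round's sample budget
# in proportion to that estimate.
rng = np.random.default_rng(0)

def relevance(source_feats, target_feats):
    # Toy relevance proxy: cosine similarity of mean feature vectors.
    s, t = source_feats.mean(axis=0), target_feats.mean(axis=0)
    return float(s @ t / (np.linalg.norm(s) * np.linalg.norm(t) + 1e-9))

def active_sampling(sources, target, rounds=3, budget=100):
    counts = {name: 0 for name in sources}
    for _ in range(rounds):
        rel = {name: max(relevance(X, target), 0.0) + 1e-6
               for name, X in sources.items()}
        total = sum(rel.values())
        for name in sources:                    # proportional allocation
            counts[name] += int(budget * rel[name] / total)
    return counts

sources = {f"task{i}": rng.normal(size=(50, 8)) for i in range(3)}
target = rng.normal(size=(20, 8))
print(active_sampling(sources, target))
```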
arXiv Detail & Related papers (2022-02-02T08:23:24Z)
- A Comprehensive Analytical Survey on Unsupervised and Semi-Supervised Graph Representation Learning Methods [4.486285347896372]
This survey aims to evaluate all major classes of graph embedding methods.
We organized graph embedding techniques using a taxonomy that includes methods from manual feature engineering, matrix factorization, shallow neural networks, and deep graph convolutional networks.
We designed experiments on top of the PyTorch Geometric and DGL libraries and ran them on different multicore CPU and GPU platforms.
arXiv Detail & Related papers (2021-12-20T07:50:26Z)
- On the Impact of Multiple Source Code Representations on Software Engineering Tasks -- An Empirical Study [4.049850026698639]
We modify an AST path-based approach to accept multiple representations as input to an attention-based model.
We evaluate our approach on three tasks: Method Naming, Program Classification, and Clone Detection.
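One plausible way to fuse several representations with attention is sketched below; the representation names, dimensions, and random weights are placeholders, not the paper's trained model.

```python
import numpy as np

# Minimal sketch: fuse embeddings from several code representations with
# an attention distribution over representations. Random vectors stand
# in for learned embeddings.
def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse(representations, query):
    # representations: dict name -> embedding vector; query: same dim.
    names = list(representations)
    embs = np.stack([representations[n] for n in names])   # (k, d)
    weights = softmax(embs @ query)                        # (k,)
    fused = weights @ embs                                 # (d,)
    return fused, dict(zip(names, weights.round(3)))

rng = np.random.default_rng(1)
reps = {"ast_paths": rng.normal(size=16),
        "token_seq": rng.normal(size=16),
        "cfg": rng.normal(size=16)}
fused, attn = fuse(reps, query=rng.normal(size=16))
print(attn)  # attention weight per input representation
```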
arXiv Detail & Related papers (2021-06-21T08:36:38Z)
- Mining Program Properties From Neural Networks Trained on Source Code Embeddings [0.0]
We propose a novel approach for mining different program features by analysing the internal behaviour of a deep neural network trained on source code.
We train an autoencoder for each program embedding and then test whether the internal neurons autonomously build internal representations for different program features.
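A minimal sketch of the probing idea, assuming synthetic embeddings and a synthetic program property; everything below is illustrative, not the paper's pipeline.

```python
import torch
import torch.nn as nn

# Illustrative sketch: train a small autoencoder on (synthetic) program
# embeddings, then check whether any hidden neuron's activation
# correlates with a program feature label.
torch.manual_seed(0)
X = torch.randn(256, 32)                       # stand-in program embeddings
feature = (X[:, 0] > 0).float()                # stand-in program property

model = nn.Sequential(nn.Linear(32, 8), nn.Tanh(), nn.Linear(8, 32))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), X)  # reconstruction loss
    loss.backward()
    opt.step()

hidden = torch.tanh(model[0](X)).detach()      # internal neuron activations
for j in range(hidden.shape[1]):
    corr = torch.corrcoef(torch.stack([hidden[:, j], feature]))[0, 1]
    print(f"neuron {j}: corr with feature = {corr:.2f}")
```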
arXiv Detail & Related papers (2021-03-09T14:25:16Z)
- Comparative Code Structure Analysis using Deep Learning for Performance Prediction [18.226950022938954]
This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure.
Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source code to discover latent representations, achieving up to 84% accuracy on individual problems and 73% on a combined dataset with multiple problems when predicting the change in performance.
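For intuition, here is a compact child-sum Tree-LSTM cell in PyTorch, applied recursively over a toy three-node AST. The dimensions and tree are invented, and this is a generic Tai et al.-style cell, not necessarily the paper's exact model.

```python
import torch
import torch.nn as nn

# Compact child-sum Tree-LSTM cell sketch, applied bottom-up over a
# toy AST; each node combines its input with its children's states.
class TreeLSTMCell(nn.Module):
    def __init__(self, x_dim, h_dim):
        super().__init__()
        self.W = nn.Linear(x_dim, 4 * h_dim)              # i, o, u, f from x
        self.U = nn.Linear(h_dim, 3 * h_dim, bias=False)  # i, o, u from h_sum
        self.Uf = nn.Linear(h_dim, h_dim, bias=False)     # per-child forget
        self.h_dim = h_dim

    def forward(self, x, children):  # children: list of (h, c) pairs
        h_sum = sum((h for h, _ in children), torch.zeros(self.h_dim))
        wi, wo, wu, wf = self.W(x).chunk(4)
        ui, uo, uu = self.U(h_sum).chunk(3)
        i, o = torch.sigmoid(wi + ui), torch.sigmoid(wo + uo)
        u = torch.tanh(wu + uu)
        c = i * u + sum(torch.sigmoid(wf + self.Uf(hk)) * ck
                        for hk, ck in children)
        return o * torch.tanh(c), c

cell = TreeLSTMCell(x_dim=8, h_dim=16)
leaf = lambda: cell(torch.randn(8), [])             # leaves have no children
root_h, _ = cell(torch.randn(8), [leaf(), leaf()])  # tiny 3-node AST
print(root_h.shape)  # torch.Size([16])
```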
arXiv Detail & Related papers (2021-02-12T16:59:12Z)
- BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration [72.88493072196094]
We present a new synthesis approach that leverages learning to guide a bottom-up search over programs.
In particular, we train a model to prioritize compositions of intermediate values during search conditioned on a set of input-output examples.
We show that the combination of learning and bottom-up search is remarkably effective, even with simple supervised learning approaches.
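A toy bottom-up enumeration in that spirit is sketched below, where a hand-written distance to the target outputs stands in for the learned prioritization model; the DSL, scorer, and task are all invented for illustration.

```python
import heapq

# Toy bottom-up synthesis sketch: enumerate expressions over a tiny DSL,
# popping the most promising intermediate value first. A distance to the
# target outputs stands in for the learned prioritizer.
inputs, target = [1, 2, 3], [4, 6, 8]          # want something like x*2 + 2

OPS = {
    "add1": lambda v: [x + 1 for x in v],
    "dbl":  lambda v: [x * 2 for x in v],
}

def score(value):  # stand-in for the learned model's priority
    return sum(abs(a - b) for a, b in zip(value, target))

def bottom_up(max_steps=1000):
    frontier = [(score(inputs), "x", tuple(inputs))]
    seen = {tuple(inputs)}
    while frontier and max_steps:
        max_steps -= 1
        _, expr, value = heapq.heappop(frontier)
        if list(value) == target:
            return expr
        for name, fn in OPS.items():           # grow from intermediate values
            new = tuple(fn(list(value)))
            if new not in seen:
                seen.add(new)
                heapq.heappush(frontier, (score(new), f"{name}({expr})", new))
    return None

print(bottom_up())  # finds e.g. 'add1(add1(dbl(x)))'
```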
arXiv Detail & Related papers (2020-07-28T17:46:18Z)
- Learning Differentiable Programs with Admissible Neural Heuristics [43.54820901841979]
We study the problem of learning differentiable functions expressed as programs in a domain-specific language.
We frame this optimization problem as a search in a weighted graph whose paths encode top-down derivations of program syntax.
Our key innovation is to view various classes of neural networks as continuous relaxations over the space of programs.
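The search framing can be sketched with plain A* over partial derivations, where a trivially admissible heuristic (at least one expansion per remaining nonterminal) stands in for the cost of a neural relaxation; the grammar and heuristic below are illustrative only.

```python
import heapq

# Sketch of the search framing only: paths in a weighted graph encode
# top-down derivations of a tiny grammar, and h() stands in for the
# cost estimate a neural relaxation would provide.
GRAMMAR = {"E": [["x"], ["1"], ["(", "E", "+", "E", ")"]]}

def h(partial):
    # Stand-in admissible heuristic: each unexpanded nonterminal needs
    # at least one more expansion (a real system trains a relaxation).
    return sum(1 for s in partial if s in GRAMMAR)

def a_star(target, limit=10000):
    frontier = [(h(["E"]), 0, ["E"])]
    while frontier and limit:
        limit -= 1
        _, g, partial = heapq.heappop(frontier)
        nts = [i for i, s in enumerate(partial) if s in GRAMMAR]
        if not nts:
            if partial == target:
                return "".join(partial), g
            continue
        i = nts[0]                              # expand leftmost nonterminal
        for rhs in GRAMMAR[partial[i]]:
            child = partial[:i] + rhs + partial[i + 1:]
            heapq.heappush(frontier, (g + 1 + h(child), g + 1, child))
    return None

print(a_star(list("(x+1)")))  # derives '(x+1)' in 3 expansions
```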
arXiv Detail & Related papers (2020-07-23T16:07:39Z)
- Multi-Task Learning for Dense Prediction Tasks: A Survey [87.66280582034838]
Multi-task learning (MTL) techniques have shown promising results with respect to performance, computation, and memory footprint.
We provide a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision.
arXiv Detail & Related papers (2020-04-28T09:15:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.