AutoPruner: Transformer-Based Call Graph Pruning
- URL: http://arxiv.org/abs/2209.03230v1
- Date: Wed, 7 Sep 2022 15:35:28 GMT
- Title: AutoPruner: Transformer-Based Call Graph Pruning
- Authors: Thanh Le-Cong, Hong Jin Kang, Truong Giang Nguyen, Stefanus Agus
Haryono, David Lo, Xuan-Bach D. Le, Huynh Quyet Thang
- Abstract summary: We present a novel call graph pruning technique, AutoPruner, for eliminating false positives in call graphs via both statistical semantic and structural analysis.
Our empirical evaluation on a benchmark dataset of real-world programs shows that AutoPruner outperforms the state-of-the-art baselines.
- Score: 7.319973664340497
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Constructing a static call graph requires trade-offs between soundness and
precision. Program analysis techniques for constructing call graphs are,
unfortunately, usually imprecise. To address this problem, researchers have
recently proposed call graph pruning empowered by machine learning to
post-process call graphs constructed by static analysis. A machine learning
model is built to capture information from the call graph by extracting
structural features for use in a random forest classifier. It then removes
edges that are predicted to be false positives. Despite the improvements shown
by machine learning models, they remain limited: because they do not consider
source code semantics, they often cannot effectively distinguish true from
false positives. In this paper, we present a novel call graph pruning
technique, AutoPruner, for eliminating false positives in call graphs via both
statistical semantic and structural analysis. Given a call graph constructed by
traditional static analysis tools, AutoPruner takes a Transformer-based
approach to capture the semantic relationships between the caller and callee
functions associated with each edge in the call graph. To do so, AutoPruner
fine-tunes a model of code that was pre-trained on a large corpus to represent
source code based on descriptions of its semantics. Next, the model is used to
extract semantic features from the functions related to each edge in the call
graph. AutoPruner uses these semantic features together with the structural
features extracted from the call graph to classify each edge via a feed-forward
neural network. Our empirical evaluation on a benchmark dataset of real-world
programs shows that AutoPruner outperforms the state-of-the-art baselines,
improving on F-measure by up to 13% in identifying false-positive edges in a
static call graph.
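To make the described pipeline concrete, a minimal sketch follows. It is illustrative only, not the authors' released implementation: the microsoft/codebert-base checkpoint, the four structural features, and the head dimensions are assumptions standing in for whatever the paper actually uses.

```python
# Minimal sketch of Transformer-based call graph edge pruning.
# Assumptions (not the paper's exact setup): CodeBERT as the
# pre-trained model of code, four structural features per edge,
# and a two-layer feed-forward head.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class EdgeClassifier(nn.Module):
    def __init__(self, encoder_name="microsoft/codebert-base", n_struct=4):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(encoder_name)
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size  # 768 for CodeBERT
        # Feed-forward network over [semantic ; structural] features.
        self.head = nn.Sequential(
            nn.Linear(hidden + n_struct, 256),
            nn.ReLU(),
            nn.Linear(256, 2),  # true edge vs. false positive
        )

    def forward(self, caller_src, callee_src, struct_feats):
        # Encode caller and callee as one sequence pair so self-attention
        # can relate the two functions' tokens.
        batch = self.tokenizer(caller_src, callee_src, padding=True,
                               truncation=True, max_length=512,
                               return_tensors="pt")
        out = self.encoder(**batch)
        semantic = out.last_hidden_state[:, 0]  # pooled first-token embedding
        return self.head(torch.cat([semantic, struct_feats], dim=-1))

# Hypothetical edge: caller/callee bodies plus structural features
# (e.g., in/out degrees of the two nodes in the static call graph).
model = EdgeClassifier()
logits = model(["def main():\n    run()"], ["def run():\n    pass"],
               torch.tensor([[2.0, 1.0, 0.0, 1.0]]))
prune = logits.argmax(-1) == 1  # True would mean "remove this edge"
```

Edges scored as false positives are then removed from the call graph produced by the static analyzer.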
Related papers
- An Automatic Graph Construction Framework based on Large Language Models for Recommendation [49.51799417575638]
We introduce AutoGraph, an automatic graph construction framework based on large language models for recommendation.
LLMs infer user preferences and item knowledge, which are encoded as semantic vectors.
Latent factors are incorporated as extra nodes to link the user/item nodes, resulting in a graph with in-depth global-view semantics.
arXiv Detail & Related papers (2024-12-24T07:51:29Z)
- Similarity-aware Positive Instance Sampling for Graph Contrastive Pre-training [82.68805025636165]
We propose to select positive graph instances directly from existing graphs in the training set.
Our selection is based on certain domain-specific pair-wise similarity measurements.
In addition, we develop an adaptive node-level pre-training method that dynamically masks nodes to distribute them evenly across the graph.
arXiv Detail & Related papers (2022-06-23T20:12:51Z)
- Explanation Graph Generation via Pre-trained Language Models: An Empirical Study with Contrastive Learning [84.35102534158621]
We study pre-trained language models that generate explanation graphs in an end-to-end manner.
We propose simple yet effective ways of graph perturbations via node and edge edit operations.
Our methods lead to significant improvements in both structural and semantic accuracy of explanation graphs.
arXiv Detail & Related papers (2022-04-11T00:58:27Z)
- Multi-task Self-distillation for Graph-based Semi-Supervised Learning [6.277952154365413]
We propose a multi-task self-distillation framework that injects self-supervised learning and self-distillation into graph convolutional networks.
First, we formulate a self-supervision pipeline based on pre-text tasks to capture different levels of similarities in graphs.
Second, self-distillation uses soft labels of the model itself as additional supervision.
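For intuition, the soft-label term can be sketched as a standard knowledge-distillation loss in which the model's own (detached) earlier predictions act as the teacher; this is a generic sketch, not the paper's exact multi-task formulation.

```python
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits, labels,
                           temperature=2.0, alpha=0.5):
    # Supervised term on hard labels.
    hard = F.cross_entropy(student_logits, labels)
    # Soft term: match the model's own earlier, detached predictions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits.detach() / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # standard scale correction for soft targets
    return alpha * hard + (1 - alpha) * soft
```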
arXiv Detail & Related papers (2021-12-02T12:43:41Z)
- Joint Graph Learning and Matching for Semantic Feature Correspondence [69.71998282148762]
We propose a joint graph learning and matching network, named GLAM, to explore reliable graph structures for boosting graph matching.
The proposed method is evaluated on three popular visual matching benchmarks (Pascal VOC, Willow Object and SPair-71k).
It outperforms previous state-of-the-art graph matching methods by significant margins on all benchmarks.
arXiv Detail & Related papers (2021-09-01T08:24:02Z)
- Temporal Graph Network Embedding with Causal Anonymous Walks Representations [54.05212871508062]
We propose a novel approach for dynamic network representation learning based on Temporal Graph Network.
We also provide a benchmark pipeline for evaluating temporal network embeddings.
We show the applicability and superior performance of our model in the real-world downstream graph machine learning task provided by one of the top European banks.
arXiv Detail & Related papers (2021-08-19T15:39:52Z)
- Partial Graph Reasoning for Neural Network Regularization [26.793648333908905]
Dropout, as a commonly used regularization technique, disables neuron activations during network optimization.
We present DropGraph, which learns a regularization function by constructing a stand-alone graph from the backbone features.
This add-on graph regularizes the network during training and can be completely skipped during inference.
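The train-only pattern can be sketched as follows; the kNN construction and mixing step here are assumptions for illustration, not DropGraph's actual graph-building procedure.

```python
import torch
import torch.nn as nn

class GraphRegularizer(nn.Module):
    """Sketch: build a kNN graph over backbone features and mix in one
    propagation step during training; skip entirely at inference."""
    def __init__(self, k=4, alpha=0.1):
        super().__init__()
        self.k, self.alpha = k, alpha

    def forward(self, feats):  # feats: (batch, dim)
        if not self.training:
            return feats  # the add-on graph costs nothing at inference
        sim = feats @ feats.t()                 # pairwise similarity
        idx = sim.topk(self.k, dim=-1).indices  # k nearest neighbors
        adj = torch.zeros_like(sim).scatter_(1, idx, 1.0)
        adj = adj / adj.sum(-1, keepdim=True)   # row-normalize
        return (1 - self.alpha) * feats + self.alpha * adj @ feats
```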
arXiv Detail & Related papers (2021-06-03T12:57:01Z)
- Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking [58.30147362745852]
Data association across frames is at the core of the Multiple Object Tracking (MOT) task.
Existing methods mostly ignore the context information among tracklets and intra-frame detections.
We propose a novel learnable graph matching method to address these issues.
arXiv Detail & Related papers (2021-03-30T08:58:45Z)
- Line Graph Neural Networks for Link Prediction [71.00689542259052]
We consider the graph link prediction task, which is a classic graph analytical problem with many real-world applications.
In this formalism, a link prediction problem is converted to a graph classification task.
We propose to seek a radically different and novel path by making use of the line graphs in graph theory.
In particular, each node in a line graph corresponds to a unique edge in the original graph. Therefore, link prediction problems in the original graph can be equivalently solved as a node classification problem in its corresponding line graph, instead of a graph classification task.
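The underlying transformation is standard graph theory; for example, with networkx:

```python
import networkx as nx

# Each edge of G becomes a node of L(G); two nodes of L(G) are adjacent
# exactly when the corresponding edges of G share an endpoint.
G = nx.Graph([(1, 2), (2, 3), (3, 4)])
L = nx.line_graph(G)

print(sorted(L.nodes()))  # [(1, 2), (2, 3), (3, 4)]
# Predicting link (2, 3) in G is classifying node (2, 3) in L(G).
```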
arXiv Detail & Related papers (2020-10-20T05:54:31Z)
- Learning Graph Structure With A Finite-State Automaton Layer [31.028101360041227]
We study the problem of learning to derive abstract relations from the intrinsic graph structure.
We show how to learn these relations end-to-end by relaxing the problem into learning finite-state automata policies.
We demonstrate that this layer can find shortcuts in grid-world graphs and reproduce simple static analyses on Python programs.
arXiv Detail & Related papers (2020-07-09T17:01:34Z)
- Non-Parametric Graph Learning for Bayesian Graph Neural Networks [35.88239188555398]
We propose a novel non-parametric graph model for constructing the posterior distribution of graph adjacency matrices.
We demonstrate the advantages of this model in three different problem settings: node classification, link prediction and recommendation.
arXiv Detail & Related papers (2020-06-23T21:10:55Z)