Identifying Non-Control Security-Critical Data through Program Dependence Learning
- URL: http://arxiv.org/abs/2108.12071v2
- Date: Wed, 1 May 2024 19:20:52 GMT
- Title: Identifying Non-Control Security-Critical Data through Program Dependence Learning
- Authors: Zhilong Wang, Haizhou Wang, Hong Hu, Peng Liu
- Abstract summary: In data-oriented attacks, a fundamental step is to identify non-control, security-critical data.
We propose a novel approach that combines traditional program analysis with deep learning.
The toolchain uncovers 80 potential critical variables in Google FuzzBench.
- Score: 9.764831771725952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As control-flow protection gets widely deployed, it is difficult for attackers to corrupt control data and achieve control-flow hijacking. Instead, data-oriented attacks, which manipulate non-control data, have been demonstrated to be feasible and powerful. In data-oriented attacks, a fundamental step is to identify non-control, security-critical data. However, critical data identification in previous works does not scale, because it mainly relies on tedious human effort. To address this issue, we propose a novel approach that combines traditional program analysis with deep learning. At a higher level, by examining how analysts identify critical data, we first propose dynamic analysis algorithms to identify the program semantics (and features) that are correlated with the impact of critical data. Then, motivated by the unique challenges in the critical data identification task, we formalize the distinguishing features and use customized program dependence graphs (PDG) to embed the features. Different from previous works using deep learning to learn basic program semantics, this paper adopts a special neural network architecture that can capture the long dependency paths (in the PDG) through which a critical variable propagates its impact. We have implemented a fully-automatic toolchain and conducted comprehensive evaluations. According to the evaluations, our model can achieve 90% accuracy. The toolchain uncovers 80 potential critical variables in Google FuzzBench. In addition, we demonstrate the harmfulness of exploits using the identified critical variables by simulating 7 data-oriented attacks through GDB.
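The abstract's core idea is that a critical variable's impact propagates along long dependency paths in a program dependence graph, and that path structure is a learnable feature. A minimal sketch of that intuition follows; the graph class, node labels, and path-length feature are illustrative assumptions, not the paper's actual toolchain or feature set.

```python
from collections import defaultdict

# Toy program dependence graph (PDG): nodes are statements, edges are
# data/control dependencies. All names here are illustrative, not taken
# from the paper's implementation.
class PDG:
    def __init__(self):
        self.edges = defaultdict(list)

    def add_dep(self, src, dst, kind):
        """Record a dependency edge from statement src to dst."""
        self.edges[src].append((dst, kind))

    def dependency_paths(self, start, max_len=10):
        """Enumerate forward dependency paths from `start` via DFS."""
        paths = []

        def dfs(node, path):
            succs = self.edges.get(node, [])
            if not succs or len(path) >= max_len:
                paths.append(list(path))
                return
            for nxt, _kind in succs:
                if nxt in path:  # avoid cycles
                    continue
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

        dfs(start, [start])
        return paths


pdg = PDG()
# A uid value flows into a privilege check, which controls an exec call.
pdg.add_dep("uid = getuid()", "if uid == 0", "data")
pdg.add_dep("if uid == 0", "exec(cmd)", "control")
pdg.add_dep("uid = getuid()", "log(uid)", "data")

paths = pdg.dependency_paths("uid = getuid()")
# The longest forward path is a rough proxy for how far a variable's
# impact propagates -- the kind of structure the paper's neural
# architecture is designed to capture.
longest = max(paths, key=len)
print(len(paths), len(longest))
```

In the paper's setting, such paths are embedded by a neural network rather than summarized by hand-picked statistics; this sketch only shows why path enumeration over the PDG, rather than flat token sequences, exposes the propagation structure the model learns from.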
Related papers
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z) - MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective [10.009178591853058]
We propose a formal information-theoretic definition for this utility-preserving privacy protection problem.
We design a data-driven learnable data transformation framework that is capable of suppressing sensitive attributes from target datasets.
Results demonstrate the effectiveness and generalizability of our method under various configurations.
arXiv Detail & Related papers (2024-05-23T18:35:46Z) - Profile of Vulnerability Remediations in Dependencies Using Graph Analysis [40.35284812745255]
This research introduces graph analysis methods and a modified Graph Attention Convolutional Neural Network (GAT) model.
We analyze control flow graphs to profile breaking changes in applications occurring from dependency upgrades intended to remediate vulnerabilities.
Results demonstrate the effectiveness of the enhanced GAT model in offering nuanced insights into the relational dynamics of code vulnerabilities.
arXiv Detail & Related papers (2024-03-08T02:01:47Z) - Pre-training by Predicting Program Dependencies for Vulnerability Analysis Tasks [12.016029378106131]
This work proposes two novel pre-training objectives, namely Control Dependency Prediction (CDP) and Data Dependency Prediction (DDP).
CDP and DDP aim to predict the statement-level control dependencies and token-level data dependencies, respectively, in a code snippet only based on its source code.
After pre-training, CDP and DDP can boost the understanding of vulnerable code during fine-tuning and can directly be used to perform dependence analysis for both partial and complete functions.
arXiv Detail & Related papers (2024-02-01T15:18:19Z) - Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey [67.7834898542701]
We focus on reviewing privacy-preserving techniques of graph machine learning.
We first review methods for generating privacy-preserving graph data.
Then we describe methods for transmitting privacy-preserved information.
arXiv Detail & Related papers (2023-07-10T04:30:23Z) - Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z) - Enhanced Membership Inference Attacks against Machine Learning Models [9.26208227402571]
Membership inference attacks are used to quantify the private information that a model leaks about the individual data points in its training set.
We derive new attack algorithms that can achieve a high AUC score while also highlighting the different factors that affect their performance.
Our algorithms capture a very precise approximation of privacy loss in models, and can be used as a tool to perform an accurate and informed estimation of privacy risk in machine learning models.
arXiv Detail & Related papers (2021-11-18T13:31:22Z) - Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z) - Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs).
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.