Identifying Non-Control Security-Critical Data through Program Dependence Learning
- URL: http://arxiv.org/abs/2108.12071v2
- Date: Wed, 1 May 2024 19:20:52 GMT
- Title: Identifying Non-Control Security-Critical Data through Program Dependence Learning
- Authors: Zhilong Wang, Haizhou Wang, Hong Hu, Peng Liu
- Abstract summary: In data-oriented attacks, a fundamental step is to identify non-control, security-critical data.
We propose a novel approach that combines traditional program analysis with deep learning.
The toolchain uncovers 80 potential critical variables in Google FuzzBench.
- Score: 9.764831771725952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As control-flow protection gets widely deployed, it is difficult for attackers to corrupt control data and achieve control-flow hijacking. Instead, data-oriented attacks, which manipulate non-control data, have been demonstrated to be feasible and powerful. In data-oriented attacks, a fundamental step is to identify non-control, security-critical data. However, the critical data identification processes in previous works do not scale, because they mainly rely on tedious human effort to identify critical data. To address this issue, we propose a novel approach that combines traditional program analysis with deep learning. At a higher level, by examining how analysts identify critical data, we first propose dynamic analysis algorithms to identify the program semantics (and features) that are correlated with the impact of a critical data item. Then, motivated by the unique challenges in the critical data identification task, we formalize the distinguishing features and use customized program dependence graphs (PDGs) to embed the features. Different from previous works that use deep learning to learn basic program semantics, this paper adopts a special neural network architecture that can capture the long dependency paths (in the PDG) through which a critical variable propagates its impact. We have implemented a fully automatic toolchain and conducted comprehensive evaluations. According to the evaluations, our model can achieve 90% accuracy. The toolchain uncovers 80 potential critical variables in Google FuzzBench. In addition, we demonstrate the harmfulness of exploits based on the identified critical variables by simulating 7 data-oriented attacks through GDB.
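As a rough, hypothetical illustration of the pipeline the abstract describes (statements as PDG nodes, and a recurrent model reading the long dependency paths along which a candidate variable propagates its impact), consider the minimal Python sketch below. The toy graph, the hash-based statement encoding, and the LSTM path classifier are illustrative assumptions, not the authors' toolchain.

```python
# Hypothetical sketch: enumerate PDG dependency paths from a candidate
# variable and score them with an LSTM (illustrative, not the paper's tool).
import networkx as nx
import torch
import torch.nn as nn

# Toy program dependence graph: nodes are statements, edges are dependencies.
pdg = nx.DiGraph()
pdg.add_edges_from([
    ("len = read_input()", "buf = malloc(len)"),
    ("buf = malloc(len)", "memcpy(buf, src, len)"),
    ("len = read_input()", "memcpy(buf, src, len)"),
])

# Dependency paths rooted at the candidate variable's defining statement:
# a critical variable propagates its impact along exactly such paths.
root = "len = read_input()"
paths = [p for sink in pdg.nodes if sink != root
         for p in nx.all_simple_paths(pdg, root, sink)]

def encode(stmt, dim=16):
    # Assumed encoding: a deterministic pseudo-random vector per statement.
    torch.manual_seed(hash(stmt) % (2 ** 31))
    return torch.randn(dim)

class PathClassifier(nn.Module):
    """LSTM over one dependency path; the recurrent state carries
    information along the long path, mirroring the paper's motivation."""
    def __init__(self, dim=16, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # critical vs. non-critical

    def forward(self, path_feats):            # (1, path_len, dim)
        _, (h, _) = self.lstm(path_feats)
        return self.head(h[-1])               # (1, 2) logits

model = PathClassifier()
for path in paths:
    feats = torch.stack([encode(s) for s in path]).unsqueeze(0)
    print(len(path), model(feats).softmax(-1).tolist())
```

An LSTM is used here only because a recurrent encoder naturally carries state across long paths, matching the paper's stated motivation; the authors' actual architecture may differ.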
Related papers
- Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset [94.13848736705575]
We introduce Facial Identity Unlearning Benchmark (FIUBench), a novel VLM unlearning benchmark designed to robustly evaluate the effectiveness of unlearning algorithms.
We apply a two-stage evaluation pipeline that is designed to precisely control the sources of information and their exposure levels.
Through the evaluation of four baseline VLM unlearning algorithms within FIUBench, we find that all methods remain limited in their unlearning performance.
arXiv Detail & Related papers (2024-11-05T23:26:10Z)
- DFEPT: Data Flow Embedding for Enhancing Pre-Trained Model Based Vulnerability Detection [7.802093464108404]
We propose a data flow embedding technique to enhance the performance of pre-trained models in vulnerability detection tasks.
Specifically, we parse data flow graphs from function-level source code, and use the data type of the variable as the node characteristics of the DFG.
Our research shows that DFEPT can provide effective vulnerability semantic information to pre-trained models, achieving an accuracy of 64.97% on the Devign dataset and an F1-Score of 47.9% on the Reveal dataset.
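As a rough sketch of the described idea, the snippet below builds a tiny data-flow graph and attaches each variable's declared type as its node feature. It uses Python's ast module on a Python function purely for convenience; DFEPT itself parses DFGs from C-style function source, and the graph construction here is an illustrative assumption.

```python
# Hypothetical sketch: a tiny data-flow graph with declared variable
# types as node features (Python ast used for convenience only).
import ast
import networkx as nx

src = "def f(n: int):\n    s: int = n * 2\n    t: str = 'x' * s\n    return t\n"

dfg = nx.DiGraph()
for node in ast.walk(ast.parse(src)):
    # Annotated assignments expose a declared type we can use as a feature.
    if isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
        dfg.add_node(node.target.id, dtype=ast.unparse(node.annotation))
        for used in ast.walk(node.value):
            if isinstance(used, ast.Name):
                dfg.add_edge(used.id, node.target.id)  # data flows used -> target

# Nodes only ever read (like the parameter n) end up without a dtype feature.
print(dfg.nodes(data=True), list(dfg.edges))
```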
arXiv Detail & Related papers (2024-10-24T07:05:07Z)
- Data Quality Issues in Vulnerability Detection Datasets [1.6114012813668932]
Vulnerability detection is a crucial yet challenging task to identify potential weaknesses in software for cyber security.
Deep learning (DL) has made great progress in automating the detection process.
Many datasets have been created to train DL models for this purpose.
However, these datasets suffer from several quality issues that lead to low detection accuracy in DL models.
arXiv Detail & Related papers (2024-10-08T13:31:29Z)
- Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection [9.652886240532741]
This paper thoroughly analyses large language models' capabilities in detecting vulnerabilities within source code.
We evaluate the performance of six open-source models that are specifically trained for vulnerability detection against six general-purpose LLMs.
arXiv Detail & Related papers (2024-08-29T10:00:57Z)
- MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective [10.009178591853058]
We propose a formal information-theoretic definition for this utility-preserving privacy protection problem.
We design a data-driven learnable data transformation framework that is capable of suppressing sensitive attributes from target datasets.
Results demonstrate the effectiveness and generalizability of our method under various configurations.
arXiv Detail & Related papers (2024-05-23T18:35:46Z)
- Pre-training by Predicting Program Dependencies for Vulnerability Analysis Tasks [12.016029378106131]
This work proposes two novel pre-training objectives, namely Control Dependency Prediction (CDP) and Data Dependency Prediction (DDP).
CDP and DDP aim to predict the statement-level control dependencies and token-level data dependencies, respectively, in a code snippet based only on its source code.
After pre-training, CDP and DDP can boost the understanding of vulnerable code during fine-tuning and can directly be used to perform dependence analysis for both partial and complete functions.
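A hypothetical sketch of what a DDP-style objective could look like: a bilinear scorer over token-pair embeddings, trained with binary cross-entropy against a gold data-dependency matrix. The encoder stand-in, the scorer, and all shapes are our own assumptions, not the paper's model.

```python
# Hypothetical DDP-style objective: score every token pair for a data
# dependency and train with binary cross-entropy (shapes are illustrative).
import torch
import torch.nn as nn

tokens, dim = 6, 32
emb = torch.randn(1, tokens, dim, requires_grad=True)  # stand-in encoder output
scorer = nn.Bilinear(dim, dim, 1)                      # pairwise dependency score

# Fake gold matrix: gold[0, i, j] = 1 if token j data-depends on token i.
gold = torch.zeros(1, tokens, tokens)
gold[0, 1, 4] = 1.0

left = emb.unsqueeze(2).expand(-1, -1, tokens, -1)     # (1, T, T, dim)
right = emb.unsqueeze(1).expand(-1, tokens, -1, -1)    # (1, T, T, dim)
logits = scorer(left.reshape(-1, dim),
                right.reshape(-1, dim)).view(1, tokens, tokens)

loss = nn.functional.binary_cross_entropy_with_logits(logits, gold)
loss.backward()  # gradients flow back into the encoder output
print(float(loss))
```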
arXiv Detail & Related papers (2024-02-01T15:18:19Z)
- Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey [67.7834898542701]
We focus on reviewing privacy-preserving techniques of graph machine learning.
We first review methods for generating privacy-preserving graph data.
Then we describe methods for transmitting privacy-preserved information.
arXiv Detail & Related papers (2023-07-10T04:30:23Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
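The entry does not spell out the clustering details; one plausible reading is that a cluster-level pseudo-label is the majority prediction within each feature cluster. A hypothetical sketch follows (the feature source, cluster count, and voting rule are all assumptions):

```python
# Hypothetical cluster-level pseudo-labelling: cluster target features,
# then give every member of a cluster its majority predicted label.
import numpy as np
from sklearn.cluster import KMeans

feats = np.random.randn(100, 16)          # stand-in target-domain features
preds = np.random.randint(0, 7, 100)      # stand-in per-sample FER predictions

clusters = KMeans(n_clusters=7, n_init=10).fit_predict(feats)
pseudo = np.empty_like(preds)
for c in np.unique(clusters):
    members = clusters == c
    pseudo[members] = np.bincount(preds[members]).argmax()  # majority vote
print(pseudo[:10])
```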
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
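The attacks and tagger here are specific to flavor tagging, but the adversarial-training recipe is generic. A hypothetical FGSM-style training step (the model, perturbation budget, and features are illustrative stand-ins):

```python
# Hypothetical FGSM-style adversarial training step (the tagger,
# features, and perturbation budget are generic stand-ins).
import torch
import torch.nn as nn

tagger = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.Adam(tagger.parameters())
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 8)                     # stand-in jet features
y = torch.randint(0, 2, (32,))             # stand-in flavor labels

# Attack pass: perturb inputs along the sign of the input gradient.
x.requires_grad_(True)
loss_fn(tagger(x), y).backward()
x_adv = (x + 0.01 * x.grad.sign()).detach()

# Training pass: fit on the perturbed inputs to gain robustness.
opt.zero_grad()
loss = loss_fn(tagger(x_adv), y)
loss.backward()
opt.step()
print(float(loss))
```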
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
- Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
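A minimal min-max sketch of this kind of adversarial obfuscation, with the paper's total-variation and Wasserstein objectives simplified to a plain cross-entropy game; the networks and dimensions are illustrative assumptions:

```python
# Hypothetical min-max obfuscation game: a filter transforms features so
# an adversary cannot recover the sensitive attribute (the paper's
# total-variation/Wasserstein objectives are simplified to cross-entropy).
import torch
import torch.nn as nn

filt = nn.Linear(8, 8)            # obfuscating transformation
adversary = nn.Linear(8, 2)       # tries to predict the sensitive attribute
opt_f = torch.optim.Adam(filt.parameters())
opt_a = torch.optim.Adam(adversary.parameters())
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 8)            # stand-in node features
s = torch.randint(0, 2, (64,))    # sensitive attribute labels

for _ in range(5):
    # Adversary step: learn to recover the attribute from filtered features.
    opt_a.zero_grad()
    loss_a = loss_fn(adversary(filt(x).detach()), s)
    loss_a.backward()
    opt_a.step()
    # Filter step: maximize the adversary's loss to hide the attribute.
    opt_f.zero_grad()
    loss_f = -loss_fn(adversary(filt(x)), s)
    loss_f.backward()
    opt_f.step()
print(float(loss_a), float(loss_f))
```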
arXiv Detail & Related papers (2020-09-28T17:55:04Z)
- Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs).
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
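A hypothetical sketch of the subgraph-trigger idea: splice a small fixed graph, carrying both topology and node features, into a victim graph. The trigger shape, feature marker, and attachment point are illustrative; GTA optimizes its triggers rather than fixing them.

```python
# Hypothetical subgraph trigger: splice a fixed small graph (topology plus
# node features) into a victim graph, echoing GTA's trigger definition.
import networkx as nx

trigger = nx.complete_graph(3)                 # illustrative trigger topology
nx.set_node_attributes(trigger, 1.0, "feat")   # descriptive feature marker

def implant(graph, trig, attach_to):
    # Relabel trigger nodes to keep them disjoint from the victim's nodes.
    g = nx.union(graph, nx.relabel_nodes(trig, lambda n: f"t{n}"))
    g.add_edge(attach_to, "t0")                # connect trigger to the graph
    return g

victim = nx.path_graph(5)
nx.set_node_attributes(victim, 0.0, "feat")
poisoned = implant(victim, trigger, attach_to=2)
print(list(poisoned.nodes(data=True)))
```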
arXiv Detail & Related papers (2020-06-21T19:45:30Z)