FuncFooler: A Practical Black-box Attack Against Learning-based Binary
Code Similarity Detection Methods
- URL: http://arxiv.org/abs/2208.14191v1
- Date: Fri, 26 Aug 2022 01:58:26 GMT
- Title: FuncFooler: A Practical Black-box Attack Against Learning-based Binary
Code Similarity Detection Methods
- Authors: Lichen Jia, Bowen Tang, Chenggang Wu, Zhe Wang, Zihan Jiang, Yuanming
Lai, Yan Kang, Ning Liu, Jingfeng Zhang
- Abstract summary: This paper designs an efficient, black-box adversarial code generation algorithm named FuncFooler.
FuncFooler constrains the adversarial code to keep the program's control flow graph (CFG) unchanged and to preserve its semantic meaning.
Empirically, FuncFooler successfully attacks three learning-based BCSD models: SAFE, Asm2Vec, and jTrans.
- Score: 13.694322857909166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Binary code similarity detection (BCSD) measures the similarity between
two pieces of binary executable code. Recently, learning-based BCSD methods have
achieved great success, outperforming traditional BCSD in both detection accuracy
and efficiency. However, existing studies on the adversarial vulnerability of
learning-based BCSD methods are sparse, which poses hazards in security-related
applications. To evaluate the adversarial robustness, this paper designs an
efficient, black-box adversarial code generation algorithm named FuncFooler.
FuncFooler constrains the adversarial code 1) to keep the program's control
flow graph (CFG) unchanged, and 2) to preserve the program's semantic meaning.
Specifically, FuncFooler consecutively 1) determines vulnerable candidate
locations in the malicious code, 2) chooses and inserts adversarial instructions
drawn from benign code, and 3) corrects the semantic side effects of the
inserted instructions to meet the constraints. Empirically, FuncFooler
successfully attacks three learning-based BCSD models, namely SAFE, Asm2Vec,
and jTrans, which calls into question whether learning-based BCSD is desirable.
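As a rough illustration of the three-step procedure described in the abstract, the Python sketch below shows one plausible greedy, query-based attack loop. The helper names (`similarity`, `fix_side_effects`, `benign_pool`, and so on) are hypothetical placeholders, not the authors' implementation; insertion positions are restricted to straight-line code so the CFG stays unchanged.

```python
from typing import Callable, List, Sequence

def funcfooler_sketch(
    target: List[str],                       # instructions of the function under attack
    reference: List[str],                    # function it should no longer match
    benign_pool: Sequence[str],              # candidate instructions harvested from benign code
    similarity: Callable[[List[str], List[str]], float],      # black-box BCSD score in [0, 1]
    fix_side_effects: Callable[[List[str], int], List[str]],  # restores clobbered state
    budget: int = 20,
    threshold: float = 0.5,
) -> List[str]:
    """Greedy black-box insertion attack (illustrative only).

    Each round inserts one instruction from the benign pool at the position
    that lowers the similarity score the most, then patches its semantic
    side effects so the program's behaviour and CFG are preserved.
    """
    adv = list(target)
    for _ in range(budget):
        score = similarity(adv, reference)
        if score < threshold:
            break  # attack succeeded
        best, best_score = None, score
        # Only positions inside straight-line code are considered so that
        # no new basic blocks or edges are introduced into the CFG.
        for pos in range(1, len(adv)):
            for ins in benign_pool:
                candidate = fix_side_effects(adv[:pos] + [ins] + adv[pos:], pos)
                s = similarity(candidate, reference)
                if s < best_score:
                    best, best_score = candidate, s
        if best is None:
            break  # no single insertion helps any further
        adv = best
    return adv
```

Because the loop only queries similarity scores, it treats the BCSD model as a black box, matching the setting described in the abstract.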
Related papers
- Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z)
- CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection [23.8834126695488]
Binary code similarity detection (BCSD) is a fundamental technique for various applications.
We propose a cost-effective BCSD framework, CEBin, which fuses embedding-based and comparison-based approaches.
arXiv Detail & Related papers (2024-02-29T03:02:07Z)
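The CEBin summary above mentions fusing embedding-based and comparison-based approaches. A minimal, hypothetical sketch of such a two-stage pipeline, fast embedding retrieval followed by a slower pairwise re-ranking model, is shown below; the function names and the `pairwise_score` interface are assumptions, not CEBin's actual API.

```python
import numpy as np
from typing import Callable, List, Sequence, Tuple

def two_stage_search(
    query_emb: np.ndarray,                   # embedding of the query function
    corpus_embs: np.ndarray,                 # (N, d) embeddings of the corpus
    corpus_funcs: Sequence[object],          # the corresponding raw functions
    query_func: object,
    pairwise_score: Callable[[object, object], float],  # expensive comparison model
    top_k: int = 50,
) -> List[Tuple[int, float]]:
    """Cheap embedding retrieval followed by expensive pairwise re-ranking."""
    # Stage 1: cosine similarity against every corpus embedding (fast, scalable).
    sims = corpus_embs @ query_emb / (
        np.linalg.norm(corpus_embs, axis=1) * np.linalg.norm(query_emb) + 1e-12
    )
    shortlist = np.argsort(-sims)[:top_k]
    # Stage 2: re-rank the shortlist with the comparison-based model (accurate, slow).
    rescored = [(int(i), pairwise_score(query_func, corpus_funcs[i])) for i in shortlist]
    return sorted(rescored, key=lambda t: -t[1])
```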
- Certifying LLM Safety against Adversarial Prompting [75.19953634352258]
Large language models (LLMs) are vulnerable to adversarial attacks that add malicious tokens to an input prompt.
We introduce erase-and-check, the first framework for defending against adversarial prompts with certifiable safety guarantees.
arXiv Detail & Related papers (2023-09-06T04:37:20Z)
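One way to read the erase-and-check idea summarized above is sketched below for the adversarial-suffix setting: a prompt is flagged if a safety filter fires on the prompt itself or on any version with up to a fixed number of trailing tokens erased. The `is_harmful` filter and the suffix-only erase pattern are illustrative assumptions; the paper's framework may cover other erase modes.

```python
from typing import Callable, List

def erase_and_check_suffix(
    tokens: List[str],
    is_harmful: Callable[[List[str]], bool],  # safety filter over a token sequence
    max_erase: int = 20,
) -> bool:
    """Flag a prompt if the filter fires on it or on any version with up to
    max_erase trailing tokens removed.

    If an adversarial suffix of at most max_erase tokens was appended to a
    harmful prompt, one of the erased versions is exactly the original harmful
    prompt, so the filter's detection carries over; that is the certification idea.
    """
    n = len(tokens)
    for k in range(min(max_erase, n) + 1):
        if is_harmful(tokens[:n - k]):
            return True
    return False
```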
- Adversarial Training Should Be Cast as a Non-Zero-Sum Game [121.95628660889628]
The two-player zero-sum paradigm of adversarial training has not engendered sufficient levels of robustness.
We show that the surrogate-based relaxation commonly used in adversarial training algorithms voids all guarantees on robustness.
A novel non-zero-sum bilevel formulation of adversarial training yields a framework that matches and in some cases outperforms state-of-the-art attacks.
arXiv Detail & Related papers (2023-06-19T16:00:48Z)
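For context on the claim above, the first objective below is the standard zero-sum adversarial training formulation with a surrogate loss, the relaxation said to void robustness guarantees; the second is a generic bilevel, non-zero-sum variant in which the attacker maximizes its own objective g rather than the defender's surrogate. This is a schematic reading, not necessarily the paper's exact formulation.

```latex
% Zero-sum adversarial training with a surrogate loss \ell
\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[\max_{\|\delta\|\le\epsilon} \ell\big(f_\theta(x+\delta),\,y\big)\Big]

% Bilevel, non-zero-sum variant: attacker and defender optimize different objectives
\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[\ell\big(f_\theta(x+\delta^\star),\,y\big)\Big]
\quad\text{s.t.}\quad
\delta^\star \in \arg\max_{\|\delta\|\le\epsilon} g\big(f_\theta(x+\delta),\,y\big)
```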
- Pre-trained Encoders in Self-Supervised Learning Improve Secure and Privacy-preserving Supervised Learning [63.45532264721498]
Self-supervised learning is an emerging technique to pre-train encoders using unlabeled data.
We perform the first systematic, principled measurement study to understand whether and when a pre-trained encoder can address the limitations of secure or privacy-preserving supervised learning algorithms.
arXiv Detail & Related papers (2022-12-06T21:35:35Z)
- UniASM: Binary Code Similarity Detection without Fine-tuning [0.8271859911016718]
We propose a novel transformer-based binary code embedding model named UniASM to learn representations of binary functions.
In the real-world task of known vulnerability search, UniASM outperforms all the current baselines.
arXiv Detail & Related papers (2022-10-28T14:04:57Z)
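The UniASM entry above mentions known-vulnerability search; the sketch below shows the generic way an off-the-shelf function encoder (a UniASM-style model used without fine-tuning) would be applied to that task: embed a known-vulnerable function and rank the functions of a target binary by cosine similarity. The `embed` callable is a placeholder, not UniASM's real interface.

```python
import numpy as np
from typing import Callable, List, Sequence, Tuple

def vulnerability_search(
    vulnerable_fn: str,                       # disassembly of a known-vulnerable function
    candidate_fns: Sequence[str],             # functions extracted from the target binary
    embed: Callable[[str], np.ndarray],       # any function-level encoder
    top_k: int = 10,
) -> List[Tuple[int, float]]:
    """Rank candidate functions by cosine similarity to a known-vulnerable one."""
    q = embed(vulnerable_fn)
    scores = []
    for i, fn in enumerate(candidate_fns):
        v = embed(fn)
        cos = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-12))
        scores.append((i, cos))
    return sorted(scores, key=lambda t: -t[1])[:top_k]
```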
- SimCLF: A Simple Contrastive Learning Framework for Function-level Binary Embeddings [2.1222884030559315]
We propose SimCLF: A Simple Contrastive Learning Framework for Function-level Binary Embeddings.
We take an unsupervised learning approach and formulate binary code similarity detection as instance discrimination.
SimCLF directly operates on disassembled binary functions and could be implemented with any encoder.
arXiv Detail & Related papers (2022-09-06T12:09:45Z)
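SimCLF, summarized above, formulates binary code similarity detection as instance discrimination. The snippet below is a minimal NT-Xent (InfoNCE) loss over two views of each function in a batch, which is the standard form of that objective; the choice of encoder and of how the two views are produced (for example, the same source function built with different compilers or options) is assumed here, not taken from the paper.

```python
import numpy as np

def info_nce_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """NT-Xent / InfoNCE loss for instance discrimination.

    z1[i] and z2[i] are embeddings of two views of the same binary function;
    every other function in the batch serves as a negative.
    """
    def normalize(z: np.ndarray) -> np.ndarray:
        return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)

    z1, z2 = normalize(z1), normalize(z2)
    logits = z1 @ z2.T / temperature            # (N, N) scaled cosine similarities
    # Positives sit on the diagonal; the loss is cross-entropy toward them.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.diag(log_probs).mean())
```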
- On using distributed representations of source code for the detection of C security vulnerabilities [14.8831988481175]
This paper presents an evaluation of the code representation model Code2vec when trained on the task of detecting security vulnerabilities in C source code.
We leverage the open-source library astminer to extract path-contexts from the abstract syntax trees of a corpus of labeled C functions.
Code2vec is trained on the resulting path-contexts with the task of classifying a function as vulnerable or non-vulnerable.
arXiv Detail & Related papers (2021-06-01T21:18:23Z)
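For the Code2vec-based vulnerability detector above, the sketch below shows the usual code2vec-style forward pass once path-contexts have already been extracted (for example, with astminer): each (start token, path, end token) triple is embedded, the triples are attention-pooled into a single function vector, and a sigmoid head scores the function as vulnerable. Shapes, parameter names, and the single-logit head are illustrative assumptions.

```python
import numpy as np
from typing import List, Tuple

def code2vec_forward(
    path_contexts: List[Tuple[int, int, int]],   # (start-token id, path id, end-token id)
    tok_emb: np.ndarray,                         # (V_tok, d) token embedding table
    path_emb: np.ndarray,                        # (V_path, d) path embedding table
    W: np.ndarray,                               # (d, 3d) combination matrix
    a: np.ndarray,                               # (d,) attention vector
    w_out: np.ndarray,                           # (d,) vulnerable/non-vulnerable head
) -> float:
    """Code2vec-style forward pass: embed each path-context, attention-pool
    them into one function vector, and score it as vulnerable (sigmoid)."""
    ctx = np.stack([
        np.tanh(W @ np.concatenate([tok_emb[s], path_emb[p], tok_emb[e]]))
        for s, p, e in path_contexts
    ])                                            # (n_contexts, d)
    attn = np.exp(ctx @ a)
    attn /= attn.sum()                            # attention weights over path-contexts
    code_vec = attn @ ctx                         # weighted sum -> function vector
    return float(1.0 / (1.0 + np.exp(-(code_vec @ w_out))))
```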
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
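The adversarial contrastive pre-training summarized above asks for representations that are consistent under both augmentations and adversarial perturbations. A common way to realize this, sketched below, is to craft one view with PGD that maximizes the contrastive loss against the other view and then train the encoder on the resulting pair; the `contrastive_grad` callable stands in for framework-specific autodiff and is not the paper's code.

```python
import numpy as np
from typing import Callable

def adversarial_view(
    x: np.ndarray,                               # one augmented view of an input
    x_ref: np.ndarray,                           # the other (clean) view
    contrastive_grad: Callable[[np.ndarray, np.ndarray], np.ndarray],
    # gradient of the contrastive loss with respect to the first argument
    epsilon: float = 8 / 255,
    alpha: float = 2 / 255,
    steps: int = 5,
) -> np.ndarray:
    """PGD on the contrastive loss: perturb one view so it maximally disagrees
    with the other; the encoder is then trained on the pair (training not shown)."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = contrastive_grad(x + delta, x_ref)
        delta = np.clip(delta + alpha * np.sign(g), -epsilon, epsilon)
        delta = np.clip(x + delta, 0.0, 1.0) - x  # keep the input in a valid [0, 1] range
    return x + delta
```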
- Stochastic Security: Adversarial Defense Using Long-Run Dynamics of Energy-Based Models [82.03536496686763]
The vulnerability of deep networks to adversarial attacks is a central problem for deep learning from the perspective of both cognition and security.
We focus on defending naturally-trained classifiers using Markov Chain Monte Carlo (MCMC) sampling with an Energy-Based Model (EBM) for adversarial purification.
Our contributions are 1) an improved method for training EBMs with realistic long-run MCMC samples, 2) an Expectation-Over-Transformation (EOT) defense that resolves theoretical ambiguities for stochastic defenses, and 3) a state-of-the-art adversarial defense for naturally-trained classifiers that is also competitive with adversarially-trained classifiers.
arXiv Detail & Related papers (2020-05-27T17:53:36Z)
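As a rough picture of the EBM-based purification described above, the sketch below runs Langevin-dynamics MCMC with the gradient of a trained energy function to pull a possibly adversarial input back toward low-energy, natural-looking inputs before classification; predictions can then be averaged over several purified samples in the Expectation-Over-Transformation spirit. Step sizes, step counts, and the `energy_grad` interface are assumptions, not the paper's implementation.

```python
import numpy as np
from typing import Callable, Optional

def langevin_purify(
    x: np.ndarray,                                     # possibly adversarial input
    energy_grad: Callable[[np.ndarray], np.ndarray],   # gradient of a trained EBM's energy
    steps: int = 100,
    step_size: float = 1e-2,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Langevin-dynamics purification: descend the learned energy with injected
    noise so adversarial perturbations are washed out before classification."""
    rng = rng or np.random.default_rng()
    x_t = x.copy()
    for _ in range(steps):
        noise = rng.standard_normal(x_t.shape)
        x_t = x_t - 0.5 * step_size**2 * energy_grad(x_t) + step_size * noise
        x_t = np.clip(x_t, 0.0, 1.0)                   # stay in the valid input range
    return x_t
```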
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.