Instance Attack: An Explanation-based Vulnerability Analysis Framework
Against DNNs for Malware Detection
- URL: http://arxiv.org/abs/2209.02453v1
- Date: Tue, 6 Sep 2022 12:41:20 GMT
- Title: Instance Attack: An Explanation-based Vulnerability Analysis Framework
Against DNNs for Malware Detection
- Authors: Sun RuiJin, Guo ShiZe, Guo JinHong, Xing ChangYou, Yang LuMing, Guo
Xi, Pan ZhiSong
- Abstract summary: We propose the notion of the instance-based attack.
Our scheme is interpretable and operates in black-box settings, and its results can be validated with domain knowledge.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) are increasingly applied in malware
detection, and their robustness has been widely debated. Traditionally, an
adversarial example generation scheme relies on either detailed model
information (gradient-based methods) or many samples to train a surrogate
model, neither of which is available in most scenarios.
We propose the notion of the instance-based attack. Our scheme is
interpretable and can work in a black-box environment. Given a specific binary
example and a malware classifier, we use data augmentation strategies to
produce enough data to train a simple interpretable model. We explain the
detection model by displaying the weights of different parts of the specific
binary. By analyzing the explanations, we found that data subsections play an
important role in Windows PE malware detection. We propose a new
function-preserving transformation algorithm that can be applied to data
subsections. By employing the binary-diversification techniques we propose, we
eliminate the influence of the most heavily weighted part to generate
adversarial examples. Our algorithm can fool the DNNs in certain cases with a
success rate of nearly 100%. Our method outperforms the state-of-the-art
method. Most importantly, our method operates in black-box settings, and the
results can be validated with domain knowledge. Our analysis model can assist
people in improving the robustness of malware detectors.
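The explanation loop described in the abstract can be sketched with a toy occlusion-style explainer: mask parts of a binary, query the black-box classifier, and average the score drop per part. This is an illustrative simplification, not the authors' exact algorithm; `toy_classifier`, the masking scheme, and all numeric weights below are assumptions.

```python
import random

def toy_classifier(sections):
    # Hypothetical stand-in for the black-box malware detector: returns a
    # "maliciousness" score dominated by section 2 (e.g. a data subsection).
    true_weights = [0.1, 0.05, 0.7, 0.15]
    return sum(w for w, present in zip(true_weights, sections) if present)

def explain_sections(classify, n_sections, n_samples=500, seed=0):
    """Estimate per-section importance by randomly masking sections and
    averaging the score drop attributed to each masked section."""
    rng = random.Random(seed)
    base = classify([True] * n_sections)
    totals = [0.0] * n_sections
    counts = [0] * n_sections
    for _ in range(n_samples):
        mask = [rng.random() < 0.5 for _ in range(n_sections)]
        drop = base - classify(mask)
        for i, kept in enumerate(mask):
            if not kept:
                totals[i] += drop
                counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]

weights = explain_sections(toy_classifier, 4)
most_weighted = max(range(4), key=weights.__getitem__)
print(most_weighted)
```

In the paper's setting, a function-preserving transformation would then be applied to the most heavily weighted part to push the score below the detection threshold.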
Related papers
- HOLMES: to Detect Adversarial Examples with Multiple Detectors [1.455585466338228]
HOLMES is able to distinguish unseen adversarial examples from multiple attacks with high accuracy and low false positive rates.
Our effective and inexpensive strategies neither modify original DNN models nor require its internal parameters.
arXiv Detail & Related papers (2024-05-30T11:22:55Z) - Microbial Genetic Algorithm-based Black-box Attack against Interpretable
Deep Learning Systems [16.13790238416691]
In white-box environments, interpretable deep learning systems (IDLSes) have been shown to be vulnerable to malicious manipulations.
We propose a Query-efficient Score-based black-box attack against IDLSes, QuScore, which requires no knowledge of the target model and its coupled interpretation model.
arXiv Detail & Related papers (2023-07-13T00:08:52Z) - Masked Language Model Based Textual Adversarial Example Detection [14.734863175424797]
Adversarial attacks are a serious threat to reliable deployment of machine learning models in safety-critical applications.
We propose a novel textual adversarial example detection method, namely Masked Model-based Detection (MLMD)
arXiv Detail & Related papers (2023-04-18T06:52:14Z) - Verifying the Robustness of Automatic Credibility Assessment [79.08422736721764]
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases insignificant changes in input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - Black-box Dataset Ownership Verification via Backdoor Watermarking [67.69308278379957]
We formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model.
We propose to embed external patterns via backdoor watermarking for the ownership verification to protect them.
Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification.
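The hypothesis-test-guided verification in this entry can be sketched as a simple one-sided proportion test: if a suspicious model predicts the attacker-chosen target label on trigger-stamped inputs far more often than chance, the watermarked dataset was likely used for training. This is a rough illustration; `query_model`, the hit rates, and the normal approximation are assumptions, not the paper's actual procedure.

```python
import math
import random

def query_model(trained_on_watermarked, rng):
    # Hypothetical query on a trigger-stamped input: a model trained on the
    # watermarked dataset predicts the target label far more often than the
    # benign base rate. Both rates are made-up numbers for illustration.
    hit_rate = 0.9 if trained_on_watermarked else 0.05
    return rng.random() < hit_rate

def verify_ownership(trained_on_watermarked, n=400, p0=0.1, alpha=0.001, seed=0):
    """One-sided proportion test: reject the null hypothesis (dataset not
    used) if the model hits the target label on stamped inputs significantly
    more often than the chance rate p0."""
    rng = random.Random(seed)
    hits = sum(query_model(trained_on_watermarked, rng) for _ in range(n))
    p_hat = hits / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal p-value
    return p_value < alpha

print(verify_ownership(True), verify_ownership(False))
```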
arXiv Detail & Related papers (2022-08-04T05:32:20Z) - Unsupervised Detection of Adversarial Examples with Model Explanations [0.6091702876917279]
We propose a simple yet effective method to detect adversarial examples using methods developed to explain the model's behavior.
Our evaluations on the MNIST handwritten digit dataset show that our method is capable of detecting adversarial examples with high confidence.
arXiv Detail & Related papers (2021-07-22T06:54:18Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Explainable Adversarial Attacks in Deep Neural Networks Using Activation
Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z) - Adversarial Examples for Unsupervised Machine Learning Models [71.81480647638529]
Adversarial examples causing evasive predictions are widely used to evaluate and improve the robustness of machine learning models.
We propose a framework of generating adversarial examples for unsupervised models and demonstrate novel applications to data augmentation.
arXiv Detail & Related papers (2021-03-02T17:47:58Z) - Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z) - Adversarial Machine Learning in Network Intrusion Detection Systems [6.18778092044887]
We study the nature of the adversarial problem in Network Intrusion Detection Systems.
We use evolutionary computation (particle swarm optimization and genetic algorithm) and deep learning (generative adversarial networks) as tools for adversarial example generation.
Our work highlights the vulnerability of machine learning based NIDS in the face of adversarial perturbation.
arXiv Detail & Related papers (2020-04-23T19:47:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.