Instance Attack: An Explanation-based Vulnerability Analysis Framework Against DNNs for Malware Detection
- URL: http://arxiv.org/abs/2209.02453v1
- Date: Tue, 6 Sep 2022 12:41:20 GMT
- Title: Instance Attack: An Explanation-based Vulnerability Analysis Framework Against DNNs for Malware Detection
- Authors: Sun RuiJin, Guo ShiZe, Guo JinHong, Xing ChangYou, Yang LuMing, Guo
Xi, Pan ZhiSong
- Abstract summary: We propose the notion of the instance-based attack.
Our scheme is interpretable and operates in black-box settings, and its results can be validated with domain knowledge.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) are increasingly being applied in malware
detection, and their robustness has been widely debated. Traditionally, an
adversarial example generation scheme relies on either detailed model
information (gradient-based methods) or a large number of samples to train a
surrogate model, neither of which is available in most scenarios.
We propose the notion of the instance-based attack. Our scheme is
interpretable and can work in a black-box environment. Given a specific binary
example and a malware classifier, we use data augmentation strategies to
produce enough data from which we can train a simple interpretable model. We
explain the detection model by displaying the weights of different parts of the
specific binary. By analyzing the explanations, we find that data
subsections play an important role in Windows PE malware detection. We propose
a new function-preserving transformation algorithm that can be applied to data
subsections. By employing the binary-diversification techniques that we
propose, we eliminate the influence of the most heavily weighted part to generate
adversarial examples. Our algorithm can fool the DNNs in certain cases with a
success rate of nearly 100%. Our method outperforms the state-of-the-art
method. Most importantly, our method operates in black-box settings and its
results can be validated with domain knowledge. Our analysis
model can assist people in improving the robustness of malware detectors.
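A minimal sketch of the pipeline described in the abstract, assuming details the abstract does not specify: PE sections are parsed with pefile, the local dataset is built by zero-masking random subsets of sections, the target detector is reachable only through a hypothetical black-box classifier_score(bytes) -> float callback, and the interpretable surrogate is a ridge regression over keep/mask indicators.

```python
# Minimal sketch of the instance-based attack workflow described in the
# abstract. Assumptions (not stated above): PE sections are parsed with
# `pefile`, augmentation zero-masks random subsets of sections, the detector
# is only reachable through a black-box `classifier_score(bytes) -> float`
# callback, and the interpretable surrogate is a ridge regression.
import numpy as np
import pefile
from sklearn.linear_model import Ridge


def masked_variants(pe_bytes, sections, n_samples=200, seed=0):
    """Build a local dataset by zeroing random subsets of PE sections."""
    rng = np.random.default_rng(seed)
    keep_matrix, variants = [], []
    for _ in range(n_samples):
        keep = rng.integers(0, 2, size=len(sections))        # 1 = keep section
        data = bytearray(pe_bytes)
        for bit, (offset, size) in zip(keep, sections):
            if bit == 0:
                data[offset:offset + size] = b"\x00" * size   # mask the section
        keep_matrix.append(keep)
        variants.append(bytes(data))
    return np.array(keep_matrix), variants


def explain_instance(pe_path, classifier_score):
    """Fit a linear surrogate and rank sections by their learned weight."""
    pe = pefile.PE(pe_path)
    with open(pe_path, "rb") as f:
        pe_bytes = f.read()
    sections = [(s.PointerToRawData, s.SizeOfRawData) for s in pe.sections]
    X, variants = masked_variants(pe_bytes, sections)
    y = np.array([classifier_score(v) for v in variants])     # black-box queries
    surrogate = Ridge(alpha=1.0).fit(X, y)
    names = [s.Name.rstrip(b"\x00").decode(errors="replace") for s in pe.sections]
    return sorted(zip(names, surrogate.coef_), key=lambda t: -abs(t[1]))
```

An attack following the abstract would then apply a function-preserving transformation to the highest-weighted data section and re-query the black-box detector until its score drops below the detection threshold.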
Related papers
- MOREL: Enhancing Adversarial Robustness through Multi-Objective Representation Learning [1.534667887016089]
Deep neural networks (DNNs) are vulnerable to slight adversarial perturbations.
We show that strong feature representation learning during training can significantly enhance the original model's robustness.
We propose MOREL, a multi-objective feature representation learning approach that encourages classification models to produce similar features for inputs within the same class, despite perturbations.
arXiv Detail & Related papers (2024-10-02T16:05:03Z) - HOLMES: to Detect Adversarial Examples with Multiple Detectors [1.455585466338228]
HOLMES is able to distinguish unseen adversarial examples from multiple attacks with high accuracy and low false positive rates.
Our effective and inexpensive strategies neither modify the original DNN models nor require their internal parameters.
arXiv Detail & Related papers (2024-05-30T11:22:55Z) - Microbial Genetic Algorithm-based Black-box Attack against Interpretable
Deep Learning Systems [16.13790238416691]
In white-box environments, interpretable deep learning systems (IDLSes) have been shown to be vulnerable to malicious manipulations.
We propose a query-efficient score-based black-box attack against IDLSes, QuScore, which requires no knowledge of the target model or its coupled interpretation model.
arXiv Detail & Related papers (2023-07-13T00:08:52Z) - Masked Language Model Based Textual Adversarial Example Detection [14.734863175424797]
Adversarial attacks are a serious threat to the reliable deployment of machine learning models in safety-critical applications.
We propose a novel textual adversarial example detection method, namely Masked Language Model-based Detection (MLMD).
arXiv Detail & Related papers (2023-04-18T06:52:14Z) - Black-box Dataset Ownership Verification via Backdoor Watermarking [67.69308278379957]
We formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model.
We propose to embed external patterns via backdoor watermarking for ownership verification, thereby protecting the released datasets.
Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification (a minimal sketch of such a verification test follows the related-papers list below).
arXiv Detail & Related papers (2022-08-04T05:32:20Z) - Unsupervised Detection of Adversarial Examples with Model Explanations [0.6091702876917279]
We propose a simple yet effective method to detect adversarial examples using methods developed to explain the model's behavior.
Our evaluations with the MNIST handwritten digit dataset show that our method is capable of detecting adversarial examples with high confidence.
arXiv Detail & Related papers (2021-07-22T06:54:18Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Explainable Adversarial Attacks in Deep Neural Networks Using Activation
Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z) - Adversarial Examples for Unsupervised Machine Learning Models [71.81480647638529]
Adversarial examples causing evasive predictions are widely used to evaluate and improve the robustness of machine learning models.
We propose a framework of generating adversarial examples for unsupervised models and demonstrate novel applications to data augmentation.
arXiv Detail & Related papers (2021-03-02T17:47:58Z) - Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z) - Generalized ODIN: Detecting Out-of-distribution Image without Learning
from Out-of-distribution Data [87.61504710345528]
We propose two strategies for freeing a neural network from tuning with OoD data, while improving its OoD detection performance.
We specifically propose to decompose confidence scoring as well as a modified input pre-processing method.
Our further analysis on a larger-scale image dataset shows that the two types of distribution shift, specifically semantic shift and non-semantic shift, present a significant difference.
arXiv Detail & Related papers (2020-02-26T04:18:25Z)
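The dataset-ownership entry above pairs poison-only backdoor watermarking with a hypothesis test on a suspicious model's predictions. Below is a minimal sketch of one way such a verification test could look, assuming grayscale images scaled to [0, 1], a hypothetical black-box predict_proba callback, a BadNets-style corner-patch trigger, and a one-sided paired t-test; the paper's actual test may differ.

```python
# Minimal sketch of hypothesis-test-guided dataset-ownership verification, as
# summarized in the backdoor-watermarking entry above. Assumptions (not stated
# there): the suspicious model is only reachable through a black-box
# `predict_proba(images) -> (N, num_classes)` callback, the watermark is a
# BadNets-style white patch in the bottom-right corner, and verification uses
# a one-sided paired t-test on the target-class probability.
import numpy as np
from scipy import stats


def stamp_trigger(images, patch_size=3):
    """Apply a BadNets-style white square trigger to each (N, H, W) image."""
    stamped = images.copy()
    stamped[:, -patch_size:, -patch_size:] = 1.0
    return stamped


def verify_ownership(predict_proba, images, target_class, alpha=0.01):
    """Reject H0 ("the model was not trained on the watermarked dataset") when
    the target-class probability rises significantly on stamped inputs."""
    p_clean = predict_proba(images)[:, target_class]
    p_stamped = predict_proba(stamp_trigger(images))[:, target_class]
    t_stat, p_two_sided = stats.ttest_rel(p_stamped, p_clean)
    p_value = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
    return p_value < alpha, p_value
```

The intuition is that only a model trained on the watermarked release should assign a noticeably higher target-class probability to trigger-stamped inputs than to the same inputs without the trigger.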
This list is automatically generated from the titles and abstracts of the papers on this site.