EG-Booster: Explanation-Guided Booster of ML Evasion Attacks
- URL: http://arxiv.org/abs/2108.13930v2
- Date: Thu, 2 Sep 2021 16:52:02 GMT
- Title: EG-Booster: Explanation-Guided Booster of ML Evasion Attacks
- Authors: Abderrahmen Amich and Birhanu Eshete
- Abstract summary: We present a novel approach called EG-Booster that leverages techniques from explainable ML to guide adversarial example crafting.
EG-Booster is agnostic to model architecture, threat model, and supports diverse distance metrics used previously in the literature.
Our findings suggest that EG-Booster significantly improves evasion rate of state-of-the-art attacks while performing less number of perturbations.
- Score: 3.822543555265593
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The widespread usage of machine learning (ML) in a myriad of domains has
raised questions about its trustworthiness in security-critical environments.
Part of the quest for trustworthy ML is robustness evaluation of ML models to
test-time adversarial examples. Inline with the trustworthy ML goal, a useful
input to potentially aid robustness evaluation is feature-based explanations of
model predictions. In this paper, we present a novel approach called EG-Booster
that leverages techniques from explainable ML to guide adversarial example
crafting for improved robustness evaluation of ML models before deploying them
in security-critical settings. The key insight in EG-Booster is the use of
feature-based explanations of model predictions to guide adversarial example
crafting by adding consequential perturbations likely to result in model
evasion and avoiding non-consequential ones unlikely to contribute to evasion.
EG-Booster is agnostic to model architecture, threat model, and supports
diverse distance metrics used previously in the literature. We evaluate
EG-Booster using image classification benchmark datasets, MNIST and CIFAR10.
Our findings suggest that EG-Booster significantly improves evasion rate of
state-of-the-art attacks while performing less number of perturbations. Through
extensive experiments that covers four white-box and three black-box attacks,
we demonstrate the effectiveness of EG-Booster against two undefended neural
networks trained on MNIST and CIFAR10, and another adversarially-trained ResNet
model trained on CIFAR10. Furthermore, we introduce a stability assessment
metric and evaluate the reliability of our explanation-based approach by
observing the similarity between the model's classification outputs across
multiple runs of EG-Booster.
Related papers
- Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models [21.01313168005792]
We reveal the vulnerabilities of Retrieval-Enhanced Generative (RAG) models when faced with black-box attacks for opinion manipulation.
We explore the impact of such attacks on user cognition and decision-making.
arXiv Detail & Related papers (2024-07-18T17:55:55Z) - Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving [55.93813178692077]
We present RoboBEV, an extensive benchmark suite designed to evaluate the resilience of BEV algorithms.
We assess 33 state-of-the-art BEV-based perception models spanning tasks like detection, map segmentation, depth estimation, and occupancy prediction.
Our experimental results also underline the efficacy of strategies like pre-training and depth-free BEV transformations in enhancing robustness against out-of-distribution data.
arXiv Detail & Related papers (2024-05-27T17:59:39Z) - Knowledge Distillation-Based Model Extraction Attack using Private Counterfactual Explanations [1.6576983459630268]
We focus on investigating how model explanations, particularly Generative networks (GANs)-based counterfactual explanations (CFs) can be exploited for performing model extraction attacks (MEA)
We propose a novel MEA methodology based on Knowledge Distillation (KD) to enhance the efficiency of extracting a substitute model of a target model exploiting CFs.
Our findings reveal that the inclusion of privacy layer impacts the performance of the explainer, the quality of CFs, and results in a reduction in the MEA performance.
arXiv Detail & Related papers (2024-04-04T10:28:55Z) - Model Stealing Attack against Graph Classification with Authenticity,
Uncertainty and Diversity [85.1927483219819]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.
We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z) - Towards Evaluating Transfer-based Attacks Systematically, Practically,
and Fairly [79.07074710460012]
adversarial vulnerability of deep neural networks (DNNs) has drawn great attention.
An increasing number of transfer-based methods have been developed to fool black-box DNN models.
We establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods.
arXiv Detail & Related papers (2023-11-02T15:35:58Z) - Semantic Image Attack for Visual Model Diagnosis [80.36063332820568]
In practice, metric analysis on a specific train and test dataset does not guarantee reliable or fair ML models.
This paper proposes Semantic Image Attack (SIA), a method based on the adversarial attack that provides semantic adversarial images.
arXiv Detail & Related papers (2023-03-23T03:13:04Z) - Unifying Model Explainability and Robustness for Joint Text
Classification and Rationale Extraction [11.878012909876713]
We propose a joint classification and rationale extraction model named AT-BMC.
It includes two key mechanisms: mixed Adversarial Training (AT) is designed to use various perturbations in discrete and embedding space to improve the model's robustness, and Boundary Match Constraint (BMC) helps to locate rationales more precisely with the guidance of boundary information.
Performances on benchmark datasets demonstrate that the proposed AT-BMC outperforms baselines on both classification and rationale extraction by a large margin.
arXiv Detail & Related papers (2021-12-20T09:48:32Z) - Explanation-Guided Diagnosis of Machine Learning Evasion Attacks [3.822543555265593]
We introduce a novel framework that harnesses explainable ML methods to guide high-fidelity assessment of ML evasion attacks.
Our framework enables explanation-guided correlation analysis between pre-evasion perturbations and post-evasion explanations.
arXiv Detail & Related papers (2021-06-30T05:47:12Z) - Boosting Black-Box Attack with Partially Transferred Conditional
Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs)
We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases.
Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z) - Providing reliability in Recommender Systems through Bernoulli Matrix
Factorization [63.732639864601914]
This paper proposes Bernoulli Matrix Factorization (BeMF) to provide both prediction values and reliability values.
BeMF acts on model-based collaborative filtering rather than on memory-based filtering.
The more reliable a prediction is, the less liable it is to be wrong.
arXiv Detail & Related papers (2020-06-05T14:24:27Z) - Luring of transferable adversarial perturbations in the black-box
paradigm [0.0]
We present a new approach to improve the robustness of a model against black-box transfer attacks.
A removable additional neural network is included in the target model, and is designed to induce the textitluring effect.
Our deception-based method only needs to have access to the predictions of the target model and does not require a labeled data set.
arXiv Detail & Related papers (2020-04-10T06:48:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.