EG-Booster: Explanation-Guided Booster of ML Evasion Attacks
- URL: http://arxiv.org/abs/2108.13930v2
- Date: Thu, 2 Sep 2021 16:52:02 GMT
- Title: EG-Booster: Explanation-Guided Booster of ML Evasion Attacks
- Authors: Abderrahmen Amich and Birhanu Eshete
- Abstract summary: We present a novel approach called EG-Booster that leverages techniques from explainable ML to guide adversarial example crafting.
EG-Booster is agnostic to model architecture, threat model, and supports diverse distance metrics used previously in the literature.
Our findings suggest that EG-Booster significantly improves the evasion rate of state-of-the-art attacks while requiring fewer perturbations.
- Score: 3.822543555265593
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The widespread usage of machine learning (ML) in a myriad of domains has
raised questions about its trustworthiness in security-critical environments.
Part of the quest for trustworthy ML is robustness evaluation of ML models to
test-time adversarial examples. In line with the trustworthy ML goal, a useful
input to potentially aid robustness evaluation is feature-based explanations of
model predictions. In this paper, we present a novel approach called EG-Booster
that leverages techniques from explainable ML to guide adversarial example
crafting for improved robustness evaluation of ML models before deploying them
in security-critical settings. The key insight in EG-Booster is the use of
feature-based explanations of model predictions to guide adversarial example
crafting by adding consequential perturbations likely to result in model
evasion and avoiding non-consequential ones unlikely to contribute to evasion.
EG-Booster is agnostic to model architecture, threat model, and supports
diverse distance metrics used previously in the literature. We evaluate
EG-Booster using image classification benchmark datasets, MNIST and CIFAR10.
Our findings suggest that EG-Booster significantly improves the evasion rate of
state-of-the-art attacks while requiring fewer perturbations. Through
extensive experiments that cover four white-box and three black-box attacks,
we demonstrate the effectiveness of EG-Booster against two undefended neural
networks trained on MNIST and CIFAR10, and an adversarially trained ResNet
model on CIFAR10. Furthermore, we introduce a stability assessment
metric and evaluate the reliability of our explanation-based approach by
observing the similarity between the model's classification outputs across
multiple runs of EG-Booster.
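
The key insight above (keep perturbations on features whose explanations support the true class, drop the rest) can be illustrated with a minimal sketch. This is not the paper's exact algorithm: the `model_predict` and `explain` callables, the epsilon bound, and the greedy feature ordering are assumptions made for illustration only.

```python
# Minimal sketch of explanation-guided boosting (illustrative, not the authors' code).
# Starting from a baseline adversarial example, keep only "consequential" perturbations,
# i.e. those on features whose attributions push the model toward the true class.
import numpy as np

def eg_boost(model_predict, explain, x, x_adv, true_label, eps=0.3):
    """Return a refined adversarial example guided by feature attributions.

    model_predict(batch) -> class-probability array of shape (n, num_classes)  [assumed]
    explain(x, label)    -> per-feature attributions w.r.t. `label`, e.g. SHAP  [assumed]
    x, x_adv             -> flattened original input and baseline adversarial example
    """
    x_boost = x.copy()
    attributions = explain(x, true_label)   # sign encodes each feature's contribution
    delta = x_adv - x                       # perturbations proposed by the baseline attack

    # Visit the most influential features first.
    for i in np.argsort(-np.abs(attributions)):
        if attributions[i] > 0 and delta[i] != 0:
            # Consequential perturbation: the feature supports the true class,
            # so perturbing it is likely to contribute to evasion -- keep it.
            x_boost[i] = np.clip(x[i] + delta[i], x[i] - eps, x[i] + eps)
        # Non-consequential perturbations (attributions[i] <= 0) are skipped,
        # which is how the total number of perturbations is reduced.
        if model_predict(x_boost[None, :]).argmax(axis=1)[0] != true_label:
            break                           # stop once the model is evaded
    return x_boost
```

A stability check in the spirit of the paper's metric could then compare the labels the model assigns to the boosted examples across repeated runs of this procedure and report their agreement.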
Related papers
- Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarially attacking various downstream models fine-tuned from the Segment Anything Model (SAM).
To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z)
- Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models [21.01313168005792]
We reveal the vulnerabilities of Retrieval-Augmented Generation (RAG) models when faced with black-box attacks for opinion manipulation.
We explore the impact of such attacks on user cognition and decision-making.
arXiv Detail & Related papers (2024-07-18T17:55:55Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present MR-Ben, a process-based benchmark that demands meta-reasoning skills.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- Knowledge Distillation-Based Model Extraction Attack using GAN-based Private Counterfactual Explanations [1.6576983459630268]
We focus on investigating how model explanations, particularly counterfactual explanations, can be exploited for performing MEA within the ML platform.
We propose a novel approach for MEA based on Knowledge Distillation (KD) to enhance the efficiency of extracting a substitute model.
We also assess the effectiveness of differential privacy (DP) as a mitigation strategy.
arXiv Detail & Related papers (2024-04-04T10:28:55Z)
- Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly [79.07074710460012]
The adversarial vulnerability of deep neural networks (DNNs) has drawn great attention.
An increasing number of transfer-based methods have been developed to fool black-box DNN models.
We establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods.
arXiv Detail & Related papers (2023-11-02T15:35:58Z)
- Semantic Image Attack for Visual Model Diagnosis [80.36063332820568]
In practice, metric analysis on a specific train and test dataset does not guarantee reliable or fair ML models.
This paper proposes Semantic Image Attack (SIA), a method based on the adversarial attack that provides semantic adversarial images.
arXiv Detail & Related papers (2023-03-23T03:13:04Z)
- Unifying Model Explainability and Robustness for Joint Text Classification and Rationale Extraction [11.878012909876713]
We propose a joint classification and rationale extraction model named AT-BMC.
It includes two key mechanisms: mixed Adversarial Training (AT), which applies various perturbations in discrete and embedding space to improve the model's robustness, and Boundary Match Constraint (BMC), which helps locate rationales more precisely with the guidance of boundary information.
Performances on benchmark datasets demonstrate that the proposed AT-BMC outperforms baselines on both classification and rationale extraction by a large margin.
arXiv Detail & Related papers (2021-12-20T09:48:32Z)
- Explanation-Guided Diagnosis of Machine Learning Evasion Attacks [3.822543555265593]
We introduce a novel framework that harnesses explainable ML methods to guide high-fidelity assessment of ML evasion attacks.
Our framework enables explanation-guided correlation analysis between pre-evasion perturbations and post-evasion explanations.
arXiv Detail & Related papers (2021-06-30T05:47:12Z)
- Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs).
We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases.
Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z)
- Providing reliability in Recommender Systems through Bernoulli Matrix Factorization [63.732639864601914]
This paper proposes Bernoulli Matrix Factorization (BeMF) to provide both prediction values and reliability values.
BeMF acts on model-based collaborative filtering rather than on memory-based filtering.
The more reliable a prediction is, the less liable it is to be wrong.
arXiv Detail & Related papers (2020-06-05T14:24:27Z)
- Luring of transferable adversarial perturbations in the black-box paradigm [0.0]
We present a new approach to improve the robustness of a model against black-box transfer attacks.
A removable additional neural network is included in the target model, and is designed to induce the luring effect.
Our deception-based method only needs to have access to the predictions of the target model and does not require a labeled data set.
arXiv Detail & Related papers (2020-04-10T06:48:36Z)