EG-Booster: Explanation-Guided Booster of ML Evasion Attacks
- URL: http://arxiv.org/abs/2108.13930v2
- Date: Thu, 2 Sep 2021 16:52:02 GMT
- Title: EG-Booster: Explanation-Guided Booster of ML Evasion Attacks
- Authors: Abderrahmen Amich and Birhanu Eshete
- Abstract summary: We present a novel approach called EG-Booster that leverages techniques from explainable ML to guide adversarial example crafting.
EG-Booster is agnostic to model architecture, threat model, and supports diverse distance metrics used previously in the literature.
Our findings suggest that EG-Booster significantly improves the evasion rate of state-of-the-art attacks while requiring fewer perturbations.
- Score: 3.822543555265593
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The widespread usage of machine learning (ML) in a myriad of domains has
raised questions about its trustworthiness in security-critical environments.
Part of the quest for trustworthy ML is robustness evaluation of ML models to
test-time adversarial examples. In line with the trustworthy ML goal, a useful
input to potentially aid robustness evaluation is feature-based explanations of
model predictions. In this paper, we present a novel approach called EG-Booster
that leverages techniques from explainable ML to guide adversarial example
crafting for improved robustness evaluation of ML models before deploying them
in security-critical settings. The key insight in EG-Booster is the use of
feature-based explanations of model predictions to guide adversarial example
crafting by adding consequential perturbations likely to result in model
evasion and avoiding non-consequential ones unlikely to contribute to evasion.
EG-Booster is agnostic to model architecture, threat model, and supports
diverse distance metrics used previously in the literature. We evaluate
EG-Booster using image classification benchmark datasets, MNIST and CIFAR10.
Our findings suggest that EG-Booster significantly improves the evasion rate of
state-of-the-art attacks while requiring fewer perturbations. Through
extensive experiments that cover four white-box and three black-box attacks,
we demonstrate the effectiveness of EG-Booster against two undefended neural
networks trained on MNIST and CIFAR10, and an adversarially trained ResNet
model on CIFAR10. Furthermore, we introduce a stability assessment
metric and evaluate the reliability of our explanation-based approach by
observing the similarity between the model's classification outputs across
multiple runs of EG-Booster.
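
The key insight above (keep perturbations on features whose explanations support the true class, drop the rest) can be illustrated with a minimal sketch. This is not the paper's exact algorithm: the `model_predict` and `explain` callables, the epsilon bound, and the greedy feature ordering are assumptions made for illustration only.

```python
# Minimal sketch of explanation-guided boosting (illustrative, not the authors' code).
# Starting from a baseline adversarial example, keep only "consequential" perturbations,
# i.e. those on features whose attributions push the model toward the true class.
import numpy as np

def eg_boost(model_predict, explain, x, x_adv, true_label, eps=0.3):
    """Return a refined adversarial example guided by feature attributions.

    model_predict(batch) -> class-probability array of shape (n, num_classes)  [assumed]
    explain(x, label)    -> per-feature attributions w.r.t. `label`, e.g. SHAP  [assumed]
    x, x_adv             -> flattened original input and baseline adversarial example
    """
    x_boost = x.copy()
    attributions = explain(x, true_label)   # sign encodes each feature's contribution
    delta = x_adv - x                       # perturbations proposed by the baseline attack

    # Visit the most influential features first.
    for i in np.argsort(-np.abs(attributions)):
        if attributions[i] > 0 and delta[i] != 0:
            # Consequential perturbation: the feature supports the true class,
            # so perturbing it is likely to contribute to evasion -- keep it.
            x_boost[i] = np.clip(x[i] + delta[i], x[i] - eps, x[i] + eps)
        # Non-consequential perturbations (attributions[i] <= 0) are skipped,
        # which is how the total number of perturbations is reduced.
        if model_predict(x_boost[None, :]).argmax(axis=1)[0] != true_label:
            break                           # stop once the model is evaded
    return x_boost
```

A stability check in the spirit of the paper's metric could then compare the labels the model assigns to the boosted examples across repeated runs of this procedure and report their agreement.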
Related papers
- Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarially attacking various downstream models fine-tuned from the Segment Anything Model (SAM).
To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z)
- Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models [21.01313168005792]
We reveal the vulnerabilities of Retrieval-Augmented Generation (RAG) models when faced with black-box attacks for opinion manipulation.
We explore the impact of such attacks on user cognition and decision-making.
arXiv Detail & Related papers (2024-07-18T17:55:55Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present MR-Ben, a process-based benchmark that demands meta-reasoning skills.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- Knowledge Distillation-Based Model Extraction Attack using GAN-based Private Counterfactual Explanations [1.6576983459630268]
We focus on investigating how model explanations, particularly counterfactual explanations, can be exploited for performing MEA within the ML platform.
We propose a novel approach for MEA based on Knowledge Distillation (KD) to enhance the efficiency of extracting a substitute model.
We also assess the effectiveness of differential privacy (DP) as a mitigation strategy.
arXiv Detail & Related papers (2024-04-04T10:28:55Z)
- Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly [79.07074710460012]
The adversarial vulnerability of deep neural networks (DNNs) has drawn great attention.
An increasing number of transfer-based methods have been developed to fool black-box DNN models.
We establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods.
arXiv Detail & Related papers (2023-11-02T15:35:58Z)
- Semantic Image Attack for Visual Model Diagnosis [80.36063332820568]
In practice, metric analysis on a specific train and test dataset does not guarantee reliable or fair ML models.
This paper proposes Semantic Image Attack (SIA), a method based on the adversarial attack that provides semantic adversarial images.
arXiv Detail & Related papers (2023-03-23T03:13:04Z)
- Unifying Model Explainability and Robustness for Joint Text Classification and Rationale Extraction [11.878012909876713]
We propose a joint classification and rationale extraction model named AT-BMC.
It includes two key mechanisms: mixed Adversarial Training (AT), which applies various perturbations in discrete and embedding space to improve the model's robustness, and Boundary Match Constraint (BMC), which helps locate rationales more precisely with the guidance of boundary information.
Performances on benchmark datasets demonstrate that the proposed AT-BMC outperforms baselines on both classification and rationale extraction by a large margin.
arXiv Detail & Related papers (2021-12-20T09:48:32Z)
- Explanation-Guided Diagnosis of Machine Learning Evasion Attacks [3.822543555265593]
We introduce a novel framework that harnesses explainable ML methods to guide high-fidelity assessment of ML evasion attacks.
Our framework enables explanation-guided correlation analysis between pre-evasion perturbations and post-evasion explanations.
arXiv Detail & Related papers (2021-06-30T05:47:12Z)
- Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs).
We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases.
Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z)
- Providing reliability in Recommender Systems through Bernoulli Matrix Factorization [63.732639864601914]
This paper proposes Bernoulli Matrix Factorization (BeMF) to provide both prediction values and reliability values.
BeMF acts on model-based collaborative filtering rather than on memory-based filtering.
The more reliable a prediction is, the less liable it is to be wrong.
arXiv Detail & Related papers (2020-06-05T14:24:27Z)
- Luring of transferable adversarial perturbations in the black-box paradigm [0.0]
We present a new approach to improve the robustness of a model against black-box transfer attacks.
A removable additional neural network is included in the target model, and is designed to induce the luring effect.
Our deception-based method only needs to have access to the predictions of the target model and does not require a labeled data set.
arXiv Detail & Related papers (2020-04-10T06:48:36Z)