When and How to Fool Explainable Models (and Humans) with Adversarial
Examples
- URL: http://arxiv.org/abs/2107.01943v2
- Date: Fri, 7 Jul 2023 11:59:57 GMT
- Title: When and How to Fool Explainable Models (and Humans) with Adversarial
Examples
- Authors: Jon Vadillo, Roberto Santana and Jose A. Lozano
- Abstract summary: We explore the possibilities and limits of adversarial attacks for explainable machine learning models.
First, we extend the notion of adversarial examples to fit in explainable machine learning scenarios.
Next, we propose a comprehensive framework to study whether adversarial examples can be generated for explainable models.
- Score: 1.439518478021091
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Reliable deployment of machine learning models such as neural networks
continues to be challenging due to several limitations. Some of the main
shortcomings are the lack of interpretability and the lack of robustness
against adversarial examples or out-of-distribution inputs. In this exploratory
review, we explore the possibilities and limits of adversarial attacks for
explainable machine learning models. First, we extend the notion of adversarial
examples to fit in explainable machine learning scenarios, in which the inputs,
the output classifications and the explanations of the model's decisions are
assessed by humans. Next, we propose a comprehensive framework to study whether
(and how) adversarial examples can be generated for explainable models under
human assessment, introducing and illustrating novel attack paradigms. In
particular, our framework considers a wide range of relevant yet often ignored
factors such as the type of problem, the user expertise or the objective of the
explanations, in order to identify the attack strategies that should be adopted
in each scenario to successfully deceive the model (and the human). The
intention of these contributions is to serve as a basis for a more rigorous and
realistic study of adversarial examples in the field of explainable machine
learning.
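To make the attack surface concrete, here is a minimal, self-contained sketch of the two artifacts a human assessor would inspect in this setting: the model's prediction and a gradient-based saliency explanation, before and after an adversarial perturbation. The FGSM attack, the saliency method, and the toy model below are illustrative assumptions and are not taken from the paper.

```python
# Illustrative sketch only: an FGSM adversarial example plus a gradient-based
# saliency explanation, i.e. the two artifacts (prediction and explanation)
# that a human assessor would see. Model, input and epsilon are toy
# placeholders, not the paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

def saliency_map(model, x, class_idx):
    """|d score_class / d input|: one common, simple explanation."""
    x = x.clone().detach().requires_grad_(True)
    model(x)[0, class_idx].backward()
    return x.grad.abs().squeeze(0)

def fgsm(model, x, y, eps=0.05):
    """One-step Fast Gradient Sign Method perturbation."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier
    x = torch.rand(1, 1, 28, 28)                                 # toy "image"
    y = torch.tensor([3])                                        # arbitrary label
    x_adv = fgsm(model, x, y)
    pred_clean = model(x).argmax(1).item()
    pred_adv = model(x_adv).argmax(1).item()
    sal_clean = saliency_map(model, x, pred_clean)
    sal_adv = saliency_map(model, x_adv, pred_adv)
    # Under the paper's framing, flipping the prediction is not enough: the
    # perturbed input must also come with an explanation that still looks
    # plausible to the human assessor.
    print(pred_clean, pred_adv, (sal_clean - sal_adv).abs().mean().item())
```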
Related papers
- A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
Adversarial examples can manipulate machine learning models into making erroneous predictions.
The transferability of adversarial examples enables black-box attacks which circumvent the need for detailed knowledge of the target model.
This survey explores the landscape of the transferability of adversarial examples across deep neural networks.
arXiv Detail & Related papers (2023-10-26T17:45:26Z)
- On the Connections between Counterfactual Explanations and Adversarial Examples [14.494463243702908]
We make one of the first attempts at formalizing the connections between counterfactual explanations and adversarial examples.
Our analysis demonstrates that several popular counterfactual explanation and adversarial example generation methods are equivalent.
We empirically validate our theoretical findings using extensive experimentation with synthetic and real world datasets.
arXiv Detail & Related papers (2021-06-18T08:22:24Z)
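As an informal illustration of why such an equivalence is plausible (a sketch, not the cited paper's formalization): many counterfactual-explanation and adversarial-example methods solve essentially the same constrained problem for a classifier f, an input x, and a target class y', differing mainly in the choice of distance d and additional constraints:

```latex
\min_{\delta}\; d(x,\, x+\delta)
\quad\text{subject to}\quad
f(x+\delta) = y', \qquad x+\delta \in \mathcal{X},
```

where \mathcal{X} denotes the set of valid inputs. A counterfactual method reports x+\delta to the user as an explanation, while an adversarial attack deploys the same x+\delta against the model.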
- Individual Explanations in Machine Learning Models: A Case Study on Poverty Estimation [63.18666008322476]
Machine learning methods are being increasingly applied in sensitive societal contexts.
The present case study has two main objectives. First, to expose these challenges and how they affect the use of relevant and novel explanation methods. Second, to present a set of strategies that mitigate such challenges when implementing explanation methods in a relevant application domain.
arXiv Detail & Related papers (2021-04-09T01:54:58Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these activation profiles can quickly pinpoint the exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
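A rough sketch of the underlying idea (my reading of the summary above, not the cited paper's actual framework): record per-layer activation statistics for a clean input and a perturbed one, then flag the layers whose profile changes most. The model, inputs, and the perturbation below are toy placeholders.

```python
# Toy sketch: compare per-layer mean-absolute-activation "profiles" for a
# clean input vs. a perturbed stand-in for an adversarial input, and rank
# layers by relative change. Everything here is a hypothetical placeholder.
import torch
import torch.nn as nn

def activation_profile(model, x):
    """Mean |activation| per submodule, collected with forward hooks."""
    profile, handles = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            profile[name] = output.detach().abs().mean().item()
        return hook

    for name, module in model.named_modules():
        if name:  # skip the top-level container itself
            handles.append(module.register_forward_hook(make_hook(name)))
    model(x)
    for h in handles:
        h.remove()
    return profile

if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    x_clean = torch.rand(1, 32)
    x_adv = x_clean + 0.1 * torch.randn_like(x_clean)  # stand-in, not a real attack
    p_clean = activation_profile(model, x_clean)
    p_adv = activation_profile(model, x_adv)
    # Layers with the largest relative change are candidate "exploited" areas.
    for name in p_clean:
        rel = abs(p_adv[name] - p_clean[name]) / (p_clean[name] + 1e-8)
        print(f"layer {name}: clean={p_clean[name]:.4f} adv={p_adv[name]:.4f} rel_change={rel:.3f}")
```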
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
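The diversity-enforcing loss mentioned above can be pictured with a small sketch (a generic formulation I am assuming, not the cited method): given several candidate latent perturbations, penalize pairs that point in similar directions so the resulting counterfactuals differ from one another.

```python
# Hypothetical sketch of a diversity-enforcing loss over K latent
# perturbations: a pairwise cosine-similarity penalty that pushes the
# perturbation directions apart. The decoder/classifier terms that the
# actual method would also optimize are omitted here.
import torch
import torch.nn.functional as F

def diversity_loss(deltas: torch.Tensor) -> torch.Tensor:
    """deltas: (K, latent_dim) perturbations; mean pairwise |cosine similarity|."""
    d = F.normalize(deltas, dim=1)
    sim = d @ d.t()                      # (K, K) cosine similarities
    off_diag = sim - torch.eye(len(d))   # ignore self-similarity
    return off_diag.abs().sum() / (len(d) * (len(d) - 1))

if __name__ == "__main__":
    torch.manual_seed(0)
    deltas = torch.randn(4, 16, requires_grad=True)  # K=4 candidate perturbations
    loss = diversity_loss(deltas)  # would be added to the counterfactual objective
    loss.backward()
    print(loss.item(), deltas.grad.shape)
```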
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
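A compact sketch of that ensemble-search idea (heavily simplified; the fitness function below is a toy stand-in for "craft adversarial texts against the selected subset and measure how well they transfer to the remaining models"): a small genetic algorithm over binary masks that select which models join the ensemble.

```python
# Simplified, hypothetical sketch of a genetic algorithm over model subsets.
# A candidate is a binary mask over N available models; its real fitness would
# be the transfer rate of adversarial examples crafted against the selected
# subset. Here the fitness is a deterministic toy stand-in so the sketch runs.
import random

N_MODELS, POP_SIZE, GENERATIONS, MUTATION_P = 8, 20, 30, 0.1
random.seed(0)

def fitness(mask):
    # Placeholder: pretend certain models are more "useful" in the ensemble.
    useful = [0.9, 0.2, 0.7, 0.4, 0.8, 0.3, 0.6, 0.5]
    score = sum(u for u, m in zip(useful, mask) if m)
    return score - 0.3 * sum(mask)  # small penalty for large ensembles

def crossover(a, b):
    cut = random.randrange(1, N_MODELS)
    return a[:cut] + b[cut:]

def mutate(mask):
    return [1 - m if random.random() < MUTATION_P else m for m in mask]

population = [[random.randint(0, 1) for _ in range(N_MODELS)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[: POP_SIZE // 2]  # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

best = max(population, key=fitness)
print("selected models:", [i for i, m in enumerate(best) if m], "fitness:", fitness(best))
```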
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
- A Hierarchy of Limitations in Machine Learning [0.0]
This paper attempts a comprehensive, structured overview of the specific conceptual, procedural, and statistical limitations of models in machine learning when applied to society.
Modelers themselves can use the described hierarchy to identify possible failure points and think through how to address them.
Consumers of machine learning models can know what to question when deciding whether, where, and how to apply machine learning.
arXiv Detail & Related papers (2020-02-12T19:39:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.