Unfooling Perturbation-Based Post Hoc Explainers
- URL: http://arxiv.org/abs/2205.14772v3
- Date: Wed, 12 Apr 2023 00:13:41 GMT
- Title: Unfooling Perturbation-Based Post Hoc Explainers
- Authors: Zachariah Carmichael, Walter J Scheirer
- Abstract summary: Recent work demonstrates that perturbation-based post hoc explainers can be fooled adversarially.
This discovery has adverse implications for auditors, regulators, and other sentinels.
In this work, we rigorously formalize this problem and devise a defense against adversarial attacks on perturbation-based explainers.
- Score: 12.599362066650842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monumental advancements in artificial intelligence (AI) have lured the
interest of doctors, lenders, judges, and other professionals. While these
high-stakes decision-makers are optimistic about the technology, those familiar
with AI systems are wary about the lack of transparency of its decision-making
processes. Perturbation-based post hoc explainers offer a model-agnostic means
of interpreting these systems while only requiring query-level access. However,
recent work demonstrates that these explainers can be fooled adversarially.
This discovery has adverse implications for auditors, regulators, and other
sentinels. With this in mind, several natural questions arise: how can we
audit these black box systems? And how can we ascertain that the auditee is
complying with the audit in good faith? In this work, we rigorously formalize
this problem and devise a defense against adversarial attacks on
perturbation-based explainers. We propose algorithms for the detection
(CAD-Detect) and defense (CAD-Defend) of these attacks, which are aided by our
novel conditional anomaly detection approach, KNN-CAD. We demonstrate that our
approach successfully detects whether a black box system adversarially conceals
its decision-making process and mitigates the adversarial attack on real-world
data for the prevalent explainers, LIME and SHAP.
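To make the audit setting concrete, here is a minimal sketch (Python, using NumPy and scikit-learn) of the detection intuition: score each explainer-generated perturbation query by its kNN distance to reference audit data, then compare the black box's average outputs on realistic-looking versus highly anomalous (off-manifold) queries. A large gap is the kind of behavioural inconsistency that CAD-Detect-style auditing would flag as possible concealment. The function names, threshold, and toy model below are illustrative assumptions, not the paper's KNN-CAD/CAD-Detect implementation.

```python
# Illustrative sketch only (assumed names; not the paper's released code).
import numpy as np
from sklearn.neighbors import NearestNeighbors


def knn_anomaly_scores(reference, queries, k=5):
    """Distance from each query to its k-th nearest reference point (larger = more off-manifold)."""
    nn = NearestNeighbors(n_neighbors=k).fit(reference)
    dists, _ = nn.kneighbors(queries)
    return dists[:, -1]


def concealment_gap(black_box, reference, perturbations, k=5, q=0.9):
    """Gap between average black-box outputs on realistic vs. anomalous perturbation queries.

    A large gap suggests the model conditions its behaviour on whether a query
    looks like an explainer-generated (off-manifold) sample.
    """
    scores = knn_anomaly_scores(reference, perturbations, k=k)
    cutoff = np.quantile(scores, q)
    on_manifold = black_box(perturbations[scores <= cutoff]).mean(axis=0)
    off_manifold = black_box(perturbations[scores > cutoff]).mean(axis=0)
    return float(np.abs(on_manifold - off_manifold).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_ref = rng.normal(size=(500, 10))                            # stand-in for audit data
    X_pert = X_ref[:200] + rng.normal(scale=1.5, size=(200, 10))  # LIME/SHAP-style perturbations

    def honest_model(X):
        # Toy two-class probability output that ignores whether a query is perturbed.
        p = 1.0 / (1.0 + np.exp(-X[:, 0]))
        return np.stack([p, 1.0 - p], axis=1)

    print("concealment gap:", concealment_gap(honest_model, X_ref, X_pert))
```

Under a scaffolding-style attack, where a wrapper routes off-manifold queries to an innocuous surrogate model, this gap would be expected to grow sharply, which is the kind of signal a detector can threshold.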
Related papers
- Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks [55.2480439325792]
This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act).
It uses insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in Artificial Intelligence.
As we apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
arXiv Detail & Related papers (2024-10-10T17:38:38Z)
- Counter Denial of Service for Next-Generation Networks within the Artificial Intelligence and Post-Quantum Era [2.156208381257605]
DoS attacks are becoming increasingly sophisticated and easily executable.
State-of-the-art systematization efforts have limitations such as isolated DoS countermeasures.
The emergence of quantum computers is a game changer for DoS from attack and defense perspectives.
arXiv Detail & Related papers (2024-08-08T18:47:31Z)
- Painting the black box white: experimental findings from applying XAI to an ECG reading setting [0.13124513975412253]
The shift from symbolic AI systems to black-box, sub-symbolic, and statistical ones has motivated a rapid increase in interest in explainable AI (XAI).
We focus on the cognitive dimension of users' perception of explanations and XAI systems.
arXiv Detail & Related papers (2022-10-27T07:47:50Z)
- Fact-Saboteurs: A Taxonomy of Evidence Manipulation Attacks against Fact-Verification Systems [80.3811072650087]
We show that it is possible to subtly modify claim-salient snippets in the evidence and generate diverse and claim-aligned evidence.
The attacks are also robust against post-hoc modifications of the claim.
These attacks can have harmful implications on the inspectable and human-in-the-loop usage scenarios.
arXiv Detail & Related papers (2022-09-07T13:39:24Z)
- Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks [76.35478518372692]
We introduce epsilon-illusory, a novel form of adversarial attack on sequential decision-makers.
Compared to existing attacks, we empirically find epsilon-illusory to be significantly harder to detect with automated methods.
Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses.
arXiv Detail & Related papers (2022-07-20T19:49:09Z)
- Explainable Intrusion Detection Systems (X-IDS): A Survey of Current Methods, Challenges, and Opportunities [0.0]
Intrusion Detection Systems (IDS) have received widespread adoption due to their ability to handle vast amounts of data with a high prediction accuracy.
IDSs designed using Deep Learning (DL) techniques are often treated as black box models and do not provide a justification for their predictions.
This survey reviews the state-of-the-art in explainable AI (XAI) for IDS, its current challenges, and discusses how these challenges span to the design of an X-IDS.
arXiv Detail & Related papers (2022-07-13T14:31:46Z)
- Inter-Domain Fusion for Enhanced Intrusion Detection in Power Systems: An Evidence Theoretic and Meta-Heuristic Approach [0.0]
False alerts due to compromised IDS in ICS networks can lead to severe economic and operational damage.
This work presents an approach for reducing false alerts in CPS power systems by dealing with uncertainty without prior distribution of alerts.
arXiv Detail & Related papers (2021-11-20T00:05:39Z)
- Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety [54.478842696269304]
The use of deep neural networks (DNNs) in safety-critical applications is challenging due to numerous model-inherent shortcomings.
In recent years, a zoo of state-of-the-art techniques aiming to address these safety concerns has emerged.
Our paper addresses both machine learning experts and safety engineers.
arXiv Detail & Related papers (2021-04-29T09:54:54Z)
- An Empirical Review of Adversarial Defenses [0.913755431537592]
Deep neural networks, which form the basis of such systems, are highly susceptible to a specific type of attack known as adversarial attacks.
An attacker can, with minimal computation, generate adversarial examples (images or data points from another class that consistently fool the model into misclassifying them as genuine) and undermine such algorithms.
We show two effective defense techniques, Dropout and Denoising Autoencoders, and demonstrate their success in preventing such attacks from fooling the model (a minimal denoising-autoencoder sketch appears after this list).
arXiv Detail & Related papers (2020-12-10T09:34:41Z)
- A black-box adversarial attack for poisoning clustering [78.19784577498031]
We propose a black-box adversarial attack for crafting adversarial samples to test the robustness of clustering algorithms.
We show that our attacks are transferable even against supervised algorithms such as SVMs, random forests, and neural networks.
arXiv Detail & Related papers (2020-09-09T18:19:31Z)
- Adversarial vs behavioural-based defensive AI with joint, continual and active learning: automated evaluation of robustness to deception, poisoning and concept drift [62.997667081978825]
Recent advancements in Artificial Intelligence (AI) have brought new capabilities to behavioural analysis (UEBA) for cyber-security.
In this paper, we present a solution to effectively mitigate this attack by improving the detection process and efficiently leveraging human expertise.
arXiv Detail & Related papers (2020-01-13T13:54:36Z)
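As a companion to the Empirical Review entry above, the following is a rough sketch of one of the defenses it names: a denoising autoencoder used to purify inputs before they reach a classifier. The architecture, noise level, and helper names here are assumptions for illustration, not the reviewed paper's setup.

```python
# Rough illustration (assumed setup): a small denoising autoencoder that maps
# noise-corrupted inputs back to clean ones, used to "purify" possibly
# adversarial inputs before classification. Not the reviewed paper's code.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_clean = rng.uniform(size=(2000, 20))                          # stand-in training data
X_noisy = X_clean + rng.normal(scale=0.1, size=X_clean.shape)   # corrupted copies

# A narrow hidden layer acts as the autoencoder bottleneck; training target is the clean data.
dae = MLPRegressor(hidden_layer_sizes=(8,), max_iter=1000, random_state=0)
dae.fit(X_noisy, X_clean)

def purify(X):
    """Project (possibly adversarial) inputs back toward the clean data manifold."""
    return dae.predict(X)

X_test = X_clean[:5] + rng.normal(scale=0.1, size=(5, 20))
print("reconstruction error:", float(np.mean((purify(X_test) - X_clean[:5]) ** 2)))
```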