Adversarial attacks and defenses in explainable artificial intelligence: A survey
- URL: http://arxiv.org/abs/2306.06123v3
- Date: Tue, 13 Feb 2024 14:36:33 GMT
- Title: Adversarial attacks and defenses in explainable artificial intelligence: A survey
- Authors: Hubert Baniecki and Przemyslaw Biecek
- Abstract summary: Recent advances in adversarial machine learning (AdvML) highlight the limitations and vulnerabilities of state-of-the-art explanation methods.
This survey provides a comprehensive overview of research concerning adversarial attacks on explanations of machine learning models.
- Score: 11.541601343587917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explainable artificial intelligence (XAI) methods are portrayed as a remedy
for debugging and trusting statistical and deep learning models, as well as
interpreting their predictions. However, recent advances in adversarial machine
learning (AdvML) highlight the limitations and vulnerabilities of
state-of-the-art explanation methods, putting their security and
trustworthiness into question. The possibility of manipulating, fooling or
fairwashing evidence of the model's reasoning has detrimental consequences when
applied in high-stakes decision-making and knowledge discovery. This survey
provides a comprehensive overview of research concerning adversarial attacks on
explanations of machine learning models, as well as fairness metrics. We
introduce a unified notation and taxonomy of methods facilitating a common
ground for researchers and practitioners from the intersecting research fields
of AdvML and XAI. We discuss how to defend against attacks and design robust
interpretation methods. We contribute a list of existing insecurities in XAI
and outline the emerging research directions in adversarial XAI (AdvXAI).
Future work should address improving explanation methods and evaluation
protocols to take into account the reported safety issues.
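
To make the surveyed threat concrete, below is a minimal, self-contained sketch (not taken from the survey itself) of one attack family it covers: optimizing a small input perturbation so that a gradient-based saliency explanation changes while the model's prediction stays almost the same. The toy two-layer network, the loss weights, and the step count are illustrative assumptions, not the authors' setup.

```python
import torch

torch.manual_seed(0)
# Toy regressor standing in for a model whose explanation we want to manipulate.
model = torch.nn.Sequential(
    torch.nn.Linear(10, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1)
)

def saliency(inputs):
    # Simple explanation: gradient of the model output w.r.t. the input features.
    out = model(inputs).sum()
    (grad,) = torch.autograd.grad(out, inputs, create_graph=True)
    return grad

x = torch.randn(1, 10, requires_grad=True)       # original input
orig_pred = model(x).detach()
orig_expl = saliency(x).detach()

delta = torch.zeros_like(x, requires_grad=True)  # adversarial perturbation
opt = torch.optim.Adam([delta], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    x_adv = x + delta
    # Push the explanation away from the original (minimize cosine similarity)...
    expl_loss = -torch.nn.functional.cosine_similarity(
        saliency(x_adv).flatten(), orig_expl.flatten(), dim=0
    )
    # ...while keeping the prediction (almost) unchanged.
    pred_loss = (model(x_adv) - orig_pred).pow(2).mean()
    (expl_loss + 10.0 * pred_loss).backward()
    opt.step()

x_adv = (x + delta).detach().requires_grad_(True)
print("prediction shift      :", (model(x_adv) - orig_pred).abs().item())
print("explanation cosine sim:", torch.nn.functional.cosine_similarity(
    saliency(x_adv).flatten(), orig_expl.flatten(), dim=0).item())
```

The surveyed attacks target richer explainers (SHAP, LIME, counterfactuals) and constrain the perturbation to stay imperceptible, but the objective structure, keep the prediction and move the explanation, is the same.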
Related papers
- A Survey of Defenses against AI-generated Visual Media: Detection, Disruption, and Authentication [15.879482578829489]
Deep generative models have demonstrated impressive performance in various computer vision applications.
These models may be used for malicious purposes, such as misinformation, deception, and copyright violation.
This paper provides a systematic and timely review of research efforts on defenses against AI-generated visual media.
arXiv Detail & Related papers (2024-07-15T09:46:02Z)
- Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing [51.524108608250074]
Black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in remote sensing.
We perform a systematic review to identify the key trends in the field and shed light on novel explainable AI approaches.
We also give a detailed outlook on the challenges and promising research directions.
arXiv Detail & Related papers (2024-02-21T13:19:58Z)
- X Hacking: The Threat of Misguided AutoML [2.3011205420794574]
This paper introduces the concept of X-hacking, a form of p-hacking applied to XAI metrics such as Shap values.
We show how an automated machine learning pipeline can be used to search for 'defensible' models that produce a desired explanation while maintaining superior performance to a common baseline (a hypothetical sketch of such a search is given after the related-papers list below).
arXiv Detail & Related papers (2024-01-16T17:21:33Z)
- A Brief Review of Explainable Artificial Intelligence in Healthcare [7.844015105790313]
XAI refers to the techniques and methods for building AI applications whose outputs and predictions end users can interpret.
Model explainability and interpretability are vital to the successful deployment of AI models in healthcare practices.
arXiv Detail & Related papers (2023-04-04T05:41:57Z)
- Adversarial Attacks and Defenses in Machine Learning-Powered Networks: A Contemporary Survey [114.17568992164303]
Adversarial attacks and defenses in machine learning and deep neural networks have been gaining significant attention.
This survey provides a comprehensive overview of the recent advancements in the field of adversarial attack and defense techniques.
New avenues of attack are also explored, including search-based, decision-based, drop-based, and physical-world attacks.
arXiv Detail & Related papers (2023-03-11T04:19:31Z)
- A Survey on Poisoning Attacks Against Supervised Machine Learning [0.0]
We present a survey covering the most representative papers on poisoning attacks against supervised machine learning models.
We summarize and compare the methodology and limitations of existing literature.
We conclude this paper with potential improvements and future directions to further exploit and prevent poisoning attacks on supervised models.
arXiv Detail & Related papers (2022-02-05T08:02:22Z)
- A Review of Adversarial Attack and Defense for Classification Methods [78.50824774203495]
This paper focuses on the generation and guarding of adversarial examples.
It is the hope of the authors that this paper will encourage more statisticians to work on this important and exciting field of generating and defending against adversarial examples.
arXiv Detail & Related papers (2021-11-18T22:13:43Z)
- When and How to Fool Explainable Models (and Humans) with Adversarial Examples [1.439518478021091]
We explore the possibilities and limits of adversarial attacks for explainable machine learning models.
First, we extend the notion of adversarial examples to fit in explainable machine learning scenarios.
Next, we propose a comprehensive framework to study whether adversarial examples can be generated for explainable models.
arXiv Detail & Related papers (2021-07-05T11:20:55Z)
- Counterfactual Explanations as Interventions in Latent Space [62.997667081978825]
Counterfactual explanations aim to provide end users with a set of features that need to be changed in order to achieve a desired outcome.
Current approaches rarely take into account the feasibility of actions needed to achieve the proposed explanations.
We present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations.
arXiv Detail & Related papers (2021-06-14T20:48:48Z)
- ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on ML-Doctor, a modular, reusable software tool that enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
- Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent works aiming to attain Explainable Reinforcement Learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight into the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)
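
As referenced in the X Hacking entry above, the following is a hypothetical sketch of cherry-picking a 'defensible' model. That paper searches with an AutoML pipeline and scores explanations with Shap values; this sketch substitutes a plain random-seed sweep over random forests and scikit-learn's permutation importance, so every modeling choice here is an assumption made only to illustrate the idea.

```python
# Hypothetical X-hacking sketch: among comparably accurate models, cherry-pick
# the one whose explanation downplays a chosen "sensitive" feature.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
SENSITIVE = 0  # index of the feature whose importance we want to look small

# Common baseline the "defensible" models must beat.
baseline_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

best = None
for seed in range(20):  # stand-in for a real AutoML search over pipelines
    model = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_tr, y_tr)
    acc = model.score(X_te, y_te)
    if acc < baseline_acc:  # keep only candidates that beat the baseline
        continue
    imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=seed)
    sens_imp = imp.importances_mean[SENSITIVE]
    if best is None or sens_imp < best[0]:
        best = (sens_imp, acc, seed)

print(f"baseline accuracy: {baseline_acc:.3f}")
if best is None:
    print("no candidate beat the baseline; nothing to cherry-pick")
else:
    sens_imp, acc, seed = best
    print(f"picked seed {seed}: accuracy {acc:.3f}, "
          f"sensitive-feature importance {sens_imp:.4f}")
```

The point of the exercise is that many near-equally accurate models can tell very different explanation stories, which is exactly the degree of freedom X-hacking exploits.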