Adversarial attacks and defenses in explainable artificial intelligence: A survey
- URL: http://arxiv.org/abs/2306.06123v3
- Date: Tue, 13 Feb 2024 14:36:33 GMT
- Title: Adversarial attacks and defenses in explainable artificial intelligence: A survey
- Authors: Hubert Baniecki and Przemyslaw Biecek
- Abstract summary: Recent advances in adversarial machine learning (AdvML) highlight the limitations and vulnerabilities of state-of-the-art explanation methods.
This survey provides a comprehensive overview of research concerning adversarial attacks on explanations of machine learning models.
- Score: 11.541601343587917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explainable artificial intelligence (XAI) methods are portrayed as a remedy
for debugging and trusting statistical and deep learning models, as well as
interpreting their predictions. However, recent advances in adversarial machine
learning (AdvML) highlight the limitations and vulnerabilities of
state-of-the-art explanation methods, putting their security and
trustworthiness into question. The possibility of manipulating, fooling or
fairwashing evidence of the model's reasoning has detrimental consequences when
applied in high-stakes decision-making and knowledge discovery. This survey
provides a comprehensive overview of research concerning adversarial attacks on
explanations of machine learning models, as well as fairness metrics. We
introduce a unified notation and taxonomy of methods facilitating a common
ground for researchers and practitioners from the intersecting research fields
of AdvML and XAI. We discuss how to defend against attacks and design robust
interpretation methods. We contribute a list of existing insecurities in XAI
and outline the emerging research directions in adversarial XAI (AdvXAI).
Future work should address improving explanation methods and evaluation
protocols to take into account the reported safety issues.
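For intuition, the following minimal sketch (a generic illustration, not the survey's own method) shows one kind of attack it covers: perturbing an input so that a gradient saliency explanation shifts toward an attacker-chosen map while the model's outputs barely change. The toy model, softplus activations, and hyperparameters are assumptions made only for the sketch.
```python
# Minimal sketch of an attack on a gradient (saliency) explanation:
# perturb x so its saliency map matches an attacker-chosen target map
# while the model's logits stay close to the original prediction.
# Toy model and data only; softplus keeps second derivatives non-zero.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.Softplus(), nn.Linear(32, 3))
x = torch.randn(1, 10)
target_class = 0
target_saliency = torch.randn(1, 10)  # attacker-chosen explanation

def attack_explanation(model, x, target_class, target_saliency,
                       steps=200, lr=1e-2, lam=10.0):
    x_adv = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=lr)
    orig_logits = model(x).detach()
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x_adv)
        # saliency = gradient of the target-class logit w.r.t. the input
        saliency = torch.autograd.grad(logits[0, target_class], x_adv,
                                       create_graph=True)[0]
        expl_loss = (saliency - target_saliency).pow(2).sum()  # steer the explanation
        pred_loss = (logits - orig_logits).pow(2).sum()        # keep the prediction
        (expl_loss + lam * pred_loss).backward()
        opt.step()
    return x_adv.detach()

x_adv = attack_explanation(model, x, target_class, target_saliency)
```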
Related papers
- Computational Safety for Generative AI: A Signal Processing Perspective [65.268245109828]
Computational safety is a mathematical framework that enables the quantitative assessment, formulation, and study of safety challenges in GenAI.
We show how sensitivity analysis and loss landscape analysis can be used to detect malicious prompts with jailbreak attempts.
We discuss key open research challenges, opportunities, and the essential role of signal processing in computational AI safety.
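As a rough illustration of the sensitivity-analysis idea (not the paper's actual pipeline, and with a placeholder classifier instead of an LLM), one can score a prompt by how strongly a toy safety classifier's loss reacts to small perturbations of the input embeddings:
```python
# Hedged sketch: gradient-norm sensitivity of a toy "safety" classifier to the
# input embeddings of a prompt; unusually high scores could flag suspicious inputs.
# All components here (vocabulary size, model, label) are placeholders.
import torch
import torch.nn as nn

embed = nn.Embedding(1000, 16)                                  # toy token embeddings
classifier = nn.Sequential(nn.Flatten(), nn.Linear(16 * 8, 2))  # 8-token prompts, 2 classes

def sensitivity_score(token_ids, label=0):
    emb = embed(token_ids).detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(classifier(emb), torch.tensor([label]))
    loss.backward()
    return emb.grad.norm().item()  # large gradient norm = high sensitivity

prompt = torch.randint(0, 1000, (1, 8))
print("sensitivity:", sensitivity_score(prompt))
```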
arXiv Detail & Related papers (2025-02-18T02:26:50Z)
- Towards Robust and Accurate Stability Estimation of Local Surrogate Models in Text-based Explainable AI [9.31572645030282]
In adversarial attacks on explainable AI (XAI) in the NLP domain, the generated explanation is manipulated.
Central to this XAI manipulation is the similarity measure used to calculate how one explanation differs from another.
This work investigates a variety of similarity measures designed for text-based ranked lists to determine their comparative suitability for use.
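The sketch below shows two generic ways to compare such ranked lists, top-k overlap and Kendall's tau over shared features; these are illustrative choices, not necessarily the measures the paper evaluates.
```python
# Hedged sketch: comparing two ranked feature-importance lists from text explanations.
from scipy.stats import kendalltau

def topk_jaccard(rank_a, rank_b, k=5):
    """Overlap of the top-k features of two ranked lists."""
    a, b = set(rank_a[:k]), set(rank_b[:k])
    return len(a & b) / len(a | b)

def kendall_on_common(rank_a, rank_b):
    """Kendall's tau over the features present in both rankings."""
    common = [f for f in rank_a if f in rank_b]
    pos_a = [rank_a.index(f) for f in common]
    pos_b = [rank_b.index(f) for f in common]
    tau, _ = kendalltau(pos_a, pos_b)
    return tau

expl_1 = ["movie", "great", "plot", "boring", "actor"]
expl_2 = ["great", "movie", "actor", "plot", "dull"]
print(topk_jaccard(expl_1, expl_2, k=3), kendall_on_common(expl_1, expl_2))
```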
arXiv Detail & Related papers (2025-01-03T17:44:57Z)
- Explainable Artificial Intelligence (XAI) for Malware Analysis: A Survey of Techniques, Applications, and Open Challenges [0.0]
Explainable AI (XAI) addresses the gap between detection performance and explainability by enhancing model interpretability while maintaining strong detection capabilities.
We examine existing XAI frameworks, their application in malware classification and detection, and the challenges associated with making malware detection models more interpretable.
This survey serves as a valuable resource for researchers and practitioners seeking to bridge the gap between ML performance and explainability in cybersecurity.
arXiv Detail & Related papers (2024-09-09T08:19:33Z)
- Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing [51.524108608250074]
Black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in remote sensing.
We perform a systematic review to identify the key trends in the field and shed light on novel explainable AI approaches.
We also give a detailed outlook on the challenges and promising research directions.
arXiv Detail & Related papers (2024-02-21T13:19:58Z)
- X Hacking: The Threat of Misguided AutoML [2.3011205420794574]
This paper introduces the concept of X-hacking, a form of p-hacking applied to XAI metrics such as Shap values.
We show how an automated machine learning pipeline can be used to search for 'defensible' models that produce a desired explanation while maintaining superior performance to a common baseline.
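A minimal sketch of that search loop, using scikit-learn permutation importance as a stand-in for Shap values and a hypothetical "desired" feature, might look as follows:
```python
# Hedged sketch of the X-hacking idea: search a pool of models for one whose
# explanation points at a desired feature while accuracy stays above a baseline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

baseline_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
desired_feature = 3  # the feature the "analyst" wants to appear most important
candidates = [
    RandomForestClassifier(n_estimators=n, max_depth=d, random_state=s)
    for n in (50, 200) for d in (2, 5, None) for s in range(3)
]

for model in candidates:
    model.fit(X_tr, y_tr)
    acc = model.score(X_te, y_te)
    imp = permutation_importance(model, X_te, y_te, n_repeats=5, random_state=0)
    if acc >= baseline_acc and np.argmax(imp.importances_mean) == desired_feature:
        print("'defensible' model found with accuracy", acc)
        break
```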
arXiv Detail & Related papers (2024-01-16T17:21:33Z)
- A Brief Review of Explainable Artificial Intelligence in Healthcare [7.844015105790313]
XAI refers to the techniques and methods for building AI applications whose decisions can be understood and interpreted by humans.
Model explainability and interpretability are vital for the successful deployment of AI models in healthcare practices.
arXiv Detail & Related papers (2023-04-04T05:41:57Z)
- Adversarial Attacks and Defenses in Machine Learning-Powered Networks: A Contemporary Survey [114.17568992164303]
Adversarial attacks and defenses in machine learning and deep neural networks have been gaining significant attention.
This survey provides a comprehensive overview of the recent advancements in the field of adversarial attack and defense techniques.
New avenues of attack are also explored, including search-based, decision-based, drop-based, and physical-world attacks.
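As a concrete instance of the attack family such surveys cover, the classic one-step FGSM evasion attack can be sketched as follows (model and data are placeholders):
```python
# Hedged sketch: the fast gradient sign method (FGSM), a canonical evasion attack.
import torch
import torch.nn as nn

def fgsm(model, x, y, epsilon=0.03):
    """Move each input feature one step in the direction that increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()

# toy usage
model = nn.Sequential(nn.Linear(10, 3))
x_adv = fgsm(model, torch.randn(4, 10), torch.tensor([0, 1, 2, 0]))
```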
arXiv Detail & Related papers (2023-03-11T04:19:31Z)
- A Survey on Poisoning Attacks Against Supervised Machine Learning [0.0]
We present a survey covering the most representative papers on poisoning attacks against supervised machine learning models.
We summarize and compare the methodology and limitations of existing literature.
We conclude this paper with potential improvements and future directions to further exploit and prevent poisoning attacks on supervised models.
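For intuition, the simplest form of poisoning, label flipping, can be sketched in a few lines (toy data; the model choice and poisoning fraction are arbitrary assumptions):
```python
# Hedged sketch: label-flipping poisoning, showing how a small fraction of
# corrupted training labels can degrade a supervised model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
poisoned = y_tr.copy()
idx = rng.choice(len(poisoned), size=int(0.2 * len(poisoned)), replace=False)
poisoned[idx] = 1 - poisoned[idx]   # flip 20% of the training labels

clean_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
poisoned_acc = LogisticRegression(max_iter=1000).fit(X_tr, poisoned).score(X_te, y_te)
print(f"clean={clean_acc:.3f}  poisoned={poisoned_acc:.3f}")
```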
arXiv Detail & Related papers (2022-02-05T08:02:22Z)
- Counterfactual Explanations as Interventions in Latent Space [62.997667081978825]
Counterfactual explanations aim to provide end users with a set of features that need to be changed in order to achieve a desired outcome.
Current approaches rarely take into account the feasibility of actions needed to achieve the proposed explanations.
We present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations.
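A plain input-space counterfactual search (in the style of Wachter et al., not the CEILS latent-space method itself) gives the basic intuition; the model below is a placeholder differentiable classifier:
```python
# Hedged sketch: find a small change to x that moves a differentiable
# classifier's prediction to a desired class (input-space formulation;
# CEILS instead intervenes on a latent/causal representation).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 2))
x = torch.randn(1, 5)

def counterfactual(model, x, desired_class, steps=300, lr=0.05, lam=0.1):
    x_cf = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred_loss = nn.functional.cross_entropy(
            model(x_cf), torch.tensor([desired_class]))  # reach the desired outcome
        dist_loss = (x_cf - x).abs().sum()               # change as little as possible
        (pred_loss + lam * dist_loss).backward()
        opt.step()
    return x_cf.detach()

x_cf = counterfactual(model, x, desired_class=1)
```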
arXiv Detail & Related papers (2021-06-14T20:48:48Z)
- ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on a modular, reusable software tool, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
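A confidence-threshold membership inference baseline illustrates the first of those attacks; this is a generic sketch with toy data, not ML-Doctor's API or methodology:
```python
# Hedged sketch: a confidence-threshold membership inference baseline.
# Members are points the target model was trained on; non-members are held out.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_member, X_nonmember, y_member, y_nonmember = train_test_split(
    X, y, test_size=0.5, random_state=0)

target_model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_member, y_member)

def attack(model, X, threshold=0.9):
    """Guess 'member' when the model is unusually confident on a point."""
    return model.predict_proba(X).max(axis=1) >= threshold

tpr = attack(target_model, X_member).mean()      # fraction of members detected
fpr = attack(target_model, X_nonmember).mean()   # fraction of non-members misflagged
print(f"TPR={tpr:.2f}  FPR={fpr:.2f}")
```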
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
- Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent work toward attaining Explainable Reinforcement Learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight into the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)