The privacy issue of counterfactual explanations: explanation linkage attacks
- URL: http://arxiv.org/abs/2210.12051v1
- Date: Fri, 21 Oct 2022 15:44:19 GMT
- Title: The privacy issue of counterfactual explanations: explanation linkage attacks
- Authors: Sofie Goethals, Kenneth Sörensen, David Martens
- Abstract summary: We introduce the explanation linkage attack, which can occur when deploying instance-based strategies to find counterfactual explanations.
To counter such an attack, we propose k-anonymous counterfactual explanations and introduce pureness as a new metric to evaluate the validity of these k-anonymous counterfactual explanations.
Our results show that making the explanations, rather than the whole dataset, k-anonymous is beneficial for the quality of the explanations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Black-box machine learning models are being used in more and more high-stakes
domains, which creates a growing need for Explainable AI (XAI). Unfortunately,
the use of XAI in machine learning introduces new privacy risks, which
currently remain largely unnoticed. We introduce the explanation linkage
attack, which can occur when deploying instance-based strategies to find
counterfactual explanations. To counter such an attack, we propose k-anonymous
counterfactual explanations and introduce pureness as a new metric to evaluate
the validity of these k-anonymous counterfactual explanations. Our results show
that making the explanations, rather than the whole dataset, k-anonymous is
beneficial for the quality of the explanations.
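To make the two central ideas concrete, here is a minimal, self-contained Python sketch. It is not the authors' implementation: the toy data, the generalization rules, and the reading of the pureness metric (taken here as the share of covered records that receive the desired decision) are illustrative assumptions based only on the abstract.

```python
"""Illustrative sketch only: an explanation linkage attack on an
instance-based counterfactual, followed by a k-anonymous counterfactual
scored with a pureness-style metric. Data and rules are made up."""

# Toy training data: quasi-identifiers (age, zip_code), a sensitive
# attribute (income), and the model's decision for each record.
TRAIN = [
    {"age": 29, "zip_code": "2000", "income": 52_000, "decision": "approve"},
    {"age": 31, "zip_code": "2018", "income": 48_000, "decision": "approve"},
    {"age": 34, "zip_code": "2060", "income": 61_000, "decision": "approve"},
    {"age": 45, "zip_code": "2600", "income": 39_000, "decision": "reject"},
]

# Auxiliary data the adversary already holds (e.g. a public registry).
PUBLIC = [
    {"name": "Alice", "age": 29, "zip_code": "2000"},
    {"name": "Bob", "age": 45, "zip_code": "2600"},
]


def linkage_attack(counterfactual):
    """If the counterfactual is a real training record, joining its
    quasi-identifiers with public data can re-identify the person behind
    it and expose the sensitive attribute (here: income)."""
    return [
        person["name"]
        for person in PUBLIC
        if person["age"] == counterfactual["age"]
        and person["zip_code"] == counterfactual["zip_code"]
    ]


def covered(record, region):
    """True if a record falls inside a generalized (age range, zip prefix) region."""
    lo, hi = region["age_range"]
    return lo <= record["age"] < hi and record["zip_code"].startswith(region["zip_prefix"])


def k_anonymous_counterfactual(record, k):
    """Generalize the quasi-identifiers (age bucket, zip prefix) until at
    least k training records fall inside the generalized region."""
    bucket, prefix_len = 5, 4
    while True:
        low = record["age"] // bucket * bucket
        region = {"age_range": (low, low + bucket),
                  "zip_prefix": record["zip_code"][:prefix_len]}
        if sum(covered(r, region) for r in TRAIN) >= k:
            return region
        bucket, prefix_len = bucket * 2, max(prefix_len - 1, 0)


def pureness(region, desired="approve"):
    """One plausible reading of pureness: the fraction of covered records
    that actually receive the desired (counterfactual) decision."""
    hits = [r for r in TRAIN if covered(r, region)]
    return sum(r["decision"] == desired for r in hits) / len(hits)


if __name__ == "__main__":
    instance_cf = TRAIN[0]  # an instance-based counterfactual: a real record
    print("re-identified:", linkage_attack(instance_cf))        # ['Alice']
    region = k_anonymous_counterfactual(instance_cf, k=3)
    print("k-anonymous region:", region, "pureness:", pureness(region))
```

In this toy run the instance-based counterfactual is an exact training record, so joining its quasi-identifiers with public data re-identifies "Alice"; the generalized region instead covers three training records (k = 3) with a pureness of 1.0, illustrating why generalizing the explanations alone can suffice.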
Related papers
- Explainable Graph Neural Networks Under Fire [69.15708723429307]
Graph neural networks (GNNs) usually lack interpretability due to their complex computational behavior and the abstract nature of graphs.
Most GNN explanation methods work in a post-hoc manner and provide explanations in the form of a small subset of important edges and/or nodes.
In this paper we demonstrate that these explanations unfortunately cannot be trusted, as common GNN explanation methods turn out to be highly susceptible to adversarial perturbations.
arXiv Detail & Related papers (2024-06-10T16:09:16Z)
- Incremental XAI: Memorable Understanding of AI with Incremental Explanations [13.460427339680168]
We propose to provide more detailed explanations by leveraging the human cognitive capacity to accumulate knowledge by incrementally receiving more details.
We introduce Incremental XAI to automatically partition explanations for general and atypical instances.
Memorability is improved by reusing base factors and reducing the number of factors shown in atypical cases.
arXiv Detail & Related papers (2024-04-10T04:38:17Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Learning with Explanation Constraints [91.23736536228485]
We provide a learning theoretic framework to analyze how explanations can improve the learning of our models.
We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
arXiv Detail & Related papers (2023-03-25T15:06:47Z)
- XRand: Differentially Private Defense against Explanation-Guided Attacks [19.682368614810756]
We introduce a new concept of achieving local differential privacy (LDP) in the explanations.
We show that our mechanism restricts the information that the adversary can learn about the top important features, while maintaining the faithfulness of the explanations.
arXiv Detail & Related papers (2022-12-08T18:23:59Z)
- On the amplification of security and privacy risks by post-hoc explanations in machine learning models [7.564511776742979]
Post-hoc explanation methods that highlight input dimensions according to their importance or relevance to the result also leak information that weakens security and privacy.
We propose novel explanation-guided black-box evasion attacks that achieve a 10-fold reduction in query count for the same success rate.
We show that the adversarial advantage from explanations can be quantified as a reduction in the total variance of the estimated gradient.
arXiv Detail & Related papers (2022-06-28T13:46:06Z)
- CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models [84.32751938563426]
We propose a new explainable AI (XAI) framework for explaining decisions made by a deep convolutional neural network (CNN).
In contrast to current XAI methods that generate explanations as a single-shot response, we pose explanation as an iterative communication process.
Our framework generates a sequence of explanations in a dialog by mediating the differences between the minds of the machine and the human user.
arXiv Detail & Related papers (2021-09-03T09:46:20Z)
- Exploiting Explanations for Model Inversion Attacks [19.91586648726519]
We study the risk of image-based model inversion attacks that reconstruct private image data from model explanations with increasing performance.
We have developed several multi-modal transposed CNN architectures that achieve significantly higher inversion performance than using the target model prediction only.
These threats highlight the urgent and significant privacy risks of explanations and call attention to the need for new privacy-preservation techniques.
arXiv Detail & Related papers (2021-04-26T15:53:57Z)
- Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
arXiv Detail & Related papers (2020-09-03T19:02:55Z)
- Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent work toward attaining Explainable Reinforcement Learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight on the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)