Counterfactual Explanations Can Be Manipulated
- URL: http://arxiv.org/abs/2106.02666v1
- Date: Fri, 4 Jun 2021 18:56:15 GMT
- Title: Counterfactual Explanations Can Be Manipulated
- Authors: Dylan Slack, Sophie Hilgard, Himabindu Lakkaraju, and Sameer Singh
- Abstract summary: We introduce the first framework that describes the vulnerabilities of counterfactual explanations and shows how they can be manipulated.
We show counterfactual explanations may converge to drastically different counterfactuals under a small perturbation, indicating that they are not robust.
We describe how these models can unfairly provide low-cost recourse for specific subgroups in the data while appearing fair to auditors.
- Score: 40.78019510022835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual explanations are emerging as an attractive option for providing recourse to individuals adversely impacted by algorithmic decisions. As they are deployed in critical applications (e.g., law enforcement, financial lending), it becomes important to ensure that we clearly understand their vulnerabilities and find ways to address them. However, there is little understanding of the vulnerabilities and shortcomings of counterfactual explanations. In this work, we introduce the first framework that describes the vulnerabilities of counterfactual explanations and shows how they can be manipulated. More specifically, we show counterfactual explanations may converge to drastically different counterfactuals under a small perturbation, indicating that they are not robust. Leveraging this insight, we introduce a novel objective to train seemingly fair models under which counterfactual explanations find much lower-cost recourse after a slight perturbation. We describe how these models can unfairly provide low-cost recourse for specific subgroups in the data while appearing fair to auditors. We perform experiments on loan and violent crime prediction data sets where certain subgroups achieve up to 20x lower-cost recourse under the perturbation. These results raise concerns regarding the dependability of current counterfactual explanation techniques, which we hope will inspire investigations into robust counterfactual explanations.
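A minimal sketch of the mechanism at play, assuming a Wachter-style gradient search over a toy logistic model (all names and numbers below are illustrative, not the authors' code): the counterfactual objective trades off reaching a target prediction against staying close to the input, and comparing the resulting recourse cost with and without a small input perturbation is exactly the measurement the paper's manipulation exploits.

```python
# Sketch: Wachter-style counterfactual search on a toy logistic model.
import numpy as np

rng = np.random.default_rng(0)
w, b = np.array([2.0, -3.0]), 0.5            # illustrative linear classifier

def predict_proba(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def counterfactual(x0, target=0.9, lam=5.0, lr=0.05, steps=500):
    """Minimise lam * (f(x) - target)^2 + ||x - x0||^2 by gradient descent."""
    x = x0.copy()
    for _ in range(steps):
        p = predict_proba(x)
        # chain rule through the sigmoid for the prediction term,
        # plus the gradient of the squared distance term
        grad = 2 * lam * (p - target) * p * (1 - p) * w + 2 * (x - x0)
        x -= lr * grad
    return x

x = np.array([0.0, 1.0])                      # a "denied" instance
eps = 0.05 * rng.standard_normal(2)           # small input perturbation
cf_plain = counterfactual(x)
cf_pert = counterfactual(x + eps)
print("recourse cost, original input: ", np.linalg.norm(cf_plain - x))
print("recourse cost, perturbed input:", np.linalg.norm(cf_pert - (x + eps)))
```

For this honest toy model the two printed costs stay close; under an adversarially trained model of the kind the paper constructs, they can differ by an order of magnitude or more.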
Related papers
- Explainable bank failure prediction models: Counterfactual explanations to reduce the failure risk [0.0]
The accuracy and understandability of bank failure prediction models are crucial.
Complex models like random forests, support vector machines, and deep learning offer higher predictive performance but lower explainability.
To address this trade-off, the paper suggests using counterfactual explanations.
arXiv Detail & Related papers (2024-07-14T15:27:27Z)
- Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back-translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual back to the explainer, we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
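A minimal sketch of the loop this suggests, under the assumption of a toy linear scorer and a hypothetical nearest-boundary counterfactual editor (neither is from the paper):

```python
# Sketch: feed each counterfactual back into the explainer and watch the drift.
import numpy as np

w, b = np.array([1.5, -2.0]), 0.0             # toy linear scorer

def predict(x):
    return int(x @ w + b > 0)

def explainer(x):
    """Hypothetical editor: smallest L2 step just across the boundary."""
    margin = x @ w + b
    return x - (margin / (w @ w) + np.sign(margin) * 1e-2) * w

x0 = np.array([1.0, 1.0])
x = x0
for i in range(4):
    x = explainer(x)                          # counterfactual of a counterfactual
    print(f"round {i}: label={predict(x)} drift from x0={np.linalg.norm(x - x0):.3f}")
```

Tracking the label and the drift across rounds is the back-translation-style signal: a well-behaved editor should oscillate near the decision boundary rather than wander away from the original input.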
arXiv Detail & Related papers (2023-05-26T16:04:28Z)
- The privacy issue of counterfactual explanations: explanation linkage attacks [0.0]
We introduce the explanation linkage attack, which can occur when deploying instance-based strategies to find counterfactual explanations.
To counter such an attack, we propose k-anonymous counterfactual explanations and introduce pureness as a new metric to evaluate the validity of these k-anonymous counterfactual explanations.
Our results show that making the explanations, rather than the whole dataset, k-anonymous is beneficial for the quality of the explanations.
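A minimal sketch of how such a guard could look, assuming an instance-based strategy that returns training rows as explanations (the data, threshold, and helper name are illustrative; the pureness metric is not computed here):

```python
# Sketch: release a training row as a counterfactual only if it is k-anonymous.
import numpy as np

# toy dataset over quasi-identifiers (age_bin, urban); labels are outcomes
X = np.array([[30, 1], [30, 1], [30, 1], [45, 0], [52, 0]])
y = np.array([1, 1, 1, 0, 0])

def k_anonymous_counterfactual(x, desired, k=3):
    candidates = X[y == desired]
    # count how often each candidate's quasi-identifier combination occurs
    counts = {tuple(r): int((X == r).all(axis=1).sum()) for r in candidates}
    safe = [np.array(r) for r, c in counts.items() if c >= k]
    if not safe:
        return None                           # no k-anonymous recourse exists
    return min(safe, key=lambda c: np.linalg.norm(c - x))

print(k_anonymous_counterfactual(np.array([50, 0]), desired=1, k=3))
```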
arXiv Detail & Related papers (2022-10-21T15:44:19Z)
- Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles [50.81061839052459]
We formalize the generation of robust counterfactual explanations as a probabilistic problem.
We show the link between the robustness of ensemble models and the robustness of base learners.
Our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
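A minimal, assumption-laden sketch of the underlying validity test: treat a counterfactual as robust only when a large enough fraction of the ensemble's base learners already vote for the desired label (the data, threshold, and line search are illustrative, not the paper's formulation):

```python
# Sketch: accept a counterfactual only when enough base learners agree.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def ensemble_agreement(x, desired):
    votes = [tree.predict(x.reshape(1, -1))[0] for tree in forest.estimators_]
    return float(np.mean(np.array(votes) == desired))

x = np.array([-0.4, -0.2])                    # instance currently classified 0
for step in np.linspace(0.0, 1.5, 7):         # walk toward the positive region
    cf = x + step
    agree = ensemble_agreement(cf, desired=1)
    status = "accepted" if agree >= 0.9 else "rejected"  # validity threshold
    print(f"step {step:.2f}: agreement {agree:.2f} -> {status}")
```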
arXiv Detail & Related papers (2022-05-27T17:28:54Z)
- Counterfactual Explanations for Predictive Business Process Monitoring [0.90238471756546]
We propose LORELEY, a counterfactual explanation technique for predictive process monitoring.
LORELEY can approximate prediction models with an average fidelity of 97.69% and generate realistic counterfactual explanations.
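A minimal sketch of how a fidelity figure like this is typically measured, with a shallow decision tree standing in for the interpretable surrogate (the models and data are illustrative; this is not LORELEY itself):

```python
# Sketch: fidelity = share of points where the surrogate matches the black box.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

black_box = GradientBoostingClassifier(random_state=0).fit(X, y)
# the surrogate is trained on the black box's outputs, not the true labels
surrogate = DecisionTreeClassifier(max_depth=3).fit(X, black_box.predict(X))

X_test = rng.normal(size=(200, 4))
fidelity = np.mean(surrogate.predict(X_test) == black_box.predict(X_test))
print(f"surrogate fidelity: {fidelity:.2%}")
```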
arXiv Detail & Related papers (2022-02-24T11:01:20Z)
- Explainers in the Wild: Making Surrogate Explainers Robust to Distortions through Perception [77.34726150561087]
We propose a methodology to evaluate the effect of distortions in explanations by embedding perceptual distances.
We generate explanations for images in the ImageNet-C dataset and demonstrate how using perceptual distances in the surrogate explainer creates more coherent explanations for the distorted and reference images.
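A minimal sketch of the weighting idea, using a Gaussian-blur distance as a crude stand-in for a true perceptual metric (the proxy, kernel, and data are assumptions, not the paper's pipeline):

```python
# Sketch: weight surrogate samples by a perceptual proxy, not raw pixel distance.
import numpy as np
from scipy.ndimage import gaussian_filter

def perceptual_distance(a, b, sigma=2.0):
    # compare low-pass versions of the images: a crude perceptual stand-in
    return np.linalg.norm(gaussian_filter(a, sigma) - gaussian_filter(b, sigma))

rng = np.random.default_rng(0)
reference = rng.random((32, 32))              # stand-in for the explained image
samples = [reference + 0.1 * rng.standard_normal((32, 32)) for _ in range(50)]

# kernel weights for a weighted local surrogate, as in LIME-style explainers
dists = np.array([perceptual_distance(reference, s) for s in samples])
weights = np.exp(-(dists ** 2) / (2 * dists.std() ** 2))
print("sample weight range:", weights.min(), weights.max())
```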
arXiv Detail & Related papers (2021-02-22T12:38:53Z)
- Disambiguation of weak supervision with exponential convergence rates [88.99819200562784]
In weakly supervised learning, data are annotated with incomplete yet discriminative information.
In this paper, we focus on partial labelling, an instance of weak supervision where, from a given input, we are given a set of potential targets.
We propose an empirical disambiguation algorithm to recover full supervision from weak supervision.
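A minimal sketch of one generic alternating scheme for partial labelling (not the paper's estimator, and without its convergence guarantees): pick, inside each candidate set, the label the current classifier finds most likely, refit, and repeat.

```python
# Sketch: alternate between fitting a classifier and re-picking candidate labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
true = np.array([0] * 50 + [1] * 50)
# each point carries a candidate set: the true label, sometimes plus a decoy
candidates = [{t} | ({1 - t} if rng.random() < 0.3 else set()) for t in true]

y = np.array([min(c) for c in candidates])    # arbitrary initial disambiguation
for _ in range(5):
    clf = LogisticRegression().fit(X, y)
    proba = clf.predict_proba(X)
    # re-assign each point to its most probable label within its candidate set
    y = np.array([max(c, key=lambda lbl: proba[i][lbl])
                  for i, c in enumerate(candidates)])
print("agreement with the true labels:", np.mean(y == true))
```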
arXiv Detail & Related papers (2021-02-04T18:14:32Z)
- A Series of Unfortunate Counterfactual Events: the Role of Time in Counterfactual Explanations [2.0305676256390934]
We show that the literature has neglected the problem of the time dependency of counterfactual explanations.
We argue that, due to their time dependency and because of the provision of recommendations, even feasible, actionable, and sparse counterfactual explanations may not be appropriate in real-world applications.
arXiv Detail & Related papers (2020-10-09T17:16:29Z)
- Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
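A minimal sketch of why counterfactuals help extraction, assuming a hypothetical API that returns the nearest opposite-label point (the target model and query budget are illustrative, not the paper's attack): counterfactuals are boundary-adjacent points with known labels, so adding them to the stolen training set pins down the surrogate's decision boundary.

```python
# Sketch: counterfactual queries give labelled points on both sides of the boundary.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
w_true = np.array([1.0, -1.0])                # secret target model

def target_predict(x):
    return (x @ w_true > 0).astype(int)

def target_counterfactual(x):
    # the (hypothetical) API returns the nearest point with the opposite label
    margin = x @ w_true
    return x - (margin / (w_true @ w_true) + np.sign(margin) * 1e-3) * w_true

queries = rng.normal(size=(40, 2))
cfs = np.array([target_counterfactual(q) for q in queries])
X = np.vstack([queries, cfs])                 # boundary-adjacent training set
y = np.concatenate([target_predict(queries), target_predict(cfs)])

surrogate = LogisticRegression().fit(X, y)
X_test = rng.normal(size=(2000, 2))
print("extraction fidelity:",
      np.mean(surrogate.predict(X_test) == target_predict(X_test)))
```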
arXiv Detail & Related papers (2020-09-03T19:02:55Z)
- Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature-based explanations through robustness analysis.
We obtain new explanations that are loosely necessary and sufficient for a prediction.
We extend the explanation to extract the set of features that would move the current prediction to a target class.
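A minimal sketch of that extension step, using a greedy feature swap toward a target-class prototype as an assumed stand-in for the paper's robustness-analysis criteria:

```python
# Sketch: greedily grow the feature set that moves the prediction to the target.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (100, 4)), rng.normal(1, 1, (100, 4))])
y = np.array([0] * 100 + [1] * 100)
clf = LogisticRegression().fit(X, y)

x = X[0].copy()                               # class-0 instance to explain
prototype = X[y == 1].mean(axis=0)            # target-class reference values

def swap_gain(vec, j):
    """Target-class probability after replacing feature j with the prototype's."""
    trial = vec.copy()
    trial[j] = prototype[j]
    return clf.predict_proba(trial.reshape(1, -1))[0, 1]

chosen, current = [], x.copy()
while clf.predict(current.reshape(1, -1))[0] != 1 and len(chosen) < 4:
    remaining = [j for j in range(4) if j not in chosen]
    j_best = max(remaining, key=lambda j: swap_gain(current, j))
    chosen.append(j_best)
    current[j_best] = prototype[j_best]
print("features that move the prediction to the target class:", chosen)
```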
arXiv Detail & Related papers (2020-05-31T05:52:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.