Promoting Counterfactual Robustness through Diversity
- URL: http://arxiv.org/abs/2312.06564v2
- Date: Tue, 12 Dec 2023 08:09:34 GMT
- Title: Promoting Counterfactual Robustness through Diversity
- Authors: Francesco Leofante and Nico Potyka
- Abstract summary: Counterfactual explainers may lack robustness in the sense that a minor change in the input can cause a major change in the explanation.
We propose an approximation algorithm that uses a diversity criterion to select a feasible number of most relevant explanations.
- Score: 10.223545393731115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual explanations shed light on the decisions of black-box models
by explaining how an input can be altered to obtain a favourable decision from
the model (e.g., when a loan application has been rejected). However, as noted
recently, counterfactual explainers may lack robustness in the sense that a
minor change in the input can cause a major change in the explanation. This can
cause confusion on the user side and open the door for adversarial attacks. In
this paper, we study some sources of non-robustness. While there are
fundamental reasons for why an explainer that returns a single counterfactual
cannot be robust in all instances, we show that some interesting robustness
guarantees can be given by reporting multiple rather than a single
counterfactual. Unfortunately, the number of counterfactuals that need to be
reported for the theoretical guarantees to hold can be prohibitively large. We
therefore propose an approximation algorithm that uses a diversity criterion to
select a feasible number of most relevant explanations and study its robustness
empirically. Our experiments indicate that our method improves the
state-of-the-art in generating robust explanations, while maintaining other
desirable properties and providing competitive computational performance.
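To make the idea of a diversity criterion concrete, here is a minimal sketch of one common way to pick a small, spread-out subset from a larger pool of candidate counterfactuals: greedy max-min selection. This is an illustration only and is not taken from the paper; the function names `euclidean` and `select_diverse`, the use of Euclidean distance, and the seeding rule (start from the candidate closest to the input) are all assumptions made for the example.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def select_diverse(x: Point, candidates: List[Point], k: int) -> List[Point]:
    """Pick up to k candidates with a greedy max-min diversity rule.

    Hypothetical helper, not the paper's algorithm:
    1. Seed the selection with the candidate closest to the input x.
    2. Repeatedly add the candidate whose minimum distance to the
       already selected points is largest.
    """
    if k <= 0 or not candidates:
        return []
    remaining = list(candidates)
    # Seed: the candidate closest to the original input.
    first = min(remaining, key=lambda c: euclidean(x, c))
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        # Next pick: the candidate farthest from its nearest selected point.
        nxt = max(remaining, key=lambda c: min(euclidean(c, s) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


if __name__ == "__main__":
    x = (0.0, 0.0)
    pool = [(1.0, 0.0), (1.1, 0.0), (0.0, 2.0), (-3.0, 0.0)]
    print(select_diverse(x, pool, 3))
```

In this toy pool, the near-duplicate candidate `(1.1, 0.0)` is skipped in favour of points that lie in different directions from the input, which is the intuition behind reporting several diverse counterfactuals rather than one.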
Related papers
- Explainable bank failure prediction models: Counterfactual explanations to reduce the failure risk [0.0]
The accuracy and understandability of bank failure prediction models are crucial.
Complex models like random forest, support vector machines, and deep learning offer higher predictive performance but lower explainability.
To address this challenge, using counterfactual explanations is suggested.
arXiv Detail & Related papers (2024-07-14T15:27:27Z)
- Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- From Robustness to Explainability and Back Again [0.685316573653194]
The paper addresses the limitation of scalability of formal explainability, and proposes novel algorithms for computing formal explanations.
The proposed algorithm instead computes explanations by answering a number of robustness queries, where the number of such queries is at most linear in the number of features.
The experiments validate the practical efficiency of the proposed approach.
arXiv Detail & Related papers (2023-06-05T17:21:05Z)
- Generating robust counterfactual explanations [60.32214822437734]
The quality of a counterfactual depends on several criteria: realism, actionability, validity, robustness, etc.
In this paper, we are interested in the notion of robustness of a counterfactual. More precisely, we focus on robustness to counterfactual input changes.
We propose a new framework, CROCO, that generates robust counterfactuals while effectively managing this trade-off, and guarantees the user a minimal level of robustness.
arXiv Detail & Related papers (2023-04-24T09:00:31Z)
- Feature-based Learning for Diverse and Privacy-Preserving Counterfactual Explanations [46.89706747651661]
Interpretable machine learning seeks to understand the reasoning process of complex black-box systems.
One flourishing approach is through counterfactual explanations, which provide suggestions on what a user can do to alter an outcome.
arXiv Detail & Related papers (2022-09-27T15:09:13Z)
- Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles [50.81061839052459]
We formalize the generation of robust counterfactual explanations as a probabilistic problem.
We show the link between the robustness of ensemble models and the robustness of base learners.
Our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
arXiv Detail & Related papers (2022-05-27T17:28:54Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)