Faithful Model Explanations through Energy-Constrained Conformal Counterfactuals
- URL: http://arxiv.org/abs/2312.10648v1
- Date: Sun, 17 Dec 2023 08:24:44 GMT
- Title: Faithful Model Explanations through Energy-Constrained Conformal Counterfactuals
- Authors: Patrick Altmeyer, Mojtaba Farmanbar, Arie van Deursen, Cynthia C. S. Liem
- Abstract summary: Counterfactual explanations offer an intuitive and straightforward way to explain black-box models.
Existing work has primarily relied on surrogate models to learn how the input data is distributed.
We propose a novel algorithmic framework for generating Energy-Constrained Conformal Counterfactuals that are only as plausible as the model permits.
- Score: 16.67633872254042
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual explanations offer an intuitive and straightforward way to
explain black-box models and offer algorithmic recourse to individuals. To
address the need for plausible explanations, existing work has primarily relied
on surrogate models to learn how the input data is distributed. This
effectively reallocates the task of learning realistic explanations for the
data from the model itself to the surrogate. Consequently, the generated
explanations may seem plausible to humans but need not necessarily describe the
behaviour of the black-box model faithfully. We formalise this notion of
faithfulness through the introduction of a tailored evaluation metric and
propose a novel algorithmic framework for generating Energy-Constrained
Conformal Counterfactuals (ECCCo) that are only as plausible as the model permits.
Through extensive empirical studies, we demonstrate that ECCCo reconciles the
need for faithfulness and plausibility. In particular, we show that for models
with gradient access, it is possible to achieve state-of-the-art performance
without the need for surrogate models. To do so, our framework relies solely on
properties defining the black-box model itself by leveraging recent advances in
energy-based modelling and conformal prediction. To our knowledge, this is the
first venture in this direction for generating faithful counterfactual
explanations. Thus, we anticipate that ECCCo can serve as a baseline for future
research. We believe that our work opens avenues for researchers and
practitioners seeking tools to better distinguish trustworthy from unreliable
models.
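The abstract names the ingredients of ECCCo (gradient access, an energy-based view of the classifier, and conformal prediction) without spelling out the objective. The sketch below is one possible reading under stated assumptions, not the authors' implementation: the toy linear classifier (`W`, `b`), the JEM-style energy E(x | y) = -f_y(x), the nonconformity score 1 - p_y(x), the sigmoid relaxation of the prediction-set size, and the weights `lam`, `kappa`, `temp` are all hypothetical choices made for illustration. Gradients are taken by finite differences to keep it dependency-free.

```python
import numpy as np

# Hypothetical toy classifier (illustrative assumption, not from the paper):
# linear logits f(x) = W x + b over two features and two classes.
W = np.array([[1.0, 0.0], [-1.0, 0.0]])
b = np.zeros(2)


def logits(x):
    return W @ x + b


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def energy(x, y):
    # Assumed JEM-style class-conditional energy: E(x | y) = -f_y(x).
    return -logits(x)[y]


def calibrate(X_cal, y_cal, alpha):
    # Split conformal prediction with nonconformity score s(x, y) = 1 - p_y(x).
    scores = np.array([1.0 - softmax(logits(x))[y] for x, y in zip(X_cal, y_cal)])
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")


def smooth_set_size(x, q_hat, temp):
    # Sigmoid relaxation of |C(x)| = sum_y 1[s(x, y) <= q_hat], so it has gradients.
    s = 1.0 - softmax(logits(x))
    return np.sum(1.0 / (1.0 + np.exp(-(q_hat - s) / temp)))


def objective(x_cf, x, y_target, q_hat, lam=(0.1, 0.5, 1.0), kappa=1.0, temp=0.1):
    p = softmax(logits(x_cf))
    yloss = -np.log(p[y_target])      # cross-entropy towards the target class
    cost = np.sum((x_cf - x) ** 2)    # squared L2 distance to the factual
    e = energy(x_cf, y_target)        # lower energy = higher model density for y_target
    omega = max(0.0, smooth_set_size(x_cf, q_hat, temp) - kappa)  # set-size penalty
    return yloss + lam[0] * cost + lam[1] * e + lam[2] * omega


def num_grad(fun, x, eps=1e-5):
    # Central finite differences; a real implementation would use autodiff.
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (fun(x + d) - fun(x - d)) / (2 * eps)
    return g


def counterfactual(x, y_target, q_hat, lr=0.1, steps=500):
    # Plain gradient descent on the combined objective, starting at the factual.
    def fun(z):
        return objective(z, x, y_target, q_hat)

    x_cf = x.copy()
    for _ in range(steps):
        x_cf = x_cf - lr * num_grad(fun, x_cf)
    return x_cf


# Usage: calibrate on synthetic data, then move a class-0 point towards class 1.
rng = np.random.default_rng(0)
X_cal = np.vstack([rng.normal([2.0, 0.0], 1.0, size=(100, 2)),
                   rng.normal([-2.0, 0.0], 1.0, size=(100, 2))])
y_cal = np.array([0] * 100 + [1] * 100)
q_hat = calibrate(X_cal, y_cal, alpha=0.1)

x = np.array([2.0, 0.0])
x_cf = counterfactual(x, y_target=1, q_hat=q_hat)
print(x_cf, softmax(logits(x_cf)))
```

With these illustrative settings the set-size penalty stays inactive, and the search should settle near x[0] ≈ -1.25, which the toy model assigns to class 1. The squared L2 cost is a deliberate choice here: it keeps the objective bounded below even though the linear energy term alone would keep decreasing indefinitely.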
Related papers
- Fast Explainability via Feasible Concept Sets Generator [7.011763596804071]
We bridge the gap between the universality of model-agnostic approaches and the efficiency of model-specific approaches.
We first define explanations through a set of human-comprehensible concepts.
Second, we show that a minimal feasible set generator can be learned as a companion explainer to the prediction model.
Date: 2024-05-29T00:01:40Z
- Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability [29.459228981179674]
Post hoc explanations can incorrectly attribute high importance to features that are unimportant or non-discriminative for the underlying task.
Inherently interpretable models, on the other hand, circumvent these issues by explicitly encoding explanations into model architecture.
We propose Distractor Erasure Tuning (DiET), a method that adapts black-box models to be robust to distractor erasure.
Date: 2023-07-27T17:06:02Z
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates whether the model's prediction on the counterfactual is consistent with the logic expressed in the explanation.
Date: 2022-05-25T03:40:59Z
- ExSum: From Local Explanations to Model Understanding [6.23934576145261]
Interpretability methods are developed to understand the working mechanisms of black-box models.
Fulfilling this goal requires both that the explanations generated by these methods are correct and that people can easily and reliably understand them.
We introduce explanation summary (ExSum), a mathematical framework for quantifying model understanding.
Date: 2022-04-30T02:07:20Z
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study in which participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
Date: 2021-12-17T18:29:56Z
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
Date: 2021-03-18T12:57:34Z
- Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP).
By utilizing generative models conditioned on different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality, and efficiency of our designed framework.
Date: 2021-01-18T08:37:13Z
- Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
Date: 2020-09-03T19:02:55Z
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of deep learning models raises unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multiple objectives are used to craft a plausible attack on the audited model.
Its utility is showcased on a human face classification task, revealing the potential of the proposed framework.
Date: 2020-03-25T11:08:56Z
- Explainable Deep Modeling of Tabular Data using TableGraphNet [1.376408511310322]
We propose a new architecture that produces explainable predictions in the form of additive feature attributions.
We show that our explainable model attains the same level of performance as black box models.
Date: 2020-02-12T20:02:10Z