Integrating Prior Knowledge in Post-hoc Explanations
- URL: http://arxiv.org/abs/2204.11634v1
- Date: Mon, 25 Apr 2022 13:09:53 GMT
- Title: Integrating Prior Knowledge in Post-hoc Explanations
- Authors: Adulam Jeyasothy and Thibault Laugel and Marie-Jeanne Lesot and
Christophe Marsala and Marcin Detyniecki
- Abstract summary: Post-hoc interpretability methods aim at explaining to a user the predictions of a trained decision model.
We propose to define a cost function that explicitly integrates prior knowledge into the interpretability objectives.
We propose a new interpretability method, Knowledge Integration in Counterfactual Explanation (KICE), to optimize this cost function.
- Score: 3.6066164404432883
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of eXplainable Artificial Intelligence (XAI), post-hoc
interpretability methods aim at explaining to a user the predictions of a
trained decision model. Integrating prior knowledge into such interpretability
methods aims at improving the understandability of explanations and at allowing
for personalised explanations adapted to each user. In this paper, we propose to
define a cost function that explicitly integrates prior knowledge into the
interpretability objectives: we present a general framework for the
optimization problem of post-hoc interpretability methods, and show that user
knowledge can thus be integrated into any method by adding a compatibility term
to the cost function. We instantiate the proposed formalization in the case of
counterfactual explanations and propose a new interpretability method called
Knowledge Integration in Counterfactual Explanation (KICE) to optimize it. The
paper performs an experimental study on several benchmark data sets to
characterize the counterfactual instances generated by KICE, as compared to
reference methods.
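As a concrete illustration of this formalization, the sketch below shows a naive counterfactual search whose cost augments a distance term with a knowledge-compatibility penalty, roughly argmin_z ||z - x|| + lambda * g(z) subject to f(z) != f(x). This is a minimal sketch under assumed names (`compatibility`, `lambda_k`, the toy classifier); it is not the paper's KICE implementation.

```python
# Minimal illustrative sketch (not the paper's KICE algorithm): a naive
# random-search counterfactual generator whose cost adds a compatibility
# penalty encoding user prior knowledge. All names here (compatibility,
# lambda_k, the toy classifier) are assumptions for illustration.
import numpy as np

def generate_counterfactual(x, predict, compatibility, lambda_k=1.0,
                            n_samples=5000, radius=1.0, seed=0):
    """Search for z with predict(z) != predict(x) minimizing
    ||z - x|| + lambda_k * compatibility(z)."""
    rng = np.random.default_rng(seed)
    y0 = predict(x)
    best_z, best_cost = None, np.inf
    for _ in range(n_samples):
        z = x + rng.normal(scale=radius, size=x.shape)  # candidate perturbation
        if predict(z) == y0:                            # keep only class-changing candidates
            continue
        cost = np.linalg.norm(z - x) + lambda_k * compatibility(z)
        if cost < best_cost:
            best_z, best_cost = z, cost
    return best_z

# Toy usage: prior knowledge says feature 0 should stay non-negative.
predict = lambda v: int(v.sum() > 0)          # black-box classifier stand-in
compatibility = lambda z: max(0.0, -z[0])     # penalty grows as z[0] goes negative
x = np.array([0.5, 0.5])
print(generate_counterfactual(x, predict, compatibility, lambda_k=5.0))
```

Setting lambda_k to zero recovers a plain distance-based counterfactual search; larger values trade proximity for consistency with the user's prior knowledge.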
Related papers
- On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning [85.75164588939185]
We study the discriminative probabilistic modeling problem on a continuous domain for (multimodal) self-supervised representation learning.
We conduct generalization error analysis to reveal the limitation of current InfoNCE-based contrastive loss for self-supervised representation learning.
arXiv Detail & Related papers (2024-10-11T18:02:46Z)
- Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding [118.75567341513897]
Existing methods typically analyze target text in isolation or solely with non-member contexts.
We propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
arXiv Detail & Related papers (2024-09-05T09:10:38Z)
- In-Context Editing: Learning Knowledge from Self-Induced Distributions [29.10148782152867]
We introduce Consistent In-Context Editing (ICE) to optimize toward a contextual distribution rather than a one-hot target.
ICE enhances the robustness and effectiveness of gradient-based tuning methods, preventing overfitting and preserving the model's integrity.
We analyze ICE across four critical aspects of knowledge editing: accuracy, locality, generalization, and linguistic quality, demonstrating its advantages.
arXiv Detail & Related papers (2024-06-17T04:00:04Z)
- Introducing User Feedback-based Counterfactual Explanations (UFCE) [49.1574468325115]
Counterfactual explanations (CEs) have emerged as a viable solution for generating comprehensible explanations in XAI.
UFCE allows for the inclusion of user constraints to determine the smallest modifications in the subset of actionable features.
UFCE outperforms two well-known CE methods in terms of proximity, sparsity, and feasibility.
arXiv Detail & Related papers (2024-02-26T20:09:44Z)
- Predictability and Comprehensibility in Post-Hoc XAI Methods: A User-Centered Analysis [6.606409729669314]
Post-hoc explainability methods aim to clarify predictions of black-box machine learning models.
We conduct a user study to evaluate comprehensibility and predictability in two widely used tools: LIME and SHAP.
We find that the comprehensibility of SHAP is significantly reduced when explanations are provided for samples near a model's decision boundary.
arXiv Detail & Related papers (2023-09-21T11:54:20Z)
- Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
arXiv Detail & Related papers (2023-05-26T16:04:28Z)
- STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal.
Based on the kernelized Stein discrepancy (KSD), we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING.
arXiv Detail & Related papers (2023-01-28T00:49:28Z)
- A general-purpose method for applying Explainable AI for Anomaly Detection [6.09170287691728]
The need for explainable AI (XAI) is well established but relatively little has been published outside of the supervised learning paradigm.
This paper focuses on a principled approach to applying explainability and interpretability to the task of unsupervised anomaly detection.
arXiv Detail & Related papers (2022-07-23T17:56:01Z)
- AcME -- Accelerated Model-agnostic Explanations: Fast Whitening of the Machine-Learning Black Box [1.7534486934148554]
Interpretability approaches should provide actionable insights without making users wait.
We propose Accelerated Model-agnostic Explanations (AcME), an interpretability approach that quickly provides feature importance scores both at the global and the local level.
AcME computes feature ranking, but it also provides a what-if analysis tool to assess how changes in features values would affect model predictions.
arXiv Detail & Related papers (2021-12-23T15:18:13Z)
- InteL-VAEs: Adding Inductive Biases to Variational Auto-Encoders via Intermediary Latents [60.785317191131284]
We introduce a simple and effective method for learning VAEs with controllable biases by using an intermediary set of latent variables.
In particular, it allows us to impose desired properties like sparsity or clustering on learned representations.
We show that this, in turn, allows InteL-VAEs to learn both better generative models and representations.
arXiv Detail & Related papers (2021-06-25T16:34:05Z)
- A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques.
We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones.
arXiv Detail & Related papers (2020-09-25T12:01:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.