On the overlooked issue of defining explanation objectives for
local-surrogate explainers
- URL: http://arxiv.org/abs/2106.05810v1
- Date: Thu, 10 Jun 2021 15:24:49 GMT
- Title: On the overlooked issue of defining explanation objectives for
local-surrogate explainers
- Authors: Rafael Poyiadzi, Xavier Renard, Thibault Laugel, Raul
Santos-Rodriguez, Marcin Detyniecki
- Abstract summary: Local surrogate approaches for explaining machine learning model predictions have appealing properties.
Several methods exist that fit this description and share this goal.
We discuss the implications of the lack of agreement and clarity amongst the methods' objectives for the research and practice of explainability.
- Score: 5.094061357656677
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Local surrogate approaches for explaining machine learning model predictions
have appealing properties, such as being model-agnostic and flexible in their
modelling. Several methods exist that fit this description and share this goal.
However, despite their shared overall procedure, they set out different
objectives, extract different information from the black-box, and consequently
produce diverse explanations that are, in general, incomparable. In this
work we review the similarities and differences amongst multiple methods, with
a particular focus on what information they extract from the model, as this has
a large impact on the output: the explanation. We discuss the implications of
the lack of agreement and clarity amongst the methods' objectives for the
research and practice of explainability.
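To make the shared overall procedure concrete, the following is a minimal sketch of a local-surrogate explainer for a tabular black box. The Gaussian sampling scheme, the exponential proximity kernel, the ridge surrogate and all names (local_surrogate, black_box_predict, etc.) are illustrative assumptions, not the recipe of any particular method discussed in the paper.

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate(black_box_predict, x, n_samples=1000, sigma=0.5,
                    kernel_width=1.0, seed=None):
    """Fit a weighted linear surrogate around the instance x.

    black_box_predict: callable mapping an (n, d) array to predicted scores.
    All parameter values and the sampling scheme are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # 1) Sample perturbations in the neighbourhood of x.
    Z = x + sigma * rng.standard_normal((n_samples, d))
    # 2) Query the black box on the samples (the information-extraction step).
    y = black_box_predict(Z)
    # 3) Weight samples by their proximity to x.
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4) Fit an interpretable (linear) surrogate on the weighted samples.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, y, sample_weight=w)
    # The coefficients serve as the local explanation.
    return surrogate.coef_
```

In this sketch, how the neighbourhood is sampled and how samples are weighted (steps 1 and 3) determine what information is extracted from the black box before the surrogate is fitted, which is exactly the aspect the abstract highlights as driving the differences between methods.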
Related papers
- CNN-based explanation ensembling for dataset, representation and explanations evaluation [1.1060425537315088]
We explore the potential of ensembling explanations generated by deep classification models using a convolutional model.
Through experimentation and analysis, we aim to investigate the implications of combining explanations to uncover more coherent and reliable patterns of the model's behavior.
arXiv Detail & Related papers (2024-04-16T08:39:29Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- ExSum: From Local Explanations to Model Understanding [6.23934576145261]
Interpretability methods are developed to understand the working mechanisms of black-box models.
Fulfilling this goal requires both that the explanations generated by these methods are correct and that people can easily and reliably understand them.
We introduce explanation summary (ExSum), a mathematical framework for quantifying model understanding.
arXiv Detail & Related papers (2022-04-30T02:07:20Z)
- Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
arXiv Detail & Related papers (2022-02-21T18:32:24Z)
- The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective [36.434727068776965]
We study the disagreement problem in explainable machine learning.
We first conduct interviews with data scientists to understand what constitutes disagreement between explanations, and formalize this understanding into a framework.
We then leverage this framework to carry out a rigorous empirical analysis with four real-world datasets.
arXiv Detail & Related papers (2022-02-03T14:19:23Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Explainers in the Wild: Making Surrogate Explainers Robust to Distortions through Perception [77.34726150561087]
We propose a methodology to evaluate the effect of distortions in explanations by embedding perceptual distances.
We generate explanations for images in the ImageNet-C dataset and demonstrate how using perceptual distances in the surrogate explainer creates more coherent explanations for the distorted and reference images.
arXiv Detail & Related papers (2021-02-22T12:38:53Z)
- Explaining by Removing: A Unified Framework for Model Explanation [14.50261153230204]
Removal-based explanations are based on the principle of simulating feature removal to quantify each feature's influence.
We develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence.
This newly understood class of explanation methods has rich connections, which we examine using tools that have been largely overlooked by the explainability literature (a minimal sketch of the removal principle is given after this list).
arXiv Detail & Related papers (2020-11-21T00:47:48Z)
- Towards Interpretable Reasoning over Paragraph Effects in Situation [126.65672196760345]
We focus on the task of reasoning over paragraph effects in situation, which requires a model to understand cause and effect.
We propose a sequential approach for this task which explicitly models each step of the reasoning process with neural network modules.
In particular, five reasoning modules are designed and learned in an end-to-end manner, which leads to a more interpretable model.
arXiv Detail & Related papers (2020-10-03T04:03:52Z)
- Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
arXiv Detail & Related papers (2020-09-03T19:02:55Z)
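Relating to the removal-based entry above, the following is a minimal sketch of the feature-removal principle: a feature's influence is estimated by replacing it with a baseline value and measuring the change in the black box's prediction. The single-feature, baseline-replacement scheme and all names are illustrative assumptions, not the framework's actual formulation.

```python
import numpy as np

def removal_influence(black_box_predict, x, baseline):
    """Estimate each feature's influence on the prediction for instance x
    by simulating its removal (illustrative sketch)."""
    # Prediction on the original instance (black box expects a (n, d) input).
    reference = black_box_predict(x[None, :])[0]
    influence = np.empty_like(x, dtype=float)
    for j in range(x.shape[0]):
        x_removed = x.copy()
        x_removed[j] = baseline[j]  # "remove" feature j by replacing it with a baseline value
        # Influence = drop in the prediction when feature j is removed.
        influence[j] = reference - black_box_predict(x_removed[None, :])[0]
    return influence
```

How features are removed, which model behaviour is measured, and how the per-feature changes are summarised correspond to the three dimensions of the framework described in that entry; this sketch fixes one simple choice for each.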
This list is automatically generated from the titles and abstracts of the papers on this site.