Contrastive Explanations for Model Interpretability
- URL: http://arxiv.org/abs/2103.01378v1
- Date: Tue, 2 Mar 2021 00:36:45 GMT
- Title: Contrastive Explanations for Model Interpretability
- Authors: Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, Yoav Goldberg
- Abstract summary: We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
- Score: 77.92370750072831
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive explanations clarify why an event occurred in contrast to
another. They are more inherently intuitive to humans to both produce and
comprehend. We propose a methodology to produce contrastive explanations for
classification models by modifying the representation to disregard
non-contrastive information, and modifying model behavior to only be based on
contrastive reasoning. Our method is based on projecting model representation
to a latent space that captures only the features that are useful (to the
model) to differentiate two potential decisions. We demonstrate the value of
contrastive explanations by analyzing two different scenarios, using both
high-level abstract concept attribution and low-level input token/span
attribution, on two widely used text classification tasks. Specifically, we
produce explanations for answering: for which label, and against which
alternative label, is some aspect of the input useful? And which aspects of the
input are useful for and against particular decisions? Overall, our findings
shed light on the ability of label-contrastive explanations to provide a more
accurate and finer-grained interpretability of a model's decision.
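For intuition, here is a minimal sketch of one way such a contrastive projection can look in the simplified linear case, assuming a classifier head with one weight vector per label. The function names (`contrastive_projection`, `contrastive_logit_gap`) are illustrative, and this is not the authors' implementation, which learns the latent space rather than deriving it directly from the head weights.

```python
import numpy as np

def contrastive_projection(h, W, fact, foil):
    """Project a hidden representation onto the direction a linear classifier
    head uses to separate `fact` from `foil`.

    h    : (d,) hidden representation of one example
    W    : (num_labels, d) rows are per-label weight vectors
    fact : index of the predicted/target label
    foil : index of the contrastive alternative label

    Illustrative sketch only -- the paper projects onto a learned latent
    space, not necessarily this single weight-difference direction.
    """
    u = W[fact] - W[foil]          # direction that moves the fact-vs-foil logit gap
    u = u / np.linalg.norm(u)      # unit vector
    return np.dot(h, u) * u        # keep only the contrastive component

def contrastive_logit_gap(h, W, b, fact, foil):
    """Logit difference between fact and foil labels."""
    logits = W @ h + b
    return logits[fact] - logits[foil]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, num_labels = 8, 3
    h = rng.normal(size=d)
    W = rng.normal(size=(num_labels, d))
    b = np.zeros(num_labels)

    h_c = contrastive_projection(h, W, fact=0, foil=2)
    # The fact-vs-foil logit gap is preserved by the projection...
    print(contrastive_logit_gap(h, W, b, 0, 2), contrastive_logit_gap(h_c, W, b, 0, 2))
    # ...while information relevant only to other label contrasts is discarded.
```

In this simplified setting, keeping only the projected component leaves the fact-vs-foil logit gap unchanged while discarding features the model uses only for other label contrasts, which mirrors the abstract's goal of capturing only the features useful to differentiate two potential decisions.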
Related papers
- Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting [80.9896041501715]
Explanations that have not been "tuned" for a task, such as off-the-shelf explanations written by nonexperts, may lead to mediocre performance.
This paper tackles the problem of how to optimize explanation-infused prompts in a blackbox fashion.
arXiv Detail & Related papers (2023-02-09T18:02:34Z)
- Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all of these features into a single explanation, which makes them harder for humans to interpret.
We show that contrastive explanations are quantifiably better than non-contrastive explanations at verifying major grammatical phenomena; a minimal sketch of this contrastive-attribution setup appears after this list.
arXiv Detail & Related papers (2022-02-21T18:32:24Z)
- VAE-CE: Visual Contrastive Explanation using Disentangled VAEs [3.5027291542274357]
We build the Variational Autoencoder-based Contrastive Explanation (VAE-CE) model using a disentangled VAE, extended with a new supervised method for disentangling individual dimensions.
An analysis on synthetic data and MNIST shows that the approaches to both disentanglement and explanation provide benefits over other methods.
arXiv Detail & Related papers (2021-08-20T13:15:24Z)
- Prompting Contrastive Explanations for Commonsense Reasoning Tasks [74.7346558082693]
Large pretrained language models (PLMs) can achieve near-human performance on commonsense reasoning tasks.
We show how to use these same models to generate human-interpretable evidence.
arXiv Detail & Related papers (2021-06-12T17:06:13Z)
- Prediction or Comparison: Toward Interpretable Qualitative Reasoning [16.02199526395448]
Current approaches use either semantics to transform natural language inputs into logical expressions or a "black-box" model to solve them in one step.
In this work, we categorize qualitative reasoning tasks into two types: prediction and comparison.
In particular, we adopt neural network modules trained in an end-to-end manner to simulate the two reasoning processes.
arXiv Detail & Related papers (2021-06-04T10:27:55Z)
- Rationalization through Concepts [27.207067974031805]
We present a novel self-interpretable model called ConRAT.
Inspired by how human explanations for high-level decisions are often based on key concepts, ConRAT infers which ones are described in the document.
Two regularizers drive ConRAT to build interpretable concepts.
arXiv Detail & Related papers (2021-05-11T07:46:48Z)
- Dependency Decomposition and a Reject Option for Explainable Models [4.94950858749529]
Recent deep learning models perform extremely well in various inference tasks.
Recent advances offer methods to visualize features and to describe attribution of the input.
We present the first analysis of dependencies regarding the probability distribution over the desired image classification outputs.
arXiv Detail & Related papers (2020-12-11T17:39:33Z)
- Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language? [86.60613602337246]
We introduce a leakage-adjusted simulatability (LAS) metric for evaluating NL explanations.
LAS measures how well explanations help an observer predict a model's output, while controlling for how explanations can directly leak the output.
We frame explanation generation as a multi-agent game and optimize explanations for simulatability while penalizing label leakage.
arXiv Detail & Related papers (2020-10-08T16:59:07Z)
- The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
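Below is the sketch referenced in the "Interpreting Language Models with Contrastive Explanations" entry above: a generic contrastive gradient-times-input attribution that explains the gap between a fact logit and a foil logit rather than a single logit. The toy model, function names, and settings are illustrative assumptions, not the exact method of any paper listed here.

```python
import torch
import torch.nn as nn

class ToyClassifier(nn.Module):
    """Toy text classifier: mean-pooled token embeddings -> linear head.
    Purely illustrative; any differentiable map from embeddings to logits works."""
    def __init__(self, vocab_size=100, dim=16, num_labels=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, num_labels)

    def forward(self, embedded):                # embedded: (seq_len, dim)
        return self.head(embedded.mean(dim=0))  # logits: (num_labels,)

def contrastive_attributions(model, token_ids, fact, foil):
    """Gradient-times-input attribution of the fact-vs-foil logit gap,
    giving one score per input token."""
    embedded = model.emb(token_ids).detach().requires_grad_(True)
    logits = model(embedded)
    gap = logits[fact] - logits[foil]           # contrastive quantity to explain
    gap.backward()
    return (embedded.grad * embedded).sum(dim=-1)

if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyClassifier()
    token_ids = torch.tensor([4, 17, 42, 8])
    scores = contrastive_attributions(model, token_ids, fact=0, foil=2)
    print(scores)  # positive scores favor label 0 over label 2 at that token
```

Positive scores mark tokens that push the model toward the fact label and away from the foil; attributing the logit gap rather than a single logit is what makes the explanation label-contrastive.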