Interpreting Language Models with Contrastive Explanations
- URL: http://arxiv.org/abs/2202.10419v1
- Date: Mon, 21 Feb 2022 18:32:24 GMT
- Title: Interpreting Language Models with Contrastive Explanations
- Authors: Kayo Yin and Graham Neubig
- Abstract summary: Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is harder for humans to interpret.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
- Score: 99.7035899290924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model interpretability methods are often used to explain NLP model decisions
on tasks such as text classification, where the output space is relatively
small. However, when applied to language generation, where the output space
often consists of tens of thousands of tokens, these methods are unable to
provide informative explanations. Language models must consider various
features to predict a token, such as its part of speech, number, tense, or
semantics. Existing explanation methods conflate evidence for all these
features into a single explanation, which is harder for humans to interpret.
To disentangle the different decisions in language modeling, we focus on
explaining language models contrastively: we look for salient input tokens that
explain why the model predicted one token instead of another. We demonstrate
that contrastive explanations are quantifiably better than non-contrastive
explanations in verifying major grammatical phenomena, and that they
significantly improve contrastive model simulatability for human observers. We
also identify groups of contrastive decisions where the model uses similar
evidence, and we are able to characterize what input tokens models use during
various language generation decisions.
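As a concrete illustration of the contrastive setup described in the abstract, the sketch below computes a contrastive gradient-based saliency: the gradient of the logit difference between a target token and a foil token with respect to the input embeddings. It is a minimal sketch assuming a GPT-2 checkpoint loaded through Hugging Face transformers and PyTorch; the prompt, the token pair, and the scoring choice (L2 norm of the gradient) are illustrative, not necessarily the paper's exact configuration.

```python
# Minimal sketch of contrastive input saliency for a causal LM.
# Assumes Hugging Face transformers + PyTorch; names and values are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM works in principle
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The keys to the cabinet"
target, foil = " are", " is"  # explain: why " are" rather than " is"?

inputs = tok(prompt, return_tensors="pt")
embeds = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeds.requires_grad_(True)

logits = model(inputs_embeds=embeds, attention_mask=inputs["attention_mask"]).logits
next_logits = logits[0, -1]  # distribution over the next token

t_id = tok.encode(target)[0]
f_id = tok.encode(foil)[0]

# Contrastive score: evidence for the target *relative to* the foil.
(next_logits[t_id] - next_logits[f_id]).backward()

# Saliency per input token = L2 norm of the gradient w.r.t. its embedding.
saliency = embeds.grad[0].norm(dim=-1)
for token, score in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]), saliency):
    print(f"{token:>12s}  {score.item():.4f}")
```

Replacing the logit difference with the target logit alone gives the corresponding non-contrastive saliency, which is the kind of comparison the paper draws.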
Related papers
- Generating Prototypes for Contradiction Detection Using Large Language Models and Linguistic Rules [1.6497679785422956]
We introduce a novel data generation method for contradiction detection.
We instruct the generative models to create contradicting statements with respect to descriptions of specific contradiction types.
As an auxiliary approach, we use linguistic rules to construct simple contradictions.
arXiv Detail & Related papers (2023-10-23T09:07:27Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- Black-box language model explanation by context length probing [7.526153863886609]
We present context length probing, a novel explanation technique for causal language models.
The technique is model-agnostic and does not rely on access to model internals beyond computing token-level probabilities.
We apply context length probing to large pre-trained language models and offer some initial analyses and insights.
arXiv Detail & Related papers (2022-12-30T16:24:10Z)
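The context length probing entry above attributes importance using only token-level probabilities under varying amounts of context. Below is a minimal sketch of that general idea, assuming a GPT-2 model via Hugging Face transformers; the paper's exact scoring may differ.

```python
# Sketch of the idea behind context length probing: measure how much the
# probability of the next token changes as each additional context token
# becomes available. Illustrative only; not the paper's exact procedure.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "The keys to the cabinet are"
ids = tok(text, return_tensors="pt")["input_ids"][0]
context, target_id = ids[:-1], ids[-1]

def target_logprob(ctx_ids):
    """Log-probability of the target token given a (possibly truncated) context."""
    with torch.no_grad():
        logits = model(ctx_ids.unsqueeze(0)).logits[0, -1]
    return torch.log_softmax(logits, dim=-1)[target_id].item()

# log p(target | last c context tokens) for growing context lengths c = 1..n
logps = [target_logprob(context[len(context) - c:]) for c in range(1, len(context) + 1)]

# The gain from context length c -> c+1 is attributed to the token that was added.
for c in range(1, len(logps)):
    added = tok.convert_ids_to_tokens([context[len(context) - 1 - c].item()])[0]
    print(f"adding {added!r:>12} changed log p(target) by {logps[c] - logps[c - 1]:+.4f}")
```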
- Bidirectional Representations for Low Resource Spoken Language Understanding [39.208462511430554]
We propose a representation model to encode speech in bidirectional rich encodings.
The approach uses a masked language modelling objective to learn the representations.
We show that the resulting encodings outperform comparable models on multiple datasets.
arXiv Detail & Related papers (2022-11-24T17:05:16Z)
- Interactively Generating Explanations for Transformer Language Models [14.306470205426526]
Transformer language models are state-of-the-art in a multitude of NLP tasks.
Recent methods aim to provide interpretability and explainability to black-box models.
We emphasize using prototype networks directly incorporated into the model architecture.
arXiv Detail & Related papers (2021-09-02T11:34:29Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representations onto a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
- Limits of Detecting Text Generated by Large-Scale Language Models [65.46403462928319]
Some consider large-scale language models that can generate long, coherent pieces of text dangerous, since they may be used in misinformation campaigns.
Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated.
arXiv Detail & Related papers (2020-02-09T19:53:23Z)
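The last entry frames detection of generated text as a hypothesis test over token probabilities. As a loose, illustrative stand-in (not the paper's analysis), the sketch below scores a passage by its average token log-likelihood under a GPT-2 model and compares it to a hand-picked threshold; a real detector would need a calibrated threshold and an error-rate analysis.

```python
# Toy likelihood-based detector: score a passage by its average token
# log-likelihood under a language model and threshold it. A simplification
# for illustration, not the hypothesis-testing treatment in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def avg_token_logprob(text: str) -> float:
    ids = tok(text, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        # labels=ids makes the model return the mean cross-entropy loss
        loss = model(ids, labels=ids).loss
    return -loss.item()  # higher = more "typical" under the model

threshold = -4.0  # illustrative value; a real detector would calibrate this
for passage in ["The quick brown fox jumps over the lazy dog.",
                "Colorless green ideas sleep furiously in the archive."]:
    score = avg_token_logprob(passage)
    label = "generated?" if score > threshold else "genuine?"
    print(f"{score:+.3f}  {label}  {passage}")
```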
This list is automatically generated from the titles and abstracts of the papers in this site.