Beware the Rationalization Trap! When Language Model Explainability
Diverges from our Mental Models of Language
- URL: http://arxiv.org/abs/2207.06897v1
- Date: Thu, 14 Jul 2022 13:26:03 GMT
- Title: Beware the Rationalization Trap! When Language Model Explainability
Diverges from our Mental Models of Language
- Authors: Rita Sevastjanova and Mennatallah El-Assady
- Abstract summary: Language models learn and represent language differently than humans; they learn the form and not the meaning.
To assess the success of language model explainability, we need to consider the impact of its divergence from a user's mental model of language.
- Score: 9.501243481182351
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Language models learn and represent language differently than humans; they
learn the form and not the meaning. Thus, to assess the success of language
model explainability, we need to consider the impact of its divergence from a
user's mental model of language. In this position paper, we argue that in order
to avoid harmful rationalization and achieve truthful understanding of language
models, explanation processes must satisfy three main conditions: (1)
explanations have to truthfully represent the model behavior, i.e., have high
fidelity; (2) explanations must be complete, as missing information distorts
the truth; and (3) explanations have to take the user's mental model into
account, progressively verifying a person's knowledge and adapting their
understanding. We introduce a decision tree model to showcase potential reasons
why current explanations fail to reach their objectives. We further emphasize
the need for human-centered design to explain the model from multiple
perspectives, progressively adapting explanations to changing user
expectations.
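To make the three conditions concrete, here is a minimal, purely illustrative Python sketch that arranges them as a simple decision procedure; the class, thresholds, and messages are assumptions for illustration, not the authors' actual decision tree model.
```python
# Hypothetical sketch (not from the paper): the three conditions named in the
# abstract, arranged as a simple decision procedure over an explanation.
# All names, thresholds, and messages are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Explanation:
    fidelity: float              # how truthfully it reflects model behavior (0..1)
    completeness: float          # fraction of relevant evidence it covers (0..1)
    matches_mental_model: bool   # does it connect to the user's current understanding?

def assess(expl: Explanation, fidelity_min: float = 0.9, completeness_min: float = 0.9) -> str:
    """Walk the checks in order: each failure is a potential rationalization trap."""
    if expl.fidelity < fidelity_min:
        return "risk: explanation misrepresents model behavior (low fidelity)"
    if expl.completeness < completeness_min:
        return "risk: missing information distorts the truth (incomplete)"
    if not expl.matches_mental_model:
        return "risk: user may rationalize the output against their own model of language"
    return "explanation can support truthful understanding"

print(assess(Explanation(fidelity=0.95, completeness=0.6, matches_mental_model=True)))
```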
Related papers
- Proceedings of the First International Workshop on Next-Generation Language Models for Knowledge Representation and Reasoning (NeLaMKRR 2024) [16.282850445579857]
Reasoning is an essential component of human intelligence as it plays a fundamental role in our ability to think critically.
The recent leap forward in natural language processing, driven by the emergence of transformer-based language models, hints at the possibility that these models exhibit reasoning abilities.
Despite ongoing discussions about what reasoning is in language models, it is still not easy to pin down to what extent these models are actually capable of reasoning.
arXiv Detail & Related papers (2024-10-07T02:31:47Z) - Conceptual and Unbiased Reasoning in Language Models [98.90677711523645]
We propose a novel conceptualization framework that forces models to perform conceptual reasoning on abstract questions.
We show that existing large language models fall short on conceptual reasoning, dropping 9% to 28% on various benchmarks.
We then discuss how models can improve since high-level abstract reasoning is key to unbiased and generalizable decision-making.
arXiv Detail & Related papers (2024-03-30T00:53:53Z) - ALERT: Adapting Language Models to Reasoning Tasks [43.8679673685468]
ALERT is a benchmark and suite of analyses for assessing language models' reasoning ability.
ALERT provides a test bed to assess any language model on fine-grained reasoning skills.
We find that language models learn more reasoning skills during the finetuning stage than during pretraining.
arXiv Detail & Related papers (2022-12-16T05:15:41Z) - A fine-grained comparison of pragmatic language understanding in humans
and language models [2.231167375820083]
We compare language models and humans on seven pragmatic phenomena.
We find that the largest models achieve high accuracy and match human error patterns.
We find preliminary evidence that models and humans are sensitive to similar linguistic cues.
arXiv Detail & Related papers (2022-12-13T18:34:59Z) - MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure [129.8481568648651]
We propose a benchmark to investigate models' logical reasoning capabilities in complex real-life scenarios.
The explanation form, built on multi-hop chains of reasoning, includes three main components.
We evaluate the current best models' performance on this new explanation form.
arXiv Detail & Related papers (2022-10-22T16:01:13Z) - Language Models Understand Us, Poorly [0.0]
I investigate three views of human language understanding: as-mapping, as-reliability and as-representation.
I argue that while behavioral reliability is necessary for understanding, internal representations are sufficient.
We need work which probes model internals, adds more of human language, and measures what models can learn.
arXiv Detail & Related papers (2022-10-19T15:58:59Z) - Learning to Scaffold: Optimizing Model Explanations for Teaching [74.25464914078826]
We train models on three natural language processing and computer vision tasks.
We find that students trained with explanations extracted with our framework are able to simulate the teacher significantly more effectively than ones produced with previous methods.
arXiv Detail & Related papers (2022-04-22T16:43:39Z) - Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
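As a rough illustration of the contrastive idea, the sketch below attributes the difference between a target token's logit and a foil token's logit to the input positions via gradients; the toy model, token ids, and shapes are assumptions, not the paper's implementation.
```python
# A minimal sketch of contrastive saliency, assuming gradient-based attributions:
# attribute the *difference* between the target token's logit and a foil token's
# logit, rather than the target logit alone. Toy model for illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, dim, seq_len = 50, 16, 6

embedding = nn.Embedding(vocab_size, dim)
lm_head = nn.Linear(dim, vocab_size)

tokens = torch.randint(0, vocab_size, (seq_len,))
embeds = embedding(tokens).detach().requires_grad_(True)

# Toy "language model": mean-pool the context and project to the vocabulary.
logits = lm_head(embeds.mean(dim=0))

target, foil = 7, 12  # e.g. the predicted token vs. a grammatical alternative
contrast = logits[target] - logits[foil]
contrast.backward()

# Per-token contrastive saliency: which context positions push the model
# toward `target` and away from `foil`?
saliency = embeds.grad.norm(dim=-1)
print(saliency)
```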
arXiv Detail & Related papers (2022-02-21T18:32:24Z) - Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representations to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z) - Social Commonsense Reasoning with Multi-Head Knowledge Attention [24.70946979449572]
Social Commonsense Reasoning requires understanding of text, knowledge about social events and their pragmatic implications, as well as commonsense reasoning skills.
We propose a novel multi-head knowledge attention model that encodes semi-structured commonsense inference rules and learns to incorporate them in a transformer-based reasoning cell.
arXiv Detail & Related papers (2020-10-12T10:24:40Z) - The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal
Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
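For background on the Shapley side of this comparison, here is a small self-contained sketch that computes exact Shapley values for a toy three-feature model by enumerating coalitions; the model, baseline, and inputs are illustrative assumptions rather than the paper's experimental setup.
```python
# Exact Shapley values for a toy 3-feature model, computed by enumerating all
# coalitions. This is the generic textbook definition, not the paper's method.
from itertools import combinations
from math import factorial

def model(x):
    # Toy model: a weighted sum with one interaction term.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]

def value(subset, x, baseline):
    # Features outside the coalition are replaced by their baseline value.
    masked = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return model(masked)

def shapley(x, baseline):
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                gain = value(set(subset) | {i}, x, baseline) - value(set(subset), x, baseline)
                phi[i] += weight * gain
    return phi

# Attributions sum to model(x) - model(baseline), here 3.5.
print(shapley([1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0]))
```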
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.