Explaining How Transformers Use Context to Build Predictions
- URL: http://arxiv.org/abs/2305.12535v1
- Date: Sun, 21 May 2023 18:29:10 GMT
- Title: Explaining How Transformers Use Context to Build Predictions
- Authors: Javier Ferrando, Gerard I. Gállego, Ioannis Tsiamas, Marta R. Costa-jussà
- Abstract summary: Language Generation Models produce words based on the previous context.
It is still unclear how prior words affect the model's decision throughout the layers.
We leverage recent advances in Transformer explainability and present a procedure for analyzing language generation models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language Generation Models produce words based on the previous context.
Although existing methods offer input attributions as explanations for a
model's prediction, it is still unclear how prior words affect the model's
decision throughout the layers. In this work, we leverage recent advances in
explainability of the Transformer and present a procedure to analyze models for
language generation. Using contrastive examples, we compare the alignment of
our explanations with evidence of the linguistic phenomena, and show that our
method consistently aligns better than gradient-based and perturbation-based
baselines. Then, we investigate the role of MLPs inside the Transformer and
show that they learn features that help the model predict words that are
grammatically acceptable. Lastly, we apply our method to Neural Machine
Translation models, and demonstrate that they generate human-like source-target
alignments for building predictions.
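As a rough illustration of the contrastive setup described in the abstract, the sketch below computes a contrastive, gradient-times-input attribution for a causal language model: it attributes the logit difference between a target word and a grammatically unacceptable foil to each context token. This is a minimal sketch of the kind of gradient-based baseline the paper compares against, not the paper's own method; the model choice ("gpt2"), the example sentence, and the saliency formula are illustrative assumptions.

```python
# Minimal sketch of a contrastive gradient-times-input baseline.
# Assumes the torch and transformers packages; "gpt2" is an arbitrary choice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def contrastive_saliency(context: str, target: str, foil: str):
    """Gradient-times-input attribution for logit(target) - logit(foil)."""
    input_ids = tokenizer(context, return_tensors="pt").input_ids
    target_id = tokenizer(target, add_special_tokens=False).input_ids[0]
    foil_id = tokenizer(foil, add_special_tokens=False).input_ids[0]

    # differentiate w.r.t. the input embeddings rather than discrete tokens
    embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)
    logits = model(inputs_embeds=embeds).logits[0, -1]   # next-token logits
    (logits[target_id] - logits[foil_id]).backward()     # contrastive objective

    # signed per-token contribution: positive values push target over the foil
    scores = (embeds.grad[0] * embeds[0]).sum(-1).detach()
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
    return list(zip(tokens, scores.tolist()))

# e.g. subject-verb agreement: why " are" rather than " is" after a plural subject
for tok, score in contrastive_saliency("The keys to the cabinet", " are", " is"):
    print(f"{tok:>12} {score:+.3f}")
```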
Related papers
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to mitigate the limitations of earlier explanation methods for Transformer models.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z) - Probabilistic Transformer: A Probabilistic Dependency Model for
Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, derived not from a neural perspective but from a purely syntactic and probabilistic one.
We find that the computation graph of our model resembles that of a Transformer, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively with Transformers on small to medium-sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z) - Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z) - Explaining Hate Speech Classification with Model Agnostic Methods [0.9990687944474738]
The research goal of this paper is to bridge the gap between hate speech prediction and the explanations generated by the system to support its decision.
This is achieved by first predicting the classification of a text and then applying a post-hoc, model-agnostic, surrogate interpretability approach.
arXiv Detail & Related papers (2023-05-30T19:52:56Z) - Token-wise Decomposition of Autoregressive Language Model Hidden States
for Analyzing Model Predictions [9.909170013118775]
This work presents a linear decomposition of the final hidden states of an autoregressive language model into contributions from each input token.
Using the change in next-word probability as a measure of importance, it examines which context words contribute most to language model predictions (a toy sketch of this decomposition idea follows the list below).
arXiv Detail & Related papers (2023-05-17T23:55:32Z) - Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
arXiv Detail & Related papers (2022-02-21T18:32:24Z) - Trajectory Prediction with Linguistic Representations [27.71805777845141]
We present a novel trajectory prediction model that uses linguistic intermediate representations to forecast trajectories.
The model learns the meaning of each of the words without direct per-word supervision.
It generates a linguistic description of trajectories which captures maneuvers and interactions over an extended time interval.
arXiv Detail & Related papers (2021-10-19T05:22:38Z) - On the Lack of Robust Interpretability of Neural Text Classifiers [14.685352584216757]
We assess the robustness of interpretations of neural text classifiers based on pretrained Transformer encoders.
The paper's tests show surprising deviations from expected behavior, raising questions about how much insight practitioners can draw from such interpretations.
arXiv Detail & Related papers (2021-06-08T18:31:02Z) - Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z) - Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed afterwards, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information above and is not responsible for any consequences of its use.