Token-wise Decomposition of Autoregressive Language Model Hidden States
for Analyzing Model Predictions
- URL: http://arxiv.org/abs/2305.10614v2
- Date: Fri, 2 Jun 2023 22:50:07 GMT
- Title: Token-wise Decomposition of Autoregressive Language Model Hidden States
for Analyzing Model Predictions
- Authors: Byung-Doh Oh, William Schuler
- Abstract summary: This work presents a linear decomposition of final hidden states from autoregressive language models based on each initial input token.
Using the change in next-word probability as a measure of importance, this work first examines which context words make the biggest contribution to language model predictions.
- Score: 9.909170013118775
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While there is much recent interest in studying why Transformer-based large
language models make predictions the way they do, the complex computations
performed within each layer have made their behavior somewhat opaque. To
mitigate this opacity, this work presents a linear decomposition of final
hidden states from autoregressive language models based on each initial input
token, which is exact for virtually all contemporary Transformer architectures.
This decomposition allows the definition of probability distributions that
ablate the contribution of specific input tokens, which can be used to analyze
their influence on model probabilities over a sequence of upcoming words with
only one forward pass from the model. Using the change in next-word probability
as a measure of importance, this work first examines which context words make
the biggest contribution to language model predictions. Regression experiments
suggest that Transformer-based language models rely primarily on collocational
associations, followed by linguistic factors such as syntactic dependencies and
coreference relationships in making next-word predictions. Additionally,
analyses using these measures to predict syntactic dependencies and coreferent
mention spans show that collocational association and repetitions of the same
token largely explain the language models' predictions on these tasks.
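To make the ablation idea concrete, below is a minimal sketch (not the authors' released code) of how a per-token linear decomposition of a final hidden state could be used to define ablated next-word distributions and an importance score without re-running the model. It assumes the decomposition itself (`contributions`, `bias_terms`) has already been computed so that the per-token terms sum to the final hidden state; the function names, shapes, and random placeholder values are illustrative only.

```python
# Sketch of ablation-based importance from a per-token decomposition.
# Assumption: h_final = contributions.sum(axis=0) + bias_terms, where
# contributions[i] is the part of the final hidden state attributable
# to input token i and bias_terms collects terms tied to no token.
import numpy as np

def softmax(x):
    z = x - x.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def next_word_distribution(hidden, unembedding):
    """Project a hidden state through the output embedding to get P(next word)."""
    return softmax(unembedding @ hidden)

def ablated_distribution(contributions, bias_terms, unembedding, ablate_idx):
    """Next-word distribution with token `ablate_idx`'s contribution removed."""
    h_ablated = contributions.sum(axis=0) - contributions[ablate_idx] + bias_terms
    return next_word_distribution(h_ablated, unembedding)

def token_importance(contributions, bias_terms, unembedding, target_id):
    """Drop in the target word's probability when each context token is ablated."""
    full = next_word_distribution(contributions.sum(axis=0) + bias_terms, unembedding)
    scores = []
    for i in range(len(contributions)):
        abl = ablated_distribution(contributions, bias_terms, unembedding, i)
        scores.append(full[target_id] - abl[target_id])
    return np.array(scores)

# Toy example with random arrays standing in for a real model's quantities.
rng = np.random.default_rng(0)
n_ctx, d_model, vocab = 5, 16, 100
contributions = rng.normal(size=(n_ctx, d_model))   # per-input-token terms
bias_terms = rng.normal(size=d_model)               # terms not tied to any token
unembedding = rng.normal(size=(vocab, d_model))     # output projection matrix
print(token_importance(contributions, bias_terms, unembedding, target_id=42))
```

Here a context token's importance is the change in the target word's probability when that token's additive contribution is removed before the output projection, mirroring the paper's use of the change in next-word probability as an importance measure; the exactness of the decomposition (the per-token and bias-like terms summing to the final hidden state) is simply assumed in this toy example.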
Related papers
- CONTESTS: a Framework for Consistency Testing of Span Probabilities in Language Models [16.436592723426305]
It is unclear whether language models produce the same value for different ways of assigning joint probabilities to word spans.
Our work introduces a novel framework, ConTestS, involving statistical tests to assess score consistency across interchangeable completion and conditioning orders.
arXiv Detail & Related papers (2024-09-30T06:24:43Z)
- Slaves to the Law of Large Numbers: An Asymptotic Equipartition Property for Perplexity in Generative Language Models [0.0]
We show that the perplexity of any large text produced by a language model must converge to the average entropy of its token distributions.
This work has possible practical applications for understanding and improving "AI detection" tools.
arXiv Detail & Related papers (2024-05-22T16:23:40Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important for forecasting nonstationary processes or processes with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Explaining Language Models' Predictions with High-Impact Concepts [11.47612457613113]
We propose a complete framework for extending concept-based interpretability methods to NLP.
We optimize for features whose existence causes the output predictions to change substantially.
Our method achieves superior results on predictive impact, usability, and faithfulness compared to the baselines.
arXiv Detail & Related papers (2023-05-03T14:48:27Z)
- Probing for Incremental Parse States in Autoregressive Language Models [9.166953511173903]
Next-word predictions from autoregressive neural language models show remarkable sensitivity to syntax.
This work evaluates the extent to which this behavior arises as a result of a learned ability to maintain implicit representations of incremental syntactic structures.
arXiv Detail & Related papers (2022-11-17T18:15:31Z)
- Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models fine-tuned with few examples exhibit strong prediction bias across labels.
Although few-shot fine-tuning can mitigate this prediction bias, our analysis shows that models gain much of their performance improvement by capturing non-task-related features.
These observations warn that pursuing model performance with fewer examples may incur pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z)
- Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have long been devised to address context sparsity in $n$-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performance comparable to that achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z)
- On the Lack of Robust Interpretability of Neural Text Classifiers [14.685352584216757]
We assess the robustness of interpretations of neural text classifiers based on pretrained Transformer encoders.
Both tests show surprising deviations from expected behavior, raising questions about the extent of insights that practitioners may draw from interpretations.
arXiv Detail & Related papers (2021-06-08T18:31:02Z)
- Understanding Neural Abstractive Summarization Models via Uncertainty [54.37665950633147]
Seq2seq abstractive summarization models generate text in a free-form manner.
We study the entropy, or uncertainty, of the model's token-level predictions.
We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
arXiv Detail & Related papers (2020-10-15T16:57:27Z)