Probabilistic Predictions of People Perusing: Evaluating Metrics of
Language Model Performance for Psycholinguistic Modeling
- URL: http://arxiv.org/abs/2009.03954v1
- Date: Tue, 8 Sep 2020 19:12:06 GMT
- Title: Probabilistic Predictions of People Perusing: Evaluating Metrics of
Language Model Performance for Psycholinguistic Modeling
- Authors: Yiding Hao, Simon Mendelsohn, Rachel Sterneck, Randi Martinez, Robert
Frank
- Abstract summary: We re-evaluate a claim due to Goodkind and Bicknell that a language model's ability to model reading times is a linear function of its perplexity.
We show that the proposed relation does not always hold for Long Short-Term Memory networks, Transformers, and pre-trained models.
- Score: 0.8668211481067458
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By positing a relationship between naturalistic reading times and
information-theoretic surprisal, surprisal theory (Hale, 2001; Levy, 2008)
provides a natural interface between language models and psycholinguistic
models. This paper re-evaluates a claim due to Goodkind and Bicknell (2018)
that a language model's ability to model reading times is a linear function of
its perplexity. By extending Goodkind and Bicknell's analysis to modern neural
architectures, we show that the proposed relation does not always hold for Long
Short-Term Memory networks, Transformers, and pre-trained models. We introduce
an alternate measure of language modeling performance called predictability
norm correlation based on Cloze probabilities measured from human subjects. Our
new metric yields a more robust relationship between language model quality and
psycholinguistic modeling performance that allows for comparison between models
with different training configurations.
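To make the two metrics concrete, here is a minimal sketch of how per-word surprisal and the paper's predictability norm correlation could be computed with an off-the-shelf causal language model. The Hugging Face model choice, the helper name `word_log_prob`, and the three Cloze items are illustrative assumptions, not the authors' materials.

```python
import math

import torch
from scipy.stats import pearsonr
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def word_log_prob(context: str, word: str) -> float:
    """Log-probability the model assigns to `word` after `context`
    (surprisal is the negative of this value)."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    word_ids = tokenizer(" " + word, return_tensors="pt").input_ids
    ids = torch.cat([ctx_ids, word_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids).logits, dim=-1)
    # Sum over the word's subword tokens; logits at position i predict token i+1.
    return sum(log_probs[0, pos, ids[0, pos + 1]].item()
               for pos in range(ctx_ids.shape[1] - 1, ids.shape[1] - 1))

# Hypothetical Cloze norms: (context, target word, human Cloze probability).
cloze_items = [
    ("The children went outside to", "play", 0.92),
    ("She spread the warm bread with", "butter", 0.78),
    ("He locked the door with his", "key", 0.85),
]

model_probs = [math.exp(word_log_prob(ctx, w)) for ctx, w, _ in cloze_items]
human_probs = [p for _, _, p in cloze_items]

# Predictability norm correlation: how closely model probabilities track
# human Cloze probabilities (the paper's proposed metric, in miniature).
r, _ = pearsonr(model_probs, human_probs)
print(f"predictability norm correlation r = {r:.3f}")
```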
Related papers
- Reverse-Engineering the Reader [43.26660964074272]
We introduce a novel alignment technique in which we fine-tune a language model to implicitly optimize the parameters of a linear regressor.
Using words' reading times as a test case, we evaluate our technique across multiple model sizes and datasets.
We find an inverse relationship between psychometric power and a model's performance on downstream NLP tasks, as well as its perplexity on held-out test data (a sketch of the implicit-regressor idea follows this entry).
arXiv Detail & Related papers (2024-10-16T23:05:01Z)
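A hedged sketch of that implicit-regressor idea: word surprisals from the language model feed a linear regression of reading times whose parameters come from a closed-form least-squares solve, so the regression loss can be backpropagated into the language model itself. The function name and the normal-equations solve are assumptions for illustration, not the authors' implementation.

```python
import torch

def implicit_regression_loss(surprisals: torch.Tensor,
                             reading_times: torch.Tensor) -> torch.Tensor:
    """MSE of the regression RT ~ a * surprisal + b, with (a, b) solved in
    closed form via the normal equations; gradients flow through the solve
    back into the surprisal estimates (and hence the LM that produced them)."""
    X = torch.stack([surprisals, torch.ones_like(surprisals)], dim=1)
    y = reading_times.unsqueeze(1)
    beta = torch.linalg.solve(X.T @ X, X.T @ y)  # implicit regressor parameters
    return (y - X @ beta).pow(2).mean()

# Toy usage: surprisals would normally come from the LM with gradients enabled.
surprisals = torch.tensor([3.1, 7.8, 2.4, 5.5], requires_grad=True)
rts = torch.tensor([210.0, 340.0, 195.0, 280.0])  # reading times in milliseconds
loss = implicit_regression_loss(surprisals, rts)
loss.backward()  # gradients w.r.t. surprisal exist: the regressor stayed implicit
print(loss.item(), surprisals.grad)
```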
- A Probability--Quality Trade-off in Aligned Language Models and its Relation to Sampling Adaptors [50.046717886067555]
We show that when sampling corpora from an aligned language model, there exists a trade-off between the strings' average reward and average log-likelihood.
We provide a formal treatment of this phenomenon and demonstrate how a choice of sampling adaptor allows for a selection of how much likelihood we exchange for the reward (a toy illustration follows this entry).
arXiv Detail & Related papers (2024-06-14T17:38:21Z)
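A toy, self-contained illustration of that trade-off: exponentially tilting a model's next-token distribution toward a reward raises expected reward while (typically) lowering expected log-likelihood under the original model. The distribution and reward here are random stand-ins, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
log_p = np.log(rng.dirichlet(np.ones(50)))   # stand-in LM next-token distribution
reward = rng.normal(size=50)                 # stand-in per-token reward

for beta in [0.0, 1.0, 4.0]:                 # adaptor strength
    logits = log_p + beta * reward           # exponential-tilting sampling adaptor
    q = np.exp(logits - logits.max())
    q /= q.sum()
    avg_reward = float(q @ reward)           # expected reward under the adaptor
    avg_loglik = float(q @ log_p)            # expected log-likelihood under the LM
    print(f"beta={beta:.1f}  E[reward]={avg_reward:+.3f}  E[log p]={avg_loglik:.3f}")
```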
- Transformer-Based Language Model Surprisal Predicts Human Reading Times Best with About Two Billion Training Tokens [17.80735287413141]
We evaluate surprisal estimates from Transformer-based language model variants on their ability to predict human reading times.
Results show that surprisal estimates from most variants with contemporary model capacities provide the best fit after seeing about two billion training tokens.
Newly-trained smaller model variants reveal a 'tipping point' at convergence, after which the decrease in language model perplexity begins to result in poorer fits to human reading times.
arXiv Detail & Related papers (2023-04-22T12:50:49Z)
- Black-box language model explanation by context length probing [7.526153863886609]
We present context length probing, a novel explanation technique for causal language models.
The technique is model-agnostic and does not rely on access to model internals beyond computing token-level probabilities.
We apply context length probing to large pre-trained language models and offer some initial analyses and insights (a minimal sketch of the probe follows this entry).
arXiv Detail & Related papers (2022-12-30T16:24:10Z)
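A minimal sketch of what such a probe could look like for a causal LM, assuming a Hugging Face model: score the final token under progressively longer left contexts and read token importance off the jumps in log-probability. The model choice and function name are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def context_length_curve(text: str) -> list[float]:
    """Log p(last token | last k tokens) for k = 1 .. n-1."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    scores = []
    for k in range(1, len(ids)):
        window = ids[len(ids) - 1 - k:].unsqueeze(0)  # k context tokens + target
        with torch.no_grad():
            logits = model(window).logits
        log_probs = torch.log_softmax(logits[0, -2], dim=-1)  # predicts last token
        scores.append(log_probs[ids[-1]].item())
    return scores

curve = context_length_curve("The cat chased the mouse around the house")
# Jumps in the curve mark context tokens that matter for the final prediction.
print([round(s, 2) for s in curve])
```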
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention (sketched schematically after this entry).
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
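A schematic sketch of that mixture formulation, under the assumption that each previous token contributes a dependency-conditioned next-token distribution which is mixed with weights taken from self-attention; shapes and names are illustrative, not the paper's exact parameterization.

```python
import torch

def mixture_next_token(dep_dists: torch.Tensor,
                       attn_weights: torch.Tensor) -> torch.Tensor:
    """
    dep_dists:    (num_prev, vocab)  p(next | dependency on previous token i)
    attn_weights: (num_prev,)        self-attention weight on previous token i
    returns:      (vocab,)           mixed next-token distribution
    """
    return attn_weights @ dep_dists  # convex combination over previous tokens

num_prev, vocab = 5, 100
dep_dists = torch.softmax(torch.randn(num_prev, vocab), dim=-1)
attn = torch.softmax(torch.randn(num_prev), dim=-1)
p_next = mixture_next_token(dep_dists, attn)
assert torch.allclose(p_next.sum(), torch.tensor(1.0))  # still a distribution
```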
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher [83.98181046650664]
We present an analysis of Transformer-based language model performance across a wide range of model scales.
Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language.
We discuss the application of language models to AI safety and the mitigation of downstream harms.
arXiv Detail & Related papers (2021-12-08T19:41:47Z)
- Language Model Evaluation Beyond Perplexity [47.268323020210175]
We analyze whether text generated from language models exhibits the statistical tendencies present in the human-generated text on which they were trained.
We find that neural language models appear to learn only a subset of the tendencies considered, but align much more closely with empirical trends than with proposed theoretical distributions (one such check is sketched after this entry).
arXiv Detail & Related papers (2021-05-31T20:13:44Z)
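One concrete instance of the kind of statistical tendency such an evaluation examines is the unigram rank-frequency (Zipf) curve; the sketch below fits its log-log slope for two tiny placeholder corpora. A real comparison would use full human and model-generated corpora.

```python
import math
from collections import Counter

def zipf_slope(tokens: list[str]) -> float:
    """Least-squares slope of log(frequency) vs log(rank); Zipf predicts ~ -1."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank + 1) for rank in range(len(freqs))]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Placeholder "human" and "model-generated" corpora.
human_tokens = "the cat sat on the mat and the dog sat on the rug".split()
model_tokens = "the dog ran to the park and the cat ran to the tree".split()
print(zipf_slope(human_tokens), zipf_slope(model_tokens))
```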
- Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable trade-off between training-time and testing-time compounding errors (the bootstrapped target is sketched after this entry).
arXiv Detail & Related papers (2020-10-27T17:54:12Z)
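A hedged sketch of the TD-style bootstrapped target implied by the summary above: with probability $1-\gamma$ the target is the real single-step successor, and with probability $\gamma$ a sample drawn from the current model at that successor, so the model is pushed toward the discounted occupancy distribution. `env_step` and `gamma_model_sample` are hypothetical hooks, not the paper's API.

```python
import random

def td_target_sample(state, action, gamma, env_step, gamma_model_sample):
    """Draw one training target for the gamma-model at (state, action)."""
    next_state = env_step(state, action)    # one real environment transition
    if random.random() < 1.0 - gamma:
        return next_state                   # geometric "stop": the short horizon
    return gamma_model_sample(next_state)   # bootstrap: model's own long-horizon sample
```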
- Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derive a theory for how the memory gating mechanism in long short-term memory language models can capture power-law decay.
Experiments show that LSTM language models trained on natural English text learn to approximate this theoretical distribution (a numerical illustration follows this entry).
arXiv Detail & Related papers (2020-09-27T02:13:38Z)
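A small numerical illustration of that theory, under the stated assumption that a unit with constant forget gate f decays stored information as f**t (timescale T = -1/log f): units whose timescales follow an inverse-gamma distribution mix into an aggregate memory trace with power-law decay.

```python
import numpy as np

rng = np.random.default_rng(0)
timescales = 1.0 / rng.gamma(shape=2.0, scale=1.0, size=1000)  # inverse-gamma draws
forget_gates = np.exp(-1.0 / timescales)                       # f = exp(-1/T)

t = np.arange(1, 200)
decay = np.mean(forget_gates[:, None] ** t[None, :], axis=0)   # mixture memory trace

# Power-law decay appears as a straight line in log-log coordinates.
slope = np.polyfit(np.log(t[10:]), np.log(decay[10:]), 1)[0]
print(f"log-log slope of the aggregate decay curve: {slope:.2f}")
```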
- On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior [29.260666424382446]
We test over two dozen models on how well their next-word expectations predict human reading time on naturalistic text corpora.
We evaluate how features of these models determine their psychometric predictive power, or ability to predict human reading behavior.
For any given perplexity, deep Transformer models and n-gram models show superior psychometric predictive power over LSTM or structurally supervised neural models.
arXiv Detail & Related papers (2020-06-02T19:47:01Z)