To model human linguistic prediction, make LLMs less superhuman
- URL: http://arxiv.org/abs/2510.05141v1
- Date: Wed, 01 Oct 2025 21:53:42 GMT
- Title: To model human linguistic prediction, make LLMs less superhuman
- Authors: Byung-Doh Oh, Tal Linzen
- Abstract summary: In the last few years, as language models have become better at predicting the next word, their ability to predict human reading behavior has declined. This is because LLMs are able to predict upcoming words much better than people can, leading them to predict lower processing difficulty in reading. We advocate for creating models that have human-like long-term and short-term memory, and outline some possible directions for achieving this goal.
- Score: 22.141352033537554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When people listen to or read a sentence, they actively make predictions about upcoming words: words that are less predictable are generally read more slowly than predictable ones. The success of large language models (LLMs), which, like humans, make predictions about upcoming words, has motivated exploring the use of these models as cognitive models of human linguistic prediction. Surprisingly, in the last few years, as language models have become better at predicting the next word, their ability to predict human reading behavior has declined. This is because LLMs are able to predict upcoming words much better than people can, leading them to predict lower processing difficulty in reading than observed in human experiments; in other words, mainstream LLMs are 'superhuman' as models of language comprehension. In this position paper, we argue that LLMs' superhumanness is primarily driven by two factors: compared to humans, LLMs have much stronger long-term memory for facts and training examples, and they have much better short-term memory for previous words in the text. We advocate for creating models that have human-like long-term and short-term memory, and outline some possible directions for achieving this goal. Finally, we argue that currently available human data is insufficient to measure progress towards this goal, and outline human experiments that can address this gap.
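The link between predictability and predicted processing difficulty can be illustrated with surprisal, the negative log probability of a word given its context, which is a standard predictor of reading times in this literature. The sketch below is only an illustration, assuming difficulty is approximated by surprisal in bits; the abstract does not name a specific measure, and the two probabilities are hypothetical values, not results from the paper.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Return the surprisal, in bits, of a word with conditional probability `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


# Hypothetical probabilities assigned to the same upcoming word (illustrative only).
human_like_prob = 0.25  # a predictor with human-like memory limits
superhuman_prob = 0.5   # a stronger LLM that predicts the word more confidently

human_like = surprisal_bits(human_like_prob)
superhuman = surprisal_bits(superhuman_prob)

# The better predictor yields lower surprisal, and so a lower predicted difficulty.
print(f"human-like: {human_like} bits, superhuman: {superhuman} bits")
```

Under this assumption, a model that assigns higher probability to the words people actually read will systematically predict less processing difficulty than readers show, which is the mismatch the abstract describes.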
Related papers
- On the Thinking-Language Modeling Gap in Large Language Models [68.83670974539108]
We show that there is a significant gap between the modeling of languages and thoughts. We propose a new prompt technique termed Language-of-Thoughts (LoT) to demonstrate and alleviate this gap.
arXiv Detail & Related papers (2025-05-19T09:31:52Z)
- Temperature-scaling surprisal estimates improve fit to human reading times -- but does it do so for the "right reasons"? [15.773775387121097]
We show that calibration of large language models typically improves with model size.
We find that temperature-scaling probabilities lead to a systematically better fit to reading times.
arXiv Detail & Related papers (2023-11-15T19:34:06Z)
- Psychometric Predictive Power of Large Language Models [32.31556074470733]
We find that instruction tuning does not always make large language models human-like from a cognitive modeling perspective.
Next-word probabilities estimated by instruction-tuned LLMs are often worse at simulating human reading behavior than those estimated by base LLMs.
arXiv Detail & Related papers (2023-11-13T17:19:14Z)
- Humans and language models diverge when predicting repeating text [52.03471802608112]
We present a scenario in which the performance of humans and LMs diverges.
Human and GPT-2 LM predictions are strongly aligned in the first presentation of a text span, but their performance quickly diverges when memory begins to play a role.
We hope that this scenario will spur future work in bringing LMs closer to human behavior.
arXiv Detail & Related papers (2023-10-10T08:24:28Z)
- Where Would I Go Next? Large Language Models as Human Mobility Predictors [21.100313868232995]
We introduce a novel method, LLM-Mob, which leverages the language understanding and reasoning capabilities of LLMs for analysing human mobility data.
Comprehensive evaluations of our method reveal that LLM-Mob excels in providing accurate and interpretable predictions.
arXiv Detail & Related papers (2023-08-29T10:24:23Z)
- The Neuro-Symbolic Inverse Planning Engine (NIPE): Modeling Probabilistic Social Inferences from Linguistic Inputs [50.32802502923367]
We study the process of language driving and influencing social reasoning in a probabilistic goal inference domain.
We propose a neuro-symbolic model that carries out goal inference from linguistic inputs of agent scenarios.
Our model closely matches human response patterns and better predicts human judgements than using an LLM alone.
arXiv Detail & Related papers (2023-06-25T19:38:01Z)
- A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades.
Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora.
To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)
- Collateral facilitation in humans and language models [0.6091702876917281]
We show that humans display a similar processing advantage for highly anomalous words.
We discuss the implications for our understanding of both human language comprehension and the predictions made by language models.
arXiv Detail & Related papers (2022-11-09T21:08:08Z)
- Shortcut Learning of Large Language Models in Natural Language Understanding [119.45683008451698]
Large language models (LLMs) have achieved state-of-the-art performance on a series of natural language understanding tasks.
They might rely on dataset bias and artifacts as shortcuts for prediction.
This has significantly affected their generalizability and adversarial robustness.
arXiv Detail & Related papers (2022-08-25T03:51:39Z)
- Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.