Learning to vary: Teaching LMs to reproduce human linguistic variability in next-word prediction
- URL: http://arxiv.org/abs/2509.17794v2
- Date: Mon, 06 Oct 2025 20:22:11 GMT
- Title: Learning to vary: Teaching LMs to reproduce human linguistic variability in next-word prediction
- Authors: Tobias Groot, Salo Lacunes, Evgenia Ilia
- Abstract summary: We investigate whether training LMs on multiple plausible word continuations per context can improve their ability to reproduce human linguistic variability in next-word prediction. We employ fine-tuning techniques for pre-trained and instruction-tuned models and demonstrate their potential when fine-tuning GPT-2 and Mistral-7B-IT.
- Score: 0.19116784879310025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language generation (NLG) tasks are often subject to inherent variability; e.g., predicting the next word given a context has multiple valid responses, as is evident when several humans are asked to complete the task. While it is clearly beneficial to have language models (LMs) that are pluralistically aligned, i.e., able to reproduce the inherent diversity of perspectives across a population of interest, Ilia and Aziz (2024) show that LMs do not reproduce this type of linguistic variability well. They speculate that this inability might stem from the lack of consistent training of LMs on data reflecting this type of inherent variability. We therefore investigate whether training LMs on multiple plausible word continuations per context can improve their ability to reproduce human linguistic variability in next-word prediction. We employ fine-tuning techniques for pre-trained and instruction-tuned models and demonstrate their potential when fine-tuning GPT-2 and Mistral-7B-IT on the Provo Corpus. Our evaluation, which measures the divergence between empirically estimated human and model next-word distributions across contexts before and after fine-tuning, shows that multi-label fine-tuning improves the LMs' ability to reproduce linguistic variability, both for contexts that admit higher variability and for those that admit lower variability.
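The abstract does not spell out the multi-label objective, but the core idea of fine-tuning on several plausible continuations per context can be sketched with soft labels: the training target for the next-token distribution is the empirical distribution over the words humans produced. Below is a minimal PyTorch sketch; the toy context, the `human_words` counts, and the single-token simplification are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): fine-tune a causal LM so that its
# next-token distribution at a given context matches the empirical human
# distribution over continuations. Data and names below are illustrative.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# One Provo-style item: a context plus the words different humans produced.
context = "The children went outside to"
human_words = {"play": 30, "see": 5, "look": 5}  # word -> number of annotators

# Build a soft target over the vocabulary from the human counts.
# (Assumes each continuation is a single token, for simplicity.)
target = torch.zeros(model.config.vocab_size)
for word, count in human_words.items():
    token_id = tokenizer.encode(" " + word)[0]  # leading space: GPT-2 BPE convention
    target[token_id] += count
target /= target.sum()

# Forward pass: logits for the token that follows the context.
input_ids = tokenizer(context, return_tensors="pt").input_ids
logits = model(input_ids).logits[0, -1]          # shape: (vocab_size,)
log_probs = F.log_softmax(logits, dim=-1)

# Soft-label cross-entropy: -sum_w p_human(w) * log p_model(w).
loss = -(target * log_probs).sum()
loss.backward()  # a real loop would batch items and call an optimizer step
```

Compared with standard one-hot next-word training, the soft target penalizes the model for concentrating mass on a single continuation when humans themselves are divided.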
Related papers
- Counterfactual reasoning: an analysis of in-context emergence [49.58529868457226]
Large-scale neural language models (LMs) exhibit remarkable performance in in-context learning.
This work studies in-context counterfactual reasoning in language models, that is, predicting the consequences of changes under hypothetical scenarios.
arXiv Detail & Related papers (2025-06-05T16:02:07Z)
- Spontaneous Speech Variables for Evaluating LLMs Cognitive Plausibility [0.7061230262755125]
We propose using spontaneous speech corpora to derive production variables (speech reductions, prosodic prominences) and applying them in a similar fashion.
We then test models trained with a standard procedure on different pretraining datasets for their ability to predict these two variables.
Our results show that, after some fine-tuning, the models can predict these production variables well above baselines.
arXiv Detail & Related papers (2025-05-22T06:23:02Z)
- From Babbling to Fluency: Evaluating the Evolution of Language Models in Terms of Human Language Acquisition [6.617999710257379]
We propose a three-stage framework to assess the abilities of LMs.
We evaluate the generative capacities of LMs using methods from linguistic research.
arXiv Detail & Related papers (2024-10-17T06:31:49Z)
- Linguistic Calibration of Long-Form Generations [57.836339732160916]
Language models (LMs) may lead their users to make suboptimal downstream decisions when they confidently hallucinate.
This issue can be mitigated by having the LM verbally convey the probability that its claims are correct, but existing models cannot produce long-form text with calibrated confidence statements.
We define linguistic calibration for long-form generations: an LM is linguistically calibrated if its generations enable its users to make calibrated probabilistic predictions.
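As a rough illustration of "calibrated probabilistic predictions" (not the cited paper's actual protocol): if users read the LM's hedged text and state a probability for each claim, their calibration can be scored with a standard expected-calibration-error computation. All data below is hypothetical.

```python
# Hypothetical illustration: expected calibration error (ECE) over the
# probabilistic predictions users make after reading the LM's hedged text.
from typing import List, Tuple

def expected_calibration_error(
    forecasts: List[Tuple[float, bool]],  # (user probability, claim was true)
    n_bins: int = 10,
) -> float:
    # Bucket forecasts by stated probability, then compare each bucket's
    # average confidence with its empirical accuracy.
    bins = [[] for _ in range(n_bins)]
    for p, correct in forecasts:
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, correct))
    ece, total = 0.0, len(forecasts)
    for members in bins:
        if not members:
            continue
        avg_conf = sum(p for p, _ in members) / len(members)
        accuracy = sum(c for _, c in members) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece

# e.g. a user reads "the model is ~80% sure X" and forecasts 0.8; X was true.
print(expected_calibration_error([(0.8, True), (0.6, False), (0.9, True)]))
```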
arXiv Detail & Related papers (2024-03-30T20:47:55Z)
- Predict the Next Word: Humans exhibit uncertainty in this task and language models _____ [7.581259361859477]
Language models (LMs) are trained to assign probability to human-generated text.
We exploit this fact and evaluate the LM's ability to reproduce variability that humans exhibit in the 'next word prediction' task.
We assess GPT2, BLOOM and ChatGPT and find that they exhibit fairly low calibration to human uncertainty.
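One simple way to quantify "calibration to human uncertainty" is a divergence between the empirical human next-word distribution and the model's; the sketch below uses total variation distance over toy distributions. Whether this matches the cited paper's exact estimator is an assumption.

```python
# Rough sketch (hypothetical data, not the paper's code): compare a model's
# next-word distribution with the empirical human one via total variation
# distance, one way such calibration to human uncertainty can be scored.
def total_variation(p: dict, q: dict) -> float:
    """0.5 * sum_w |p(w) - q(w)| over the union of supports."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in support)

human = {"play": 0.75, "see": 0.125, "look": 0.125}   # from annotator counts
model = {"play": 0.60, "run": 0.25, "see": 0.15}      # from LM probabilities
print(total_variation(human, model))  # 0 = identical, 1 = disjoint supports
```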
arXiv Detail & Related papers (2024-02-27T14:11:32Z)
- Visual Grounding Helps Learn Word Meanings in Low-Data Regimes [47.7950860342515]
Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension.
But to achieve these results, LMs must be trained in distinctly un-human-like ways.
Do models trained more naturalistically -- with grounded supervision -- exhibit more humanlike language learning?
We investigate this question in the context of word learning, a key sub-task in language acquisition.
arXiv Detail & Related papers (2023-10-20T03:33:36Z)
- Humans and language models diverge when predicting repeating text [52.03471802608112]
We present a scenario in which the performance of humans and LMs diverges.
Human and GPT-2 LM predictions are strongly aligned in the first presentation of a text span, but their performance quickly diverges when memory begins to play a role.
We hope that this scenario will spur future work in bringing LMs closer to human behavior.
arXiv Detail & Related papers (2023-10-10T08:24:28Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy tail of rare events accounts for a significant amount of the total probability mass of distributions in language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LMs) in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
arXiv Detail & Related papers (2022-03-24T01:09:46Z)
- Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have long been devised to address context sparsity in $n$-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Evaluating Models of Robust Word Recognition with Serial Reproduction [8.17947290421835]
We compare several broad-coverage probabilistic generative language models in their ability to capture human linguistic expectations.
We find that those models that make use of abstract representations of preceding linguistic context best predict the changes made by people in the course of serial reproduction.
arXiv Detail & Related papers (2021-01-24T20:16:12Z)