Large GPT-like Models are Bad Babies: A Closer Look at the Relationship
between Linguistic Competence and Psycholinguistic Measures
- URL: http://arxiv.org/abs/2311.04547v1
- Date: Wed, 8 Nov 2023 09:26:27 GMT
- Title: Large GPT-like Models are Bad Babies: A Closer Look at the Relationship
between Linguistic Competence and Psycholinguistic Measures
- Authors: Julius Steuer, Marius Mosbach, Dietrich Klakow
- Abstract summary: We train a series of GPT-like language models of different sizes on the strict version of the BabyLM pretraining corpus.
We find a positive correlation between LM size and performance on all three challenge tasks, with different preferences for model width and depth in each of the tasks.
This suggests that modelling processing effort and linguistic competence may require an approach different from training GPT-like LMs on a developmentally plausible corpus.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Research on the cognitive plausibility of language models (LMs) has so far
mostly concentrated on modelling psycholinguistic response variables such as
reading times, gaze durations and N400/P600 EEG signals, while mostly leaving
out the dimension of what Mahowald et al. (2023) described as formal and
functional linguistic competence, and developmental plausibility. We address
this gap by training a series of GPT-like language models of different sizes on
the strict version of the BabyLM pretraining corpus, evaluating on the
challenge tasks (BLiMP, GLUE, MSGS) and an additional reading time prediction
task. We find a positive correlation between LM size and performance on all
three challenge tasks, with different preferences for model width and depth in
each of the tasks. In contrast, a negative correlation was found between LM
size and reading time fit of linear mixed-effects models using LM surprisal as
a predictor, with the second-smallest LM achieving the largest log-likelihood
reduction over a baseline model without surprisal. This suggests that modelling
processing effort and linguistic competence may require an approach different
from training GPT-like LMs on a developmentally plausible corpus.
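The reading-time evaluation described above can be sketched as follows. This is an illustrative example, not the paper's code: it fits a linear mixed-effects model of reading times with and without LM surprisal as a fixed-effect predictor and scores the surprisal model by its log-likelihood reduction over the baseline. The data, column names, and effect sizes are synthetic assumptions; models are fit by maximum likelihood so their log-likelihoods are comparable across fixed-effect specifications.

```python
# Sketch of the surprisal-based reading-time comparison (synthetic data;
# not the authors' implementation).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subjects, n_items = 20, 50
rows = []
for subj in range(n_subjects):
    subj_offset = rng.normal(0, 20)  # by-subject random intercept
    for item in range(n_items):
        surprisal = rng.gamma(shape=4, scale=2)  # LM surprisal in bits (synthetic)
        length = int(rng.integers(2, 12))        # word length as a control predictor
        rt = 200 + 8 * surprisal + 5 * length + subj_offset + rng.normal(0, 15)
        rows.append(dict(subject=subj, item=item,
                         surprisal=surprisal, length=length, rt=rt))
df = pd.DataFrame(rows)

# Baseline: reading time ~ control predictors, by-subject random intercepts.
baseline = smf.mixedlm("rt ~ length", df, groups=df["subject"]).fit(reml=False)
# Surprisal model: same formula plus LM surprisal as a predictor.
full = smf.mixedlm("rt ~ length + surprisal", df,
                   groups=df["subject"]).fit(reml=False)

# Log-likelihood reduction of the surprisal model over the baseline:
# the quantity on which the paper's models are compared.
delta_llf = full.llf - baseline.llf
print(f"Delta log-likelihood: {delta_llf:.1f}")
```

In this setup a larger `delta_llf` means surprisal explains more reading-time variance beyond the baseline predictors; the paper's finding is that this gain shrinks as the LM producing the surprisal estimates grows.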
Related papers
- In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
- Evaluating Neural Language Models as Cognitive Models of Language Acquisition [4.779196219827507]
We argue that some of the most prominent benchmarks for evaluating the syntactic capacities of neural language models may not be sufficiently rigorous.
When trained on small-scale data modeling child language acquisition, the LMs can be readily matched by simple baseline models.
We conclude with suggestions for better connecting LMs with the empirical study of child language acquisition.
arXiv Detail & Related papers (2023-10-31T00:16:17Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Evaluating the Capability of Large-scale Language Models on Chinese Grammatical Error Correction Task [10.597024796304016]
Large-scale language models (LLMs) have shown remarkable capability in a variety of Natural Language Processing (NLP) tasks.
This report explores how large language models perform on Chinese grammatical error correction tasks.
arXiv Detail & Related papers (2023-07-08T13:10:59Z)
- A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades.
Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora.
To distinguish models by parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)
- Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
- Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z)
- Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z)