Rarely a problem? Language models exhibit inverse scaling in their
predictions following few-type quantifiers
- URL: http://arxiv.org/abs/2212.08700v2
- Date: Fri, 26 May 2023 07:18:15 GMT
- Title: Rarely a problem? Language models exhibit inverse scaling in their
predictions following few-type quantifiers
- Authors: James A. Michaelov, Benjamin K. Bergen
- Abstract summary: We focus on 'few'-type quantifiers, as in 'few children like toys', which might pose a particular challenge for language models.
We present 960 English sentence stimuli from two human neurolinguistic experiments to 22 autoregressive transformer models of differing sizes.
- Score: 0.6091702876917281
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How well do language models deal with quantification? In this study, we focus
on 'few'-type quantifiers, as in 'few children like toys', which might pose a
particular challenge for language models because the sentence components without
the quantifier are likely to co-occur, and 'few'-type quantifiers are rare.
We present 960 English sentence stimuli from two human neurolinguistic
experiments to 22 autoregressive transformer models of differing sizes. Not
only do all the models perform poorly on 'few'-type quantifiers, but overall
the larger the model, the worse its performance. This inverse scaling is
consistent with previous work suggesting that larger models increasingly
reflect online rather than offline human processing, and we argue that the
decreasing performance of larger models may challenge uses of language models
as the basis for natural language systems.
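As a rough sketch of the kind of measurement described above (not the authors' exact pipeline), the snippet below scores the same continuation after a 'few'-type and a 'most'-type quantifier with a small autoregressive model. The model choice (gpt2), the stimulus pair, and the scoring function are illustrative assumptions; the study itself used 960 stimuli and 22 models.

```python
# Minimal sketch, not the paper's exact procedure: compare the log-probability an
# autoregressive LM assigns to the same continuation after 'few' vs. 'most'.
# Model, stimulus pair, and scoring are illustrative assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def continuation_logprob(prefix: str, continuation: str) -> float:
    """Sum of token log-probabilities of `continuation` conditioned on `prefix`."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    full_ids = tokenizer(prefix + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Score only the continuation tokens; each token is predicted
    # from the logits at the preceding position.
    for pos in range(prefix_ids.shape[1], full_ids.shape[1]):
        token_id = full_ids[0, pos]
        total += log_probs[0, pos - 1, token_id].item()
    return total

# Hypothetical stimulus pair in the spirit of 'few children like toys'.
for quantifier in ("Few", "Most"):
    lp = continuation_logprob(f"{quantifier} children like", " toys")
    print(f"{quantifier:>4}: log P(' toys' | prefix) = {lp:.2f}")
```

Surprisal is the negative of this log-probability, so the same score can be compared across models of different sizes, which is roughly the scaling comparison the abstract describes.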
Related papers
- Frequency Explains the Inverse Correlation of Large Language Models' Size, Training Data Amount, and Surprisal's Fit to Reading Times [15.738530737312335]
Recent studies have shown that as Transformer-based language models become larger and are trained on very large amounts of data, the fit of their surprisal estimates to naturalistic human reading times degrades.
This paper presents a series of analyses showing that word frequency is a key explanatory factor underlying these two trends.
The results indicate that Transformer-based language models' surprisal estimates diverge from human-like expectations due to the superhumanly complex associations they learn for predicting rare words.
arXiv Detail & Related papers (2024-02-03T20:22:54Z)
- Evaluating Large Language Models on Controlled Generation Tasks [92.64781370921486]
We present an extensive analysis of various benchmarks including a sentence planning benchmark with different granularities.
After comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum showing where large language models fall behind, are comparable to, or exceed the abilities of smaller models.
arXiv Detail & Related papers (2023-10-23T03:48:24Z)
- Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z)
- Lexical Generalization Improves with Larger Models and Longer Training [42.024050065980845]
We analyze the use of lexical overlaps in natural language inference, paraphrase detection, and reading comprehension.
We find that larger models are much less susceptible to adopting lexical overlaps.
arXiv Detail & Related papers (2022-10-23T09:20:11Z)
- MonoByte: A Pool of Monolingual Byte-level Language Models [4.491765479948667]
We release 10 monolingual byte-level models rigorously pretrained under the same configuration.
Because they are tokenizer-free, the problem of unseen token embeddings is eliminated.
Experiments on QA and NLI tasks show that our monolingual models achieve competitive performance to the multilingual one.
arXiv Detail & Related papers (2022-09-22T14:32:48Z)
- Emergent Abilities of Large Language Models [172.08007363384218]
We consider an ability to be emergent if it is not present in smaller models but is present in larger models.
The existence of such emergence implies that additional scaling could further expand the range of capabilities of language models.
arXiv Detail & Related papers (2022-06-15T17:32:01Z)
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher [83.98181046650664]
We present an analysis of Transformer-based language model performance across a wide range of model scales.
Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language.
We discuss the application of language models to AI safety and the mitigation of downstream harms.
arXiv Detail & Related papers (2021-12-08T19:41:47Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained with varying amounts of target-language data.
Our usage scenario is interactive correction with nearly zero training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Limits of Detecting Text Generated by Large-Scale Language Models [65.46403462928319]
Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns.
Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated.
arXiv Detail & Related papers (2020-02-09T19:53:23Z)
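As a loose illustration of the hypothesis-testing framing in the last entry above (a simplified stand-in, not that paper's actual detector), the sketch below thresholds a score statistic, here the mean token log-probability under a single scoring model. The scoring model and the decision threshold are assumptions for demonstration only.

```python
# Minimal sketch: treat generated-text detection as a hypothesis test on a score
# statistic. The statistic here is mean token log-probability under one scoring
# LM; the model choice and threshold are illustrative assumptions, not the
# paper's detector.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def mean_token_logprob(text: str) -> float:
    """Average log-probability per token assigned to `text` by the scoring model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy over tokens.
        loss = model(ids, labels=ids).loss
    return -loss.item()

def classify(text: str, threshold: float = -3.5) -> str:
    # H0: genuine (human-written) text; call the text "generated" when it is
    # unusually easy for the model to predict, i.e. the score exceeds the
    # (hypothetical) threshold.
    return "generated" if mean_token_logprob(text) > threshold else "genuine"

print(classify("The quick brown fox jumps over the lazy dog."))
```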
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.