Understanding How Model Size Affects Few-shot Instruction Prompting
- URL: http://arxiv.org/abs/2212.01907v1
- Date: Sun, 4 Dec 2022 19:59:52 GMT
- Title: Understanding How Model Size Affects Few-shot Instruction Prompting
- Authors: Ayrton San Joaquin and Ardy Haroen
- Abstract summary: We investigate how model size affects a model's ability to discriminate a word's meaning in a given context.
We introduce a dataset called DeltaWords, which evaluates a model's ability to follow instructions.
We show a weak inverse scaling trend, where task accuracy degrades as model size increases.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models are affected by the phenomena of memorizing and forgetting their training data, but how do these vary with model size? We work towards this question by investigating how model size affects a model's ability to discriminate a word's meaning in a given context. We introduce a dataset called DeltaWords, which evaluates a model's ability to follow an instruction to select the sentence that replaces a target word with its antonym. We show a weak inverse scaling trend, where task accuracy degrades as model size increases, under extremely few-shot prompting regimes. We also show that increasing the number of examples tends to benefit larger models disproportionately more than smaller ones.
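To make the evaluation concrete, below is a minimal sketch of how a DeltaWords-style item could be rendered as a few-shot prompt. The instruction wording, item fields, and examples are illustrative assumptions, not the paper's actual dataset schema.

```python
# Hypothetical DeltaWords-style few-shot prompt construction.
# The instruction text and item format are assumptions for
# illustration, not the dataset's published schema.

INSTRUCTION = "Select the sentence that replaces the target word with its antonym."

FEW_SHOT_EXAMPLES = [
    {
        "sentence": "The soup was hot.",
        "target": "hot",
        "candidates": ["The soup was cold.", "The soup was warm."],
        "answer": "A",
    },
]

def render_item(item):
    """Format one item as a sentence plus lettered candidate rewrites."""
    lines = [f"Sentence: {item['sentence']} (target word: {item['target']})"]
    for i, cand in enumerate(item["candidates"]):
        lines.append(f"  {chr(65 + i)}. {cand}")
    return "\n".join(lines)

def build_prompt(test_item, shots):
    """Assemble the instruction, k worked examples, and the unanswered test item."""
    parts = [INSTRUCTION, ""]
    for ex in shots:
        parts += [render_item(ex), f"Answer: {ex['answer']}", ""]
    parts += [render_item(test_item), "Answer:"]
    return "\n".join(parts)

test_item = {
    "sentence": "She gave a generous tip.",
    "target": "generous",
    "candidates": ["She gave a large tip.", "She gave a stingy tip."],
}
print(build_prompt(test_item, FEW_SHOT_EXAMPLES))
```

Accuracy on such items can then be read off by comparing the model's next-token answer against the gold letter.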
Related papers
- Stands to Reason: Investigating the Effect of Reasoning on Idiomaticity Detection [2.8330244018167945]
We examine how reasoning capabilities in Large Language Models affect idiomaticity detection performance.
We find the effect of reasoning to be smaller and more varied than expected.
For smaller models, producing chain-of-thought (CoT) reasoning increases performance over Math-tuned intermediate models, but not to the levels of the base models.
arXiv Detail & Related papers (2025-08-18T21:17:09Z) - Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
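As a toy illustration of that design (all numbers invented): the memorisation effect is the change in an instance's score around the training step that includes it, minus the corresponding change for a comparable instance not trained on at that step.

```python
# Toy difference-in-differences (DiD) arithmetic. The log-likelihoods
# below are invented; in practice they would come from model checkpoints
# before and after the training step containing the treated instance.

treated_before, treated_after = -3.2, -1.1   # instance included in the step
control_before, control_after = -3.0, -2.7   # comparable held-back instance

memorisation = (treated_after - treated_before) - (control_after - control_before)
print(f"estimated memorisation effect: {memorisation:+.2f}")  # +1.80
```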
arXiv Detail & Related papers (2024-06-06T17:59:09Z) - Show Me How It's Done: The Role of Explanations in Fine-Tuning Language Models [0.45060992929802207]
We show the significant benefits of using fine-tuning with explanations to enhance the performance of language models.
We found that even smaller language models with as few as 60 million parameters benefited substantially from this approach.
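A minimal sketch of what explanation-augmented fine-tuning data could look like; the field names and serialization below are assumptions, not the paper's format.

```python
# Hypothetical serialization for fine-tuning with explanations: each
# target teaches the model to produce a rationale before the answer.

def to_training_pair(question, explanation, answer):
    """Build one (prompt, completion) pair with the rationale included."""
    return {
        "prompt": f"Question: {question}\n",
        "completion": f"Explanation: {explanation}\nAnswer: {answer}",
    }

pair = to_training_pair(
    question="Is 91 a prime number?",
    explanation="91 = 7 * 13, so it has divisors other than 1 and itself.",
    answer="No",
)
print(pair["prompt"] + pair["completion"])
```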
arXiv Detail & Related papers (2024-02-12T10:11:50Z) - Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z) - Rarely a problem? Language models exhibit inverse scaling in their predictions following few-type quantifiers [0.6091702876917281]
We focus on 'few'-type quantifiers, as in 'few children like toys', which might pose a particular challenge for language models.
We present 960 English sentence stimuli from two human neurolinguistic experiments to 22 autoregressive transformer models of differing sizes.
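A minimal sketch of this kind of measurement: score a continuation's log-probability under a causal language model after different quantifiers. GPT-2 via Hugging Face transformers stands in for the 22 models, and the two stimuli are illustrative, not items from the actual experiments.

```python
# Compare a model's log-probability of the same continuation after a
# 'few'-type vs. 'most'-type quantifier. Illustrative stand-in only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def continuation_logprob(prefix, continuation):
    """Sum log-probabilities of the continuation tokens given the prefix.

    Assumes the prefix's tokenization is a prefix of the full string's
    tokenization, which holds for simple stimuli like those below.
    """
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    full_ids = tokenizer(prefix + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total = 0.0
    for pos in range(prefix_ids.shape[1], full_ids.shape[1]):
        # The logits at position pos-1 predict the token at position pos.
        total += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return total

print("few: ", continuation_logprob("Few children like", " toys"))
print("most:", continuation_logprob("Most children like", " toys"))
```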
arXiv Detail & Related papers (2022-12-16T20:01:22Z) - Model Extraction Attack against Self-supervised Speech Models [52.81330435990717]
Self-supervised learning (SSL) speech models generate meaningful representations of given clips.
Model extraction attack (MEA) often refers to an adversary stealing the functionality of the victim model with only query access.
We study the MEA problem against SSL speech models with a small number of queries.
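A conceptual sketch of that threat model, with a linear least-squares fit standing in for training a student network; the victim, shapes, and query budget are all invented for illustration.

```python
# Model extraction with query-only access: collect (input, representation)
# pairs from the victim, then fit a surrogate to mimic it. A linear map
# stands in for both the victim SSL encoder and the student here.
import numpy as np

rng = np.random.default_rng(0)
_victim_weights = rng.standard_normal((100, 16))  # hidden from the attacker

def victim_encode(clips):
    """Stand-in for black-box query access to the victim model."""
    return clips @ _victim_weights

# 1. Spend a modest query budget collecting input/representation pairs.
queries = rng.standard_normal((128, 100))
targets = victim_encode(queries)

# 2. Fit a surrogate to reproduce the victim's outputs (least squares
#    stands in for gradient-training a student network).
surrogate, *_ = np.linalg.lstsq(queries, targets, rcond=None)

# 3. The surrogate now imitates the victim on fresh inputs.
probe = rng.standard_normal((1, 100))
gap = np.abs(probe @ surrogate - victim_encode(probe)).max()
print(f"max abs deviation on a fresh clip: {gap:.2e}")
```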
arXiv Detail & Related papers (2022-11-29T09:28:05Z) - Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models [64.22311189896888]
We study exact memorization in causal and masked language modeling, across model sizes and throughout the training process.
Surprisingly, we show that larger models can memorize a larger portion of the data before over-fitting and tend to forget less throughout the training process.
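One common way to operationalize exact memorization is to check whether greedy decoding from a training prefix reproduces the training continuation verbatim; the sketch below assumes that definition and uses a toy stand-in model.

```python
# Exact-memorization check: an example counts as memorized if greedy
# decoding from its prefix reproduces the exact continuation seen in
# training. The 'model' below is a toy lookup, not a real network.

def greedy_continue(model, prefix, n_tokens):
    """Greedily decode n_tokens after prefix; model returns the argmax token."""
    out = list(prefix)
    for _ in range(n_tokens):
        out.append(model(tuple(out)))
    return out[len(prefix):]

def exact_memorization_rate(model, dataset, prefix_len=32, cont_len=16):
    hits = 0
    for example in dataset:  # each example is a list of token ids
        reference = example[prefix_len:prefix_len + cont_len]
        if greedy_continue(model, example[:prefix_len], len(reference)) == reference:
            hits += 1
    return hits / len(dataset)

memorized = list(range(48))

def toy_model(context):
    """Toy model that has memorized exactly one training sequence."""
    k = len(context)
    if k < len(memorized) and list(context) == memorized[:k]:
        return memorized[k]
    return -1  # sentinel: model cannot continue this context correctly

print(exact_memorization_rate(toy_model, [memorized, list(range(100, 148))]))  # 0.5
```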
arXiv Detail & Related papers (2022-05-22T07:43:50Z) - Internet-augmented language models through few-shot prompting for open-domain question answering [6.573232954655063]
We capitalize on the unique few-shot capabilities offered by large-scale language models to overcome some of their challenges.
We use few-shot prompting to learn to condition language models on information returned from the web using Google Search.
We find that language models conditioned on the web surpass performance of closed-book models of similar, or even larger, model sizes in open-domain question answering.
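A minimal sketch of the retrieve-then-prompt pattern described above; `web_search` is a stub standing in for a real search backend (the paper conditions on Google Search results), so no retrieval actually happens here.

```python
# Retrieve-then-prompt: fetch evidence for a question, then condition
# the language model on it inside a prompt. The retriever is a stub.

def web_search(query, k=2):
    """Stub retriever returning canned snippets for illustration."""
    canned = {
        "Who wrote The Selfish Gene?": [
            "The Selfish Gene is a 1976 book by Richard Dawkins.",
            "Dawkins popularised the gene-centred view of evolution.",
        ],
    }
    return canned.get(query, [])[:k]

def build_open_domain_prompt(question):
    evidence = "\n".join(f"- {snippet}" for snippet in web_search(question))
    return f"Evidence:\n{evidence}\nQuestion: {question}\nAnswer:"

print(build_open_domain_prompt("Who wrote The Selfish Gene?"))
```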
arXiv Detail & Related papers (2022-03-10T02:24:14Z) - Predicting on the Edge: Identifying Where a Larger Model Does Better [61.793778186198864]
We show that large models have the largest improvement on examples where the small model is most uncertain.
We show that a switcher model which defers examples to a larger model when a small model is uncertain can achieve striking improvements in performance and resource usage.
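A minimal sketch of such a switcher, assuming stub models that return an answer with a confidence score and invented relative inference costs:

```python
# Confidence-gated switcher: answer with the small model unless it is
# uncertain, in which case defer to the large model. Models and costs
# here are stubs for illustration.

SMALL_COST, LARGE_COST = 1.0, 10.0  # invented relative inference costs

def switcher(example, small_model, large_model, threshold=0.8):
    answer, confidence = small_model(example)
    if confidence >= threshold:
        return answer, SMALL_COST
    answer, _ = large_model(example)  # defer only on uncertain examples
    return answer, SMALL_COST + LARGE_COST

# Toy stubs: the small model is only confident on short inputs.
small = lambda x: (f"small:{x}", 0.9 if len(x) < 10 else 0.4)
large = lambda x: (f"large:{x}", 0.99)

for ex in ["easy", "a much harder example"]:
    print(switcher(ex, small, large))
```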
arXiv Detail & Related papers (2022-02-15T18:53:14Z) - Exploring Strategies for Generalizable Commonsense Reasoning with
Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z) - Deep or Simple Models for Semantic Tagging? It Depends on your Data [Experiments] [26.48209520599515]
We show that the size, the label ratio, and the label cleanliness of a dataset significantly impact the quality of semantic tagging.
Simple models achieve similar tagging quality to deep models on large datasets, but the runtime of simple models is much shorter.
arXiv Detail & Related papers (2020-07-11T00:05:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.