It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
- URL: http://arxiv.org/abs/2009.07118v2
- Date: Mon, 12 Apr 2021 08:16:59 GMT
- Title: It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
- Authors: Timo Schick, Hinrich Schütze
- Abstract summary: We show that performance similar to GPT-3 can be obtained with language models that are much "greener", in that their parameter count is several orders of magnitude smaller.
We identify key factors required for successful natural language understanding with small language models.
- Score: 14.264737570114631
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When scaled to hundreds of billions of parameters, pretrained language models
such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance.
However, enormous amounts of compute are required for training and applying
such big models, resulting in a large carbon footprint and making it difficult
for researchers and practitioners to use them. We show that performance similar
to GPT-3 can be obtained with language models that are much "greener" in that
their parameter count is several orders of magnitude smaller. This is achieved
by converting textual inputs into cloze questions that contain a task
description, combined with gradient-based optimization; exploiting unlabeled
data gives further improvements. We identify key factors required for
successful natural language understanding with small language models.
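To make the cloze reformulation concrete, here is a minimal sketch of pattern-and-verbalizer classification with an off-the-shelf masked language model. The pattern wording, label words, and model choice are illustrative assumptions, not the paper's exact pattern-verbalizer pairs, and the gradient-based fine-tuning and unlabeled-data steps described in the abstract are omitted.

```python
# A minimal sketch of cloze-style classification with a pattern and verbalizer,
# in the spirit of the approach described in the abstract. The pattern text,
# label words, and model are illustrative assumptions, not the paper's exact
# pattern-verbalizer pairs; no gradient-based fine-tuning is performed here.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "roberta-base"  # assumption: any masked LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME).eval()

# Verbalizer: map each label to a single word the masked LM can predict.
VERBALIZER = {"positive": " great", "negative": " terrible"}

def classify(review: str) -> str:
    # Pattern: embed the input in a cloze question carrying the task description.
    prompt = f"Review: {review} It was {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    # Locate the masked position and score each label word there.
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
    scores = {}
    for label, word in VERBALIZER.items():
        token_id = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(word))[0]
        scores[label] = logits[0, mask_pos, token_id].item()
    return max(scores, key=scores.get)

print(classify("A thoughtful, beautifully shot film."))  # expected: positive
```

In the paper, this cloze formulation is combined with gradient-based optimization on a small labeled set, and exploiting unlabeled data gives further improvements; the sketch above covers only the scoring step.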
Related papers
- Emergent Abilities in Reduced-Scale Generative Language Models [10.51168925267033]
Large language models can solve new tasks without task-specific fine-tuning.
This ability is considered an emergent ability and is primarily seen in large language models with billions of parameters.
This study investigates if such emergent properties are strictly tied to model size or can be demonstrated by smaller models trained on reduced-scale data.
arXiv Detail & Related papers (2024-04-02T18:00:28Z)
- TinyStories: How Small Can Language Models Be and Still Speak Coherent English? [37.65216279977461]
Language models (LMs) often struggle to produce coherent and fluent text when they are small.
We introduce TinyStories, a dataset of short stories that contain only words a typical 3- to 4-year-old usually understands.
We show that TinyStories can be used to train and evaluate LMs that are much smaller than the state-of-the-art models.
arXiv Detail & Related papers (2023-05-12T20:56:48Z)
- Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective [26.41585967095811]
Zero-shot learning aims to train a model on a given task such that it can address new learning tasks without any additional training.
Our approach converts zero-shot learning into multiple-choice tasks, avoiding problems in commonly used large-scale generative models such as FLAN.
Our approach shows state-of-the-art performance on several benchmarks and produces satisfactory results on tasks such as natural language inference and text classification.
arXiv Detail & Related papers (2022-10-16T17:24:06Z)
- Bidirectional Language Models Are Also Few-shot Learners [54.37445173284831]
We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models.
We show SAP is effective on question answering and summarization.
For the first time, our results demonstrate prompt-based learning is an emergent property of a broader class of language models.
arXiv Detail & Related papers (2022-09-29T01:35:57Z)
- Elaboration-Generating Commonsense Question Answering at Scale [77.96137534751445]
In question answering requiring common sense, language models (e.g., GPT-3) have been used to generate text expressing background knowledge.
We finetune smaller language models to generate useful intermediate context, referred to here as elaborations.
Our framework alternates between updating two language models -- an elaboration generator and an answer predictor -- allowing each to influence the other.
arXiv Detail & Related papers (2022-09-02T18:32:09Z)
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts [84.33607245023049]
We propose and develop a family of language models named GLaM (Generalist Language Model).
GLaM uses a sparsely activated mixture-of-experts architecture to scale model capacity while incurring substantially less training cost than dense variants (a minimal sketch of such a layer follows after this list).
It consumes only 1/3 of the energy used to train GPT-3 and requires half the FLOPs for inference, while still achieving better overall zero-shot and one-shot performance across 29 NLP tasks.
arXiv Detail & Related papers (2021-12-13T18:58:19Z)
- Reframing Instructional Prompts to GPTk's Language [72.69833640335519]
We propose reframing techniques for model designers to create effective prompts for language models.
Our results show that reframing improves few-shot learning performance by 14% while reducing sample complexity.
The performance gains are particularly important for large language models such as GPT-3, where tuning models or prompts on large datasets is not feasible.
arXiv Detail & Related papers (2021-09-16T09:44:43Z)
- Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems [74.8759568242933]
Task-oriented dialogue systems use four connected modules, namely Natural Language Understanding (NLU), Dialogue State Tracking (DST), Dialogue Policy (DP), and Natural Language Generation (NLG).
A research challenge is to learn each module with the least amount of samples given the high cost related to the data collection.
We evaluate the priming few-shot ability of language models in the NLU, DP and NLG tasks.
arXiv Detail & Related papers (2020-08-14T08:23:21Z)
- Language Models are Few-Shot Learners [61.36677350504291]
We show that scaling up language models greatly improves task-agnostic, few-shot performance.
We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
arXiv Detail & Related papers (2020-05-28T17:29:03Z)
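The sparsely activated mixture-of-experts architecture mentioned in the GLaM entry above can be sketched briefly as well. The block below is a generic top-k-gated MoE layer in PyTorch, intended only to illustrate the technique; the model dimension, number of experts, and top-2 routing are assumptions for this example, not GLaM's actual configuration.

```python
# A generic sparsely activated mixture-of-experts layer (top-2 routing) in
# PyTorch, illustrating the technique named in the GLaM entry above. All
# dimensions, the expert count, and the routing details are illustrative
# assumptions, not GLaM's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model: int = 256, d_ff: int = 1024,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # per-token gating scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); each token is routed to its top-k experts
        # only, so most expert parameters are untouched for any given token.
        gate_logits = self.router(x)                           # (B, S, E)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)    # (B, S, k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e                     # (B, S) bool
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Example: one MoE block applied to a batch of token representations.
tokens = torch.randn(2, 16, 256)
print(SparseMoE()(tokens).shape)  # torch.Size([2, 16, 256])
```

The point of the sparse routing is that each token activates only its top-k experts, so total parameter count can grow with the number of experts while per-token compute stays roughly constant.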
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.