Go-tuning: Improving Zero-shot Learning Abilities of Smaller Language Models
- URL: http://arxiv.org/abs/2212.10461v1
- Date: Tue, 20 Dec 2022 17:36:49 GMT
- Title: Go-tuning: Improving Zero-shot Learning Abilities of Smaller Language Models
- Authors: Jingjing Xu, Qingxiu Dong, Hongyi Liu and Lei Li
- Abstract summary: Go-tuning is a geometry-guided self-supervised learning method.
Go-tuning enables T5-small (80M) to achieve zero-shot results competitive with large language models such as T5-XL (3B).
- Score: 23.818751895205132
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With increasing scale, large language models demonstrate both quantitative
improvements and new qualitative capabilities, especially as zero-shot learners,
like GPT-3. However, these results rely heavily on delicate prompt design and
large-scale computation. In this work, we explore whether strong zero-shot
ability can be achieved at a smaller model scale without any external
supervised data. To achieve this goal, we revisit masked language modeling and
present a geometry-guided self-supervised learning method (Go-tuning for short)
that uses a small amount of task-aware self-supervised data to further update
language models. Experiments show that Go-tuning enables T5-small (80M) to
achieve zero-shot results competitive with large language models such as
T5-XL (3B). We also apply Go-tuning in multi-task settings and develop a
multi-task model, mgo-T5 (250M). It reaches the average performance of OPT
(175B) on 9 datasets.
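The abstract gives only the ingredients of the method, so the following is a minimal sketch, assuming a standard HuggingFace/PyTorch setup: it continues updating T5-small with a masked-span (masked language modeling) objective built from a small amount of task-aware, unlabeled text. It is not the authors' Go-tuning implementation; the geometry-guided part of the objective is not specified in this abstract and is not reproduced here, and the example texts, masking rule, and hyperparameters below are illustrative assumptions.

```python
# Minimal sketch (NOT the authors' Go-tuning code): continue training T5-small
# with a masked-span objective derived from a handful of task-aware, unlabeled
# sentences, as the abstract describes.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

model = T5ForConditionalGeneration.from_pretrained("t5-small")   # ~80M parameters
tokenizer = T5TokenizerFast.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)       # assumed learning rate

# Hypothetical task-aware inputs (e.g. unlabeled sentences drawn from the
# target task); no human labels are used anywhere below.
task_texts = [
    "The movie was a delight to watch from start to finish.",
    "The plot made no sense and the acting felt completely flat.",
]

def make_span_pair(text: str) -> tuple[str, str]:
    """Mask one word with a T5 sentinel token; the target is the original span."""
    words = text.split()
    i = len(words) // 2                       # crude span choice, for illustration only
    source = " ".join(words[:i] + ["<extra_id_0>"] + words[i + 1:])
    target = f"<extra_id_0> {words[i]} <extra_id_1>"
    return source, target

model.train()
for epoch in range(3):                        # a few lightweight update passes
    for text in task_texts:
        source, target = make_span_pair(text)
        inputs = tokenizer(source, return_tensors="pt")
        labels = tokenizer(target, return_tensors="pt").input_ids
        loss = model(**inputs, labels=labels).loss   # standard span-denoising loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

The only point of the sketch is that the extra training signal comes from the task's own unlabeled inputs rather than from any external supervised data.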
Related papers
- Enabling Small Models for Zero-Shot Classification through Model Label Learning [50.68074833512999]
We introduce a novel paradigm, Model Label Learning (MLL), which bridges the gap between models and their functionalities.
Experiments on seven real-world datasets validate the effectiveness and efficiency of MLL.
arXiv Detail & Related papers (2024-08-21T09:08:26Z)
- Emergent Abilities in Reduced-Scale Generative Language Models [10.51168925267033]
Large language models can solve new tasks without task-specific fine-tuning.
This ability is considered an emergent ability and is primarily seen in large language models with billions of parameters.
This study investigates if such emergent properties are strictly tied to model size or can be demonstrated by smaller models trained on reduced-scale data.
arXiv Detail & Related papers (2024-04-02T18:00:28Z)
- Small Models are Valuable Plug-ins for Large Language Models [65.29370906766997]
Large language models (LLMs) such as GPT-3 and GPT-4 are powerful but their weights are often publicly unavailable.
We propose Super In-Context Learning (SuperICL), which allows black-box LLMs to work with locally fine-tuned smaller models (see the sketch after this list).
arXiv Detail & Related papers (2023-05-15T17:59:01Z)
- Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification [58.720142291102135]
This case study investigates the task of job classification in a real-world setting.
The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
arXiv Detail & Related papers (2023-03-13T14:09:53Z)
- Specializing Smaller Language Models towards Multi-Step Reasoning [56.78474185485288]
We show that abilities can be distilled down from GPT-3.5 ($\ge$ 175B) to T5 variants ($\le$ 11B).
We propose model specialization, which specializes the model's ability towards a target task.
arXiv Detail & Related papers (2023-01-30T08:51:19Z)
- Teaching Small Language Models to Reason [19.625523231233128]
Chain of thought prompting successfully improves the reasoning capabilities of large language models.
We explore the transfer of such reasoning capabilities to models with less than 100 billion parameters via knowledge distillation.
Our experiments show that the proposed method improves task performance across arithmetic, commonsense and symbolic reasoning datasets.
arXiv Detail & Related papers (2022-12-16T11:24:42Z)
- PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
We trained a 540-billion parameter, densely activated Transformer language model, which we call the Pathways Language Model (PaLM).
We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods.
We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv Detail & Related papers (2022-04-05T16:11:45Z)
- Sharpness-Aware Minimization Improves Language Model Generalization [46.83888240127077]
We show that Sharpness-Aware Minimization (SAM) can substantially improve the generalization of language models without much computational overhead.
We show that SAM is able to boost performance on SuperGLUE, GLUE, Web Questions, Natural Questions, Trivia QA, and TyDiQA, with particularly large gains when training data for these tasks is limited.
arXiv Detail & Related papers (2021-10-16T09:44:06Z)
- Finetuned Language Models Are Zero-Shot Learners [67.70352207685558]
We show that instruction tuning boosts zero-shot performance on unseen tasks.
We take a 137B parameter pretrained language model and instruction-tune it on over 60 NLP tasks verbalized via natural language instruction templates.
We evaluate this instruction-tuned model, which we call FLAN, on unseen task types.
arXiv Detail & Related papers (2021-09-03T17:55:52Z)
- ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation [25.430130072811075]
We propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge enhanced models.
It fuses an auto-regressive network and an auto-encoding network, so that the trained model can be easily tailored to both natural language understanding and generation tasks.
We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph.
arXiv Detail & Related papers (2021-07-05T16:54:59Z)
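The SuperICL entry above describes pairing a black-box LLM with a locally fine-tuned smaller model. Below is a rough sketch of one way that plug-in idea can be read, not the paper's released implementation; the hypothetical small_model_predict function stands in for the local classifier, and call_llm for whatever LLM API is available.

```python
# Rough sketch of a SuperICL-style plug-in, under assumptions: a locally
# fine-tuned small classifier annotates each prompt line with its prediction
# and confidence, and a black-box LLM produces the final label.
# `small_model_predict` and `call_llm` are hypothetical stand-ins.
from typing import Callable, List, Tuple

def build_prompt(
    demos: List[Tuple[str, str]],                       # (text, gold label) in-context examples
    test_input: str,
    small_model_predict: Callable[[str], Tuple[str, float]],
) -> str:
    lines = []
    for text, gold in demos:
        pred, conf = small_model_predict(text)
        lines.append(f"Input: {text}\n"
                     f"Small model prediction: {pred} (confidence {conf:.2f})\n"
                     f"Label: {gold}\n")
    pred, conf = small_model_predict(test_input)
    lines.append(f"Input: {test_input}\n"
                 f"Small model prediction: {pred} (confidence {conf:.2f})\n"
                 f"Label:")
    return "\n".join(lines)

# Usage with your own components (both hypothetical here):
# final_label = call_llm(build_prompt(demos, test_input, small_model_predict))
```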
This list is automatically generated from the titles and abstracts of the papers on this site.