Multitask Prompted Training Enables Zero-Shot Task Generalization
- URL: http://arxiv.org/abs/2110.08207v1
- Date: Fri, 15 Oct 2021 17:08:57 GMT
- Title: Multitask Prompted Training Enables Zero-Shot Task Generalization
- Authors: Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang
Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao,
Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma
Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak,
Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo
Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas
Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault
Fevry, Jason Alan Fries, Ryan Teehan, Stella Biderman, Leo Gao, Tali Bers,
Thomas Wolf, Alexander M. Rush
- Abstract summary: We develop a system for mapping general natural language tasks into a human-readable prompted form.
We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.
The model attains strong zero-shot performance on several standard datasets, often outperforming models up to 16x its size.
- Score: 70.12770442071657
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models have recently been shown to attain reasonable zero-shot
generalization on a diverse set of tasks. It has been hypothesized that this is
a consequence of implicit multitask learning in language model training. Can
zero-shot generalization instead be directly induced by explicit multitask
learning? To test this question at scale, we develop a system for easily
mapping general natural language tasks into a human-readable prompted form. We
convert a large set of supervised datasets, each with multiple prompts using
varying natural language. These prompted datasets allow for benchmarking the
ability of a model to perform completely unseen tasks specified in natural
language. We fine-tune a pretrained encoder-decoder model on this multitask
mixture covering a wide variety of tasks. The model attains strong zero-shot
performance on several standard datasets, often outperforming models up to 16x
its size. Further, our approach attains strong performance on a subset of tasks
from the BIG-Bench benchmark, outperforming models up to 6x its size. All prompts and
trained models are available at github.com/bigscience-workshop/promptsource/.
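As a rough illustration of the prompt-mapping idea in the abstract, the sketch below renders one supervised example into several human-readable (input, target) text pairs for encoder-decoder fine-tuning. It is a minimal, hypothetical example: the dataset fields, template wording, and helper function are illustrative assumptions, not the released promptsource templates or API.

```python
# Hypothetical sketch (not the authors' code): map a supervised example into
# several prompted forms, one training pair per template.

example = {
    "premise": "A soccer game with multiple males playing.",
    "hypothesis": "Some men are playing a sport.",
    "label": "entailment",
}

# Multiple prompts for the same task, each phrased in different natural language.
templates = [
    ("Given that \"{premise}\", is it true that \"{hypothesis}\"? "
     "Answer entailment, neutral, or contradiction.",
     "{label}"),
    ("{premise}\nDoes this imply that \"{hypothesis}\"? "
     "entailment, neutral, or contradiction?",
     "{label}"),
]

def apply_template(ex, input_tpl, target_tpl):
    """Render one (input text, target text) pair for encoder-decoder fine-tuning."""
    return input_tpl.format(**ex), target_tpl.format(**ex)

# Each example contributes one pair per template, so the multitask mixture sees
# the same underlying task expressed in varied natural-language wording.
for input_tpl, target_tpl in templates:
    source, target = apply_template(example, input_tpl, target_tpl)
    print(source, "=>", target)
```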
Related papers
- SpeechVerse: A Large-scale Generalizable Audio Language Model [38.67969337605572]
SpeechVerse is a robust multi-task training and curriculum learning framework.
It combines pre-trained speech and text foundation models via a small set of learnable parameters.
Empirical experiments show that the multi-task SpeechVerse model outperforms conventional task-specific baselines on 9 of the 11 tasks.
arXiv Detail & Related papers (2024-05-14T03:33:31Z)
- UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions [64.50935101415776]
We build a single model that jointly performs various spoken language understanding (SLU) tasks.
We demonstrate the efficacy of our single multi-task learning model "UniverSLU" for 12 speech classification and sequence generation task types spanning 17 datasets and 9 languages.
arXiv Detail & Related papers (2023-10-04T17:10:23Z)
- Crosslingual Generalization through Multitask Finetuning [80.8822603322471]
Multitask prompted finetuning (MTF) has been shown to help large language models generalize to new tasks in a zero-shot setting.
We apply MTF to the pretrained multilingual BLOOM and mT5 model families to produce finetuned variants called BLOOMZ and mT0.
We find finetuning large multilingual language models on English tasks with English prompts allows for task generalization to non-English languages.
arXiv Detail & Related papers (2022-11-03T13:19:32Z)
- Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks [77.90900650816046]
We introduce Zemi, a zero-shot semi-parametric language model.
We train Zemi with a novel semi-parametric multitask prompted training paradigm.
Specifically, we augment the multitask training and zero-shot evaluation with retrieval from a large-scale task-agnostic unlabeled corpus.
arXiv Detail & Related papers (2022-10-01T04:08:50Z)
- Multi Task Learning For Zero Shot Performance Prediction of Multilingual Models [12.759281077118567]
Massively multilingual Transformer-based language models have been observed to be surprisingly effective at zero-shot transfer across languages.
We build on existing techniques for predicting zero-shot performance on a task by modeling it as a multi-task learning problem.
arXiv Detail & Related papers (2022-05-12T14:47:03Z)
- XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation [80.18830380517753]
We develop a new task-agnostic distillation framework XtremeDistilTransformers.
We study the transferability of several source tasks, augmentation resources and model architecture for distillation.
arXiv Detail & Related papers (2021-06-08T17:49:33Z)
- Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning [70.81910984985683]
We propose an effective way to fine-tune a single large pre-trained model on multiple downstream generation tasks simultaneously.
Experiments on five diverse language generation tasks show that, using only an additional 2-3% of parameters per task, the model can match or even improve on the performance of fine-tuning the whole model.
arXiv Detail & Related papers (2020-04-08T06:18:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.