Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple
Tasks
- URL: http://arxiv.org/abs/2210.00185v2
- Date: Tue, 23 May 2023 00:49:44 GMT
- Title: Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple
Tasks
- Authors: Zhenhailong Wang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen, Heng Ji
- Abstract summary: We introduce $\text{Zemi}$, a zero-shot semi-parametric language model.
We train $\text{Zemi}$ with a novel semi-parametric multitask prompted training paradigm.
Specifically, we augment the multitask training and zero-shot evaluation with retrieval from a large-scale task-agnostic unlabeled corpus.
- Score: 77.90900650816046
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although large language models have achieved impressive zero-shot ability,
the huge model size generally incurs high cost. Recently, semi-parametric
language models, which augment a smaller language model with an external
retriever, have demonstrated promising language modeling capabilities. However,
it remains unclear whether such semi-parametric language models can perform
competitively with their fully-parametric counterparts on zero-shot
generalization to downstream tasks. In this work, we introduce $\text{Zemi}$, a
zero-shot semi-parametric language model. To the best of our knowledge, this is the
first semi-parametric language model that can demonstrate strong zero-shot
performance on a wide range of held-out unseen tasks. We train $\text{Zemi}$
with a novel semi-parametric multitask prompted training paradigm, which shows
significant improvement compared with the parametric multitask training as
proposed by T0. Specifically, we augment the multitask training and zero-shot
evaluation with retrieval from a large-scale task-agnostic unlabeled corpus. In
order to incorporate multiple potentially noisy retrieved augmentations, we
further propose a novel $\text{augmentation fusion}$ module leveraging
perceiver resampler and gated cross-attention. Notably, our proposed
$\text{Zemi}_\text{LARGE}$ outperforms T0-3B by 16% on all seven evaluation
tasks while being 3.9x smaller in model size.
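The augmentation fusion idea can be pictured with a short, self-contained PyTorch sketch. This is not the authors' implementation: the module names, latent count, and zero-initialised tanh gate are illustrative assumptions. It only shows how a perceiver-style resampler can compress a variable number of retrieved-passage encodings into a fixed set of latents, which a gated cross-attention layer then injects into the backbone hidden states.

```python
# Minimal sketch (not the authors' code) of perceiver-resampler + gated
# cross-attention fusion of retrieved augmentations; names and sizes are
# assumptions for illustration.
import torch
import torch.nn as nn


class PerceiverResampler(nn.Module):
    """Cross-attend a fixed set of learned latents over retrieved-passage tokens."""

    def __init__(self, dim: int, num_latents: int = 64, num_heads: int = 8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, retrieved: torch.Tensor) -> torch.Tensor:
        # retrieved: (batch, num_passages * passage_len, dim) encoded augmentations
        latents = self.latents.unsqueeze(0).expand(retrieved.size(0), -1, -1)
        fused, _ = self.attn(latents, retrieved, retrieved)
        return fused  # (batch, num_latents, dim)


class GatedCrossAttention(nn.Module):
    """Inject augmentation latents into backbone states through a learned tanh gate."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # gate starts closed: model begins unaugmented

    def forward(self, hidden: torch.Tensor, latents: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, dim) backbone hidden states for the prompt
        attended, _ = self.attn(hidden, latents, latents)
        return hidden + torch.tanh(self.gate) * attended


if __name__ == "__main__":
    dim = 512
    resampler, fusion = PerceiverResampler(dim), GatedCrossAttention(dim)
    retrieved = torch.randn(2, 10 * 32, dim)  # 10 retrieved passages of 32 tokens each
    hidden = torch.randn(2, 128, dim)         # backbone hidden states
    out = fusion(hidden, resampler(retrieved))
    print(out.shape)  # torch.Size([2, 128, 512])
```

Initialising the gate at zero is a common trick in Flamingo-style gated cross-attention: training starts from the behaviour of the unaugmented backbone, and the influence of the potentially noisy retrieved augmentations is learned gradually.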
Related papers
- Split and Rephrase with Large Language Models [2.499907423888049]
The Split and Rephrase (SPRP) task consists of splitting complex sentences into a sequence of shorter grammatical sentences.
We evaluate large language models on the task, showing that they can provide large improvements over the state of the art on the main metrics.
arXiv Detail & Related papers (2023-12-18T10:16:37Z) - Contrastive Alignment of Vision to Language Through Parameter-Efficient
Transfer Learning [60.26952378997713]
Contrastive vision-language models (e.g. CLIP) are created by updating all the parameters of a vision model and language model through contrastive training.
We show that a minimal set of parameter updates (<7%) can achieve the same performance as full-model training.
We describe a series of experiments: we show that existing knowledge is conserved more strongly in parameter-efficient training.
arXiv Detail & Related papers (2023-03-21T14:12:08Z) - Lego-MT: Learning Detachable Models for Massively Multilingual Machine
Translation [48.37939354609931]
We propose a novel efficient training recipe, upon which we build an effective detachable model, Lego-MT.
Experiments show that Lego-MT with 1.2B parameters brings an average gain of 3.2 spBLEU.
The proposed training recipe brings a 28.2$\times$ speedup over the conventional multi-way training method.
arXiv Detail & Related papers (2022-12-20T18:54:08Z) - Zero-Shot Learners for Natural Language Understanding via a Unified
Multiple Choice Perspective [26.41585967095811]
Zero-shot learning aims to train a model on a given task such that it can address new learning tasks without any additional training.
Our approach converts zero-shot learning into multiple-choice tasks, avoiding problems in commonly used large-scale generative models such as FLAN.
Our approach shows state-of-the-art performance on several benchmarks and produces satisfactory results on tasks such as natural language inference and text classification.
arXiv Detail & Related papers (2022-10-16T17:24:06Z) - Multimodal Knowledge Alignment with Reinforcement Learning [103.68816413817372]
ESPER extends language-only zero-shot models to unseen multimodal tasks, like image and audio captioning.
Our key novelty is to use reinforcement learning to align multimodal inputs to language model generations without direct supervision.
Experiments demonstrate that ESPER outperforms baselines and prior work on a variety of zero-shot tasks.
arXiv Detail & Related papers (2022-05-25T10:12:17Z) - Multitask Prompted Training Enables Zero-Shot Task Generalization [70.12770442071657]
We develop a system for mapping general natural language tasks into a human-readable prompted form (a toy sketch of such a mapping appears after this list).
We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.
The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size.
arXiv Detail & Related papers (2021-10-15T17:08:57Z) - Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-centric models brings gains of more than 10 BLEU when directly translating between non-English directions, while performing competitively with the best single systems from WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z) - Pre-training Multilingual Neural Machine Translation by Leveraging
Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across a diverse setting, including low-, medium-, and rich-resource languages, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)
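As a concrete illustration of the multitask prompted training that Zemi builds on (see the T0 entry above), the toy snippet below maps a single labelled NLI example into a human-readable prompt/target pair. The template wording is invented for illustration and is not one of T0's actual templates.

```python
# Toy prompt mapping for an NLI example; the template text is a made-up
# illustration, not an actual T0 prompt template.
example = {
    "premise": "A soccer game with multiple males playing.",
    "hypothesis": "Some men are playing a sport.",
    "label": "entailment",
}

prompt = (
    f"{example['premise']}\n"
    f"Question: does this mean that \"{example['hypothesis']}\" is true? Yes or no?"
)
target = "Yes" if example["label"] == "entailment" else "No"
print(prompt, "->", target)
```

In Zemi's semi-parametric variant, such prompts are additionally paired with passages retrieved from a large-scale task-agnostic unlabeled corpus before being fed to the model.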