The Flan Collection: Designing Data and Methods for Effective
Instruction Tuning
- URL: http://arxiv.org/abs/2301.13688v1
- Date: Tue, 31 Jan 2023 15:03:44 GMT
- Title: The Flan Collection: Designing Data and Methods for Effective
Instruction Tuning
- Authors: Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay,
Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, Adam Roberts
- Abstract summary: We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022.
We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning.
To accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available.
- Score: 118.70716915295091
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the design decisions of publicly available instruction tuning
methods, and break down the development of Flan 2022 (Chung et al., 2022).
Through careful ablation studies on the Flan Collection of tasks and methods,
we tease apart the effect of design decisions which enable Flan-T5 to
outperform prior work by 3-17%+ across evaluation settings. We find task
balancing and enrichment techniques are overlooked but critical to effective
instruction tuning, and in particular, training with mixed prompt settings
(zero-shot, few-shot, and chain-of-thought) actually yields stronger (2%+)
performance in all settings. In further experiments, we show Flan-T5 requires
less finetuning to converge higher and faster than T5 on single downstream
tasks, motivating instruction-tuned models as more computationally-efficient
starting checkpoints for new tasks. Finally, to accelerate research on
instruction tuning, we make the Flan 2022 collection of datasets, templates,
and methods publicly available at
https://github.com/google-research/FLAN/tree/main/flan/v2.
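The two findings highlighted above, task balancing in the training mixture and training with mixed prompt settings, lend themselves to a compact illustration. Below is a minimal sketch in Python; the task names, sizes, cap value, and sampling scheme are illustrative assumptions, not the released Flan 2022 configuration (see the repository above for the actual mixture).

```python
import random

# Illustrative task pools; names and sizes are assumptions, not the
# actual Flan 2022 collection.
TASKS = {
    "anli_r1": 30_000,         # available examples per task
    "boolq": 9_400,
    "gsm8k_cot": 7_500,
    "wmt16_en_de": 4_500_000,  # one huge task that must not dominate
}

CAP = 30_000  # per-task cap for examples-proportional mixing

def mixing_rates(tasks, cap):
    """Examples-proportional mixing with a cap: sampling weight is
    proportional to task size, but large tasks are truncated at `cap`."""
    capped = {t: min(n, cap) for t, n in tasks.items()}
    total = sum(capped.values())
    return {t: n / total for t, n in capped.items()}

# Mixed prompt settings: each sampled example is rendered with a randomly
# chosen template family, so training covers all three styles.
PROMPT_SETTINGS = ["zero_shot", "few_shot", "chain_of_thought"]

def sample_batch(tasks, cap, batch_size, seed=0):
    rng = random.Random(seed)
    names, weights = zip(*mixing_rates(tasks, cap).items())
    return [(rng.choices(names, weights=weights)[0],
             rng.choice(PROMPT_SETTINGS)) for _ in range(batch_size)]

if __name__ == "__main__":
    for task, setting in sample_batch(TASKS, CAP, batch_size=8):
        print(f"{task:12s} -> {setting}")
```

The cap is what keeps the 4.5M-example translation task from swamping the mixture: after capping, all four illustrative tasks contribute comparable sampling mass.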
Related papers
- Revisiting the Power of Prompt for Visual Tuning [50.11465784194896]
This study explores how the correlation between prompts and patch tokens evolves over the course of training.
Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes.
Our method significantly advances adaptation of self-supervised pretrained models, achieving task performance gains of 10% to 30%.
arXiv Detail & Related papers (2024-02-04T07:49:02Z)
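As a rough illustration of the prototype-based initialization described in the entry above: collect patch-token embeddings from the downstream data and use their cluster centers as the initial prompt tokens. The plain k-means routine below is a stand-in assumption; the paper's exact prototype construction may differ.

```python
import numpy as np

def token_prototypes(patch_tokens, num_prompts, iters=10, seed=0):
    """Cluster downstream patch-token embeddings (collected by running the
    frozen backbone over downstream images; collection not shown) and
    return the cluster centers as prompt initializations."""
    rng = np.random.default_rng(seed)
    centers = patch_tokens[rng.choice(len(patch_tokens), num_prompts,
                                      replace=False)]
    for _ in range(iters):  # plain k-means, assumed as a stand-in
        dists = ((patch_tokens[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for k in range(num_prompts):
            members = patch_tokens[assign == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return centers

# Usage: a 10-token prompt initialized from 1024 collected patch embeddings.
patches = np.random.default_rng(1).normal(size=(1024, 768)).astype(np.float32)
print(token_prototypes(patches, num_prompts=10).shape)  # (10, 768)
```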
- Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
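A sketch of how alignment-derived auxiliary tasks can be cast as plain sequence-to-sequence pairs for multi-task training, per the entry above; the edit-tag scheme and prompt prefixes are illustrative assumptions, not the paper's exact formulation.

```python
import difflib

def edit_target(source, corrected):
    """Auxiliary target: the token-level edits that turn the original
    sentence into the corrected one, read off a sequence alignment."""
    src, tgt = source.split(), corrected.split()
    ops = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=src, b=tgt).get_opcodes():
        if op == "equal":
            ops += [f"KEEP {w}" for w in src[i1:i2]]
        elif op == "replace":
            ops.append(f"REPLACE {' '.join(src[i1:i2])} -> {' '.join(tgt[j1:j2])}")
        elif op == "delete":
            ops.append(f"DELETE {' '.join(src[i1:i2])}")
        else:  # insert
            ops.append(f"INSERT {' '.join(tgt[j1:j2])}")
    return " ; ".join(ops)

def multitask_examples(pairs):
    """Every task is a plain (input, output) text pair, so one seq2seq
    model can train on the main task and the auxiliary task jointly."""
    out = []
    for src, tgt in pairs:
        out.append((f"correct: {src}", tgt))                          # main task
        out.append((f"explain edits: {src}", edit_target(src, tgt)))  # auxiliary
    return out

for x, y in multitask_examples(
        [("She go to school yesterday .", "She went to school yesterday .")]):
    print(x, "=>", y)
```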
- The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning [50.75534397373867]
Language models (LMs) with less than 100B parameters are known to perform poorly on chain-of-thought (CoT) reasoning.
In this work, we aim to equip smaller LMs with the step-by-step reasoning capability by instruction tuning with CoT rationales.
arXiv Detail & Related papers (2023-05-23T13:14:59Z)
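A minimal sketch of a CoT fine-tuning example, with the rationale placed before the final answer in the target so the model is trained to produce the reasoning rather than just the label. The template wording is an assumption, not the CoT Collection's exact format.

```python
def format_cot_example(instruction, question, rationale, answer):
    """Pack one (instruction, question, rationale, answer) record into a
    source/target pair for sequence-to-sequence instruction tuning."""
    source = f"{instruction}\n\nQuestion: {question}\nLet's think step by step."
    target = f"{rationale} So the answer is {answer}."
    return source, target

src, tgt = format_cot_example(
    instruction="Solve the arithmetic word problem.",
    question="Ann has 3 apples and buys 2 more. How many does she have?",
    rationale="Ann starts with 3 apples. Buying 2 more gives 3 + 2 = 5.",
    answer="5",
)
print(src)
print(tgt)
```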
- Instruction Tuned Models are Quick Learners [20.771930945083994]
In this work, we demonstrate the sample efficiency of instruction tuned models over various tasks.
In the single-task learning (STL) setting, instruction-tuned models finetuned on only 25% of the downstream training data surpass SOTA performance on the downstream tasks.
In the multi-task learning (MTL) setting, an instruction-tuned model trained on only 6% of the downstream training data achieves SOTA, while using 100% of the training data yields a 3.69-point improvement over SOTA.
arXiv Detail & Related papers (2023-05-17T22:30:01Z)
- Scaling Instruction-Finetuned Language Models [126.4789306516927]
Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance.
We find that instruction finetuning dramatically improves performance on a variety of model classes.
arXiv Detail & Related papers (2022-10-20T16:58:32Z)
- Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
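The last entry's claim that pre-training objectives can be cast as one another has a compact illustration: with T5-style sentinel spans, a prefix-LM objective is span corruption with a single span covering the suffix. The helper below is a sketch under that framing, not the UL2 implementation.

```python
import random

SENTINEL = "<extra_id_{}>"

def span_corrupt(tokens, spans):
    """Turn a token list plus (start, end) spans into an (input, target)
    pair using T5-style sentinel tokens."""
    inp, tgt, prev = [], [], 0
    for k, (s, e) in enumerate(spans):
        inp += tokens[prev:s] + [SENTINEL.format(k)]
        tgt += [SENTINEL.format(k)] + tokens[s:e]
        prev = e
    inp += tokens[prev:]
    return inp, tgt

tokens = "the quick brown fox jumps over the lazy dog".split()

# Short random spans, in the spirit of a regular denoising objective.
rng = random.Random(0)
starts = sorted(rng.sample(range(len(tokens) - 1), 2))
print(span_corrupt(tokens, [(s, s + 1) for s in starts]))

# Prefix-LM as a special case: one span covering the whole suffix, i.e.
# condition on a prefix and generate the rest.
print(span_corrupt(tokens, [(4, len(tokens))]))
```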
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.