Instruction Tuned Models are Quick Learners
- URL: http://arxiv.org/abs/2306.05539v1
- Date: Wed, 17 May 2023 22:30:01 GMT
- Title: Instruction Tuned Models are Quick Learners
- Authors: Himanshu Gupta and Saurabh Arjun Sawant and Swaroop Mishra and Mutsumi
Nakamura and Arindam Mitra and Santosh Mashetty and Chitta Baral
- Abstract summary: In this work, we demonstrate the sample efficiency of instruction tuned models over various tasks.
In the STL setting, instruction tuned models equipped with 25% of the downstream training data surpass the SOTA performance on the downstream tasks.
In the MTL setting, an instruction tuned model trained on only 6% of the downstream training data achieves SOTA, while using 100% of the training data yields a 3.69-point improvement over the previous SOTA.
- Score: 20.771930945083994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction tuning of language models has demonstrated the ability to enhance
model generalization to unseen tasks via in-context learning using a few
examples. However, typical supervised learning still requires a plethora of
downstream training data for finetuning. Often in real-world situations, there
is a scarcity of data available for finetuning, falling somewhere between few
shot inference and fully supervised finetuning. In this work, we demonstrate
the sample efficiency of instruction tuned models over various tasks by
estimating the minimal downstream training data required by them to perform
transfer learning and match the performance of state-of-the-art (SOTA)
supervised models. We conduct experiments on 119 tasks from Super Natural
Instructions (SuperNI) in both the single task learning (STL) and multi task
learning (MTL) settings. Our findings reveal that, in the STL setting,
instruction tuned models equipped with 25% of the downstream training data
surpass the SOTA performance on the downstream tasks. In the MTL setting, an
instruction tuned model trained on only 6% of the downstream training data
achieves SOTA, while using 100% of the training data yields a 3.69-point
improvement (ROUGE-L 74.68) over the previous SOTA. We conduct an analysis of
T5 vs Tk-Instruct by developing several baselines to demonstrate that
instruction tuning aids in increasing both sample efficiency and transfer
learning. Additionally, we observe a consistent ~4% performance increase in
both settings when pre-finetuning is performed with instructions. Finally, we
conduct a categorical study and find that contrary to previous results, tasks
in the question rewriting and title generation categories suffer from
instruction tuning.
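To make the setup concrete, the sketch below shows one way to reproduce the spirit of the experiment under stated assumptions: subsample a fraction of a downstream task's training data, fine-tune an instruction-tuned seq2seq checkpoint on SuperNI-style Definition/Input prompts, and score with ROUGE-L. The checkpoint name (allenai/tk-instruct-small-def-pos), the toy task, the prompt template, and all hyperparameters are illustrative assumptions, not the authors' released configuration.
```python
# Minimal sketch, not the authors' code: fine-tune an instruction-tuned
# checkpoint on a small fraction of one downstream task and report ROUGE-L.
# Checkpoint, toy data, prompt template, and hyperparameters are assumptions.
import random

import evaluate
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

CHECKPOINT = "allenai/tk-instruct-small-def-pos"  # assumed Tk-Instruct variant
FRACTION = 0.25  # e.g. the 25% budget from the STL result

# Toy stand-in for one SuperNI task: (definition, input, output) triples.
task = [
    {"definition": "Answer the question with a single word.",
     "input": "What color is the sky on a clear day?", "output": "blue"},
    {"definition": "Answer the question with a single word.",
     "input": "How many legs does a spider have?", "output": "eight"},
    {"definition": "Answer the question with a single word.",
     "input": "What is the capital of France?", "output": "Paris"},
    {"definition": "Answer the question with a single word.",
     "input": "Which planet is known as the red planet?", "output": "Mars"},
]

# Keep only FRACTION of the downstream training examples.
subset = random.Random(0).sample(task, max(1, int(len(task) * FRACTION)))

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)

def prompt(ex):
    # SuperNI-style input: task definition followed by the instance input.
    return f"Definition: {ex['definition']}\nInput: {ex['input']}\nOutput:"

def encode(ex):
    enc = tokenizer(prompt(ex), truncation=True, max_length=512)
    enc["labels"] = tokenizer(text_target=ex["output"],
                              truncation=True, max_length=64)["input_ids"]
    return enc

train_ds = Dataset.from_list(subset).map(
    encode, remove_columns=["definition", "input", "output"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="tk_stl", num_train_epochs=2,
                                  per_device_train_batch_size=2,
                                  learning_rate=5e-5, report_to=[]),
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# ROUGE-L is the metric reported in the paper; here computed on the toy task.
rouge = evaluate.load("rouge")
preds = []
for ex in task:
    ids = tokenizer(prompt(ex), return_tensors="pt").input_ids.to(model.device)
    out = model.generate(ids, max_new_tokens=16)
    preds.append(tokenizer.decode(out[0], skip_special_tokens=True))
print(rouge.compute(predictions=preds,
                    references=[ex["output"] for ex in task])["rougeL"])
```
In the paper's MTL setting, the same recipe would be applied jointly across many SuperNI tasks rather than to a single downstream task.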
Related papers
- An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training [51.622652121580394]
Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features.
In this paper, we question whether the fine-tuning performance of extremely simple lightweight ViTs can also benefit from this pre-training paradigm.
Our pre-training with distillation on pure lightweight ViTs with vanilla/hierarchical design (5.7M/6.5M parameters) can achieve 79.4%/78.9% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2024-04-18T14:14:44Z)
- Less is More: High-value Data Selection for Visual Instruction Tuning [127.38740043393527]
We propose TIVE, a high-value data selection approach, to eliminate redundancy within the visual instruction data and reduce the training cost.
Using only about 15% of the data, our approach achieves average performance comparable to the full-data fine-tuned model across eight benchmarks.
arXiv Detail & Related papers (2024-03-14T16:47:25Z)
- Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
- Learning to Modulate pre-trained Models in RL [22.812215561012874]
Fine-tuning a pre-trained model often suffers from catastrophic forgetting.
Our study shows that with most fine-tuning approaches, the performance on pre-training tasks deteriorates significantly.
We propose a novel method, Learning-to-Modulate (L2M), that avoids the degradation of learned skills by modulating the information flow of the frozen pre-trained model.
arXiv Detail & Related papers (2023-06-26T17:53:05Z)
- Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models [125.91897197446379]
We find that MoE models benefit more from instruction tuning than dense models.
Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks.
arXiv Detail & Related papers (2023-05-24T04:22:26Z)
- Do Models Really Learn to Follow Instructions? An Empirical Study of Instruction Tuning [37.01833561948585]
Recent works on instruction tuning (IT) have achieved great performance with zero-shot generalizability to unseen tasks.
We analyze how models utilize instructions during IT by comparing model training with altered vs. original instructions.
arXiv Detail & Related papers (2023-05-19T02:00:47Z)
- LIMA: Less Is More for Alignment [112.93890201395477]
We train LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses.
LIMA demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples.
In a controlled human study, responses from LIMA are either equivalent or strictly preferred to GPT-4 in 43% of cases.
arXiv Detail & Related papers (2023-05-18T17:45:22Z)
- Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning [13.558918552284906]
This paper focuses on reducing the data used in instruction tuning for large language models (LLMs) to decrease training costs and improve data efficiency.
The results suggest that task-specific models can be trained using less than 0.5% of the original dataset, with a 2% improvement in performance over those trained on full task-related data.
arXiv Detail & Related papers (2023-05-16T07:52:57Z)
- Improving Neural Machine Translation by Denoising Training [95.96569884410137]
We present a simple and effective pretraining strategy, Denoising Training (DoT), for neural machine translation.
We update the model parameters with source- and target-side denoising tasks at the early stage and then tune the model normally.
Experiments show DoT consistently improves the neural machine translation performance across 12 bilingual and 16 multilingual directions.
arXiv Detail & Related papers (2022-01-19T00:11:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.