LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms
- URL: http://arxiv.org/abs/2311.13133v1
- Date: Wed, 22 Nov 2023 03:37:01 GMT
- Title: LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms
- Authors: Aditi Jha, Sam Havens, Jeremey Dohmann, Alex Trott, Jacob Portes
- Abstract summary: We finetune open-source MPT-7B and MPT-30B models on instruction finetuning datasets of various sizes ranging from 1k to 60k samples.
We find that subsets of 1k-6k instruction finetuning samples are sufficient to achieve good performance on both (1) traditional NLP benchmarks and (2) model-based evaluation.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large Language Models are traditionally finetuned on large instruction
datasets. However, recent studies suggest that small, high-quality datasets can
suffice for general-purpose instruction following. This lack of consensus
surrounding finetuning best practices is in part due to rapidly diverging
approaches to LLM evaluation. In this study, we ask whether a small amount of
diverse finetuning samples can improve performance on both traditional
perplexity-based NLP benchmarks, and on open-ended, model-based evaluation. We
finetune open-source MPT-7B and MPT-30B models on instruction finetuning
datasets of various sizes ranging from 1k to 60k samples. We find that subsets
of 1k-6k instruction finetuning samples are sufficient to achieve good
performance on both (1) traditional NLP benchmarks and (2) model-based
evaluation. Finally, we show that mixing textbook-style and open-ended QA
finetuning datasets optimizes performance on both evaluation paradigms.
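The abstract's recipe, subsampling a few thousand examples and mixing textbook-style with open-ended QA data, can be sketched as below. This is a minimal illustration, not the paper's exact procedure: the function name, the 50/50 mixing ratio, and the 5k target size are illustrative assumptions.

```python
import random

def mix_finetuning_subsets(textbook_qa, open_ended_qa,
                           total_size=5000, textbook_frac=0.5, seed=0):
    """Sample a small mixed instruction-finetuning subset from a
    textbook-style QA dataset and an open-ended QA dataset.

    NOTE: ratio and size here are illustrative, not the paper's recipe.
    """
    rng = random.Random(seed)
    # Cap each draw at the size of its source dataset.
    n_textbook = min(int(total_size * textbook_frac), len(textbook_qa))
    n_open = min(total_size - n_textbook, len(open_ended_qa))
    mixed = rng.sample(textbook_qa, n_textbook) + rng.sample(open_ended_qa, n_open)
    rng.shuffle(mixed)  # interleave the two styles before training
    return mixed

# Toy usage with placeholder examples:
textbook = [{"q": f"t{i}"} for i in range(4000)]
open_ended = [{"q": f"o{i}"} for i in range(4000)]
subset = mix_finetuning_subsets(textbook, open_ended, total_size=2000)
```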