Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation
- URL: http://arxiv.org/abs/2402.18334v3
- Date: Wed, 11 Sep 2024 16:28:29 GMT
- Title: Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation
- Authors: Nihal V. Nayak, Yiyang Nan, Avi Trost, Stephen H. Bach
- Abstract summary: Bonito is a model for conditional task generation that converts unannotated text into task-specific training datasets for instruction tuning.
We show that Bonito significantly improves the average performance of pretrained and instruction-tuned models over the de facto self-supervised baseline.
- Score: 9.574486521686323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Bonito, an open-source model for conditional task generation that converts unannotated text into task-specific training datasets for instruction tuning. We aim to enable zero-shot task adaptation of large language models on users' specialized, private data. We train Bonito by fine-tuning a pretrained large language model on a new large-scale dataset with 1.65M examples created by remixing existing instruction tuning datasets into meta-templates. The meta-templates for a dataset produce training examples where the input is the unannotated text and the task attribute and the output consists of the instruction and the response. We use Bonito to generate synthetic tasks for seven datasets from specialized domains with unannotated text across three task types -- yes-no question answering, extractive question answering, and natural language inference -- and adapt language models. We show that Bonito significantly improves the average performance of pretrained and instruction tuned models over the de facto self supervised baseline. For example, adapting Mistral-Instruct-v2 and instruction tuned variants of Mistral and Llama2 with Bonito improves the strong zero-shot performance by 22.1 F1 points whereas the next word prediction objective undoes some of the benefits of instruction tuning and reduces the average performance by 0.8 F1 points. We conduct additional experiments with Bonito to understand the effects of the domain, the size of the training set, and the choice of alternative synthetic task generators. Overall, we show that learning with synthetic instruction tuning datasets is an effective way to adapt language models to new domains. The model, dataset, and code are available at https://github.com/BatsResearch/bonito.
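The abstract describes the meta-template format only in words. The sketch below illustrates one way such a template could remix an annotated (context, question, answer) example into a conditional task generation training pair, where the input is the unannotated text plus a task attribute and the output is the instruction and response. The tag strings, field names, and function here are hypothetical illustrations, not the exact ones used in the released Bonito dataset or code.

```python
# Minimal sketch of the meta-template idea from the abstract: an existing
# annotated example is remixed so that the model INPUT is the unannotated
# text plus a task-type attribute, and the model OUTPUT is the instruction
# and response. Tags and field names below are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class TrainingPair:
    input_text: str   # unannotated text + task attribute
    target_text: str  # instruction + response


def apply_meta_template(context: str, question: str, answer: str,
                        task_type: str = "extractive question answering") -> TrainingPair:
    """Remix an annotated (context, question, answer) example into a
    conditional task generation training pair."""
    model_input = f"<|tasktype|>\n{task_type}\n<|context|>\n{context}"
    model_target = f"<|task|>\n{question}\n<|response|>\n{answer}"
    return TrainingPair(input_text=model_input, target_text=model_target)


# Example: one annotated record becomes one (input, target) pair.
pair = apply_meta_template(
    context="Bonito converts unannotated text into instruction tuning datasets.",
    question="What does Bonito convert unannotated text into?",
    answer="Instruction tuning datasets.",
)
print(pair.input_text)
print(pair.target_text)
```

At adaptation time, per the abstract, the fine-tuned generator is given only unannotated domain text and a task type and must produce the instruction and response itself; those generated pairs form the synthetic instruction tuning set used to adapt the target language model.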
Related papers
- NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts [57.53692236201343]
We propose a Multi-Task Correction MoE, where we train the experts to become an "expert" of speech-to-text, language-to-text, and vision-to-text datasets.
NeKo performs competitively on grammar and post-OCR correction as a multi-task model.
arXiv Detail & Related papers (2024-11-08T20:11:24Z) - Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates [57.29125360837203]
Cookbook is a framework that generates training data consisting of simple patterns over random tokens.
We find that finetuning on Cookbook-generated data improves performance on the corresponding task by up to 52.7 accuracy points.
arXiv Detail & Related papers (2024-10-07T17:29:40Z) - Automatic Pruning of Fine-tuning Datasets for Transformer-based Language Models [13.340191056212692]
We propose an automatic dataset pruning method for the training set of fine-tuning tasks.
Our method provides multiple subsets for use in dataset pruning.
Experiments on 5 downstream tasks and 2 language models show that, on average, fine-tuning on the winning ticket subsets results in a 0.1% increase in the evaluation performance of the model.
arXiv Detail & Related papers (2024-07-11T22:46:18Z) - Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts [20.202031878825153]
We propose a novel dynamic data mixture for MoE instruction tuning.
Inspired by MoE's token routing preference, we build dataset-level representations and then capture the subtle differences among datasets.
Results on two MoE models demonstrate the effectiveness of our approach on both downstream knowledge & reasoning tasks and open-ended queries.
arXiv Detail & Related papers (2024-06-17T06:47:03Z) - Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation [92.2167864437497]
We propose Dynosaur, a dynamic growth paradigm for the automatic curation of instruction-tuning data.
Based on the metadata of existing datasets, we use LLMs to automatically construct instruction-tuning data by identifying relevant data fields and generating appropriate instructions.
By leveraging the existing annotated datasets, Dynosaur offers several advantages: 1) it reduces the API cost for generating instructions; 2) it provides high-quality data for instruction tuning; and 3) it supports the continuous improvement of models by generating instruction-tuning data when a new annotated dataset becomes available.
arXiv Detail & Related papers (2023-05-23T17:56:26Z) - One Adapter for All Programming Languages? Adapter Tuning for Code Search and Summarization [27.27985393610581]
We find that multilingual fine-tuning leads to performance degradation on recent models UniXcoder and CodeT5.
To alleviate the potentially catastrophic forgetting issue in multilingual models, we fix all pre-trained model parameters, insert the parameter-efficient structure adapter, and fine-tune it.
Our experiments on three probing tasks show that adapter tuning significantly outperforms full-model fine-tuning and effectively overcomes catastrophic forgetting.
arXiv Detail & Related papers (2023-03-28T08:49:54Z) - DeepStruct: Pretraining of Language Models for Structure Prediction [64.84144849119554]
We pretrain language models on a collection of task-agnostic corpora to generate structures from text.
Our structure pretraining enables zero-shot transfer of the learned knowledge that models have about the structure tasks.
We show that a 10B parameter language model transfers non-trivially to most tasks and obtains state-of-the-art performance on 21 of 28 datasets.
arXiv Detail & Related papers (2022-05-21T00:58:22Z) - Meta-learning via Language Model In-context Tuning [16.306733033119897]
The goal of meta-learning is to learn to adapt to a new task with only a few labeled examples.
We propose in-context tuning, which recasts task adaptation and prediction as a simple sequence prediction problem.
We benchmark our method on two collections of text classification tasks: LAMA and BinaryClfs.
arXiv Detail & Related papers (2021-10-15T02:29:09Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation [100.79870384880333]
We propose a knowledge-grounded pre-training (KGPT) to generate knowledge-enriched text.
We adopt three settings, namely fully-supervised, zero-shot, and few-shot, to evaluate its effectiveness.
Under zero-shot setting, our model achieves over 30 ROUGE-L on WebNLG while all other baselines fail.
arXiv Detail & Related papers (2020-10-05T19:59:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.