Data-Efficient Finetuning Using Cross-Task Nearest Neighbors
- URL: http://arxiv.org/abs/2212.00196v2
- Date: Wed, 24 May 2023 22:27:47 GMT
- Title: Data-Efficient Finetuning Using Cross-Task Nearest Neighbors
- Authors: Hamish Ivison and Noah A. Smith and Hannaneh Hajishirzi and Pradeep Dasigi
- Abstract summary: We use unlabeled target-task examples to retrieve the most similar labeled examples from a pool of multitask data augmented with prompts.
Our approach of finetuning models on cross-task nearest neighbors is significantly more data-efficient.
- Score: 75.07773863013001
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Obtaining labeled data to train a model for a task of interest is often
expensive. Prior work shows training models on multitask data augmented with
task descriptions (prompts) effectively transfers knowledge to new tasks.
Towards efficiently building task-specific models, we assume access to a small
number (32-1000) of unlabeled target-task examples and use those to retrieve
the most similar labeled examples from a large pool of multitask data augmented
with prompts. Compared to the current practice of finetuning models on
uniformly sampled prompted multitask data (e.g., FLAN, T0), our approach of
finetuning on cross-task nearest neighbors is significantly more
data-efficient. Using only 2% of the data from the P3 pool without any labeled
target-task data, our models outperform strong baselines trained on all
available data by 3-30% on 12 out of 14 datasets representing held-out tasks
including legal and scientific document QA. Similarly, models trained on
cross-task nearest neighbors from SuperNaturalInstructions, representing about
5% of the pool, obtain comparable performance to state-of-the-art models on 12
held-out tasks from that pool. Moreover, the models produced by our approach
also provide a better initialization than single multitask finetuned models for
few-shot finetuning on target-task data, as shown by a 2-23% relative
improvement over few-shot finetuned T0-3B models on 8 datasets.
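
As a rough illustration of the retrieval step described in the abstract, the Python sketch below embeds the unlabeled target-task examples and a prompted multitask pool, takes each query's nearest neighbors, and returns the deduplicated union as the finetuning set. It is a simplified, assumption-laden sketch, not the authors' exact retriever: the sentence-transformers encoder (all-MiniLM-L6-v2), the hypothetical "prompted_input" field name, and the use of scikit-learn's NearestNeighbors are stand-ins chosen for brevity.

    # Hedged sketch: retrieve cross-task nearest neighbors of unlabeled
    # target-task examples from a prompted multitask pool.
    from sentence_transformers import SentenceTransformer  # stand-in encoder
    from sklearn.neighbors import NearestNeighbors

    def retrieve_cross_task_neighbors(target_examples, prompted_pool, k=500):
        # target_examples: list[str] of 32-1000 unlabeled target-task inputs.
        # prompted_pool: list[dict] with a hypothetical "prompted_input" field
        # holding each labeled pool instance rendered with its prompt template.
        encoder = SentenceTransformer("all-MiniLM-L6-v2")
        pool_emb = encoder.encode([ex["prompted_input"] for ex in prompted_pool],
                                  normalize_embeddings=True)
        query_emb = encoder.encode(target_examples, normalize_embeddings=True)

        # Cosine distance on unit-normalized embeddings.
        index = NearestNeighbors(n_neighbors=k, metric="cosine").fit(pool_emb)
        _, neighbor_ids = index.kneighbors(query_emb)

        # Union over all queries; typically a small fraction of the full pool.
        selected = sorted(set(neighbor_ids.ravel().tolist()))
        return [prompted_pool[i] for i in selected]

In the paper's setting, retrieval is run over pools such as P3 or SuperNaturalInstructions and the selected subset (a few percent of the pool) is used to finetune the model in place of uniformly sampled multitask data; if a few labeled target-task examples are available, the resulting model can serve as the initialization for further few-shot finetuning.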
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- Better Synthetic Data by Retrieving and Transforming Existing Datasets [63.875064274379824]
We introduce DataTune, a method to make better use of publicly available datasets to improve automatic dataset generation.
On a diverse set of language-based tasks, we find that finetuning language models via DataTune improves over a few-shot prompting baseline by 49%.
We find that dataset transformation significantly increases the diversity and difficulty of generated data on many tasks.
arXiv Detail & Related papers (2024-04-22T17:15:32Z)
- GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks [3.9638110494107095]
In-context Learning (ICL) is the ability of Large Language Models (LLMs) to perform new tasks when conditioned on prompts.
We propose Example Gisting, a novel approach for training example encoders through supervised fine-tuning.
We show that our fine-tuned models get state-of-the-art ICL performance with over 20% absolute gain over off-the-shelf retrievers.
arXiv Detail & Related papers (2023-11-16T06:28:05Z)
- Foundation Model is Efficient Multimodal Multitask Model Selector [47.017463595702274]
A brute-force approach is to finetune all models on all target datasets, bringing high computational costs.
We propose an efficient multi-task model selector (EMMS) to transform diverse label formats into a unified noisy label embedding.
EMMS is fast, effective, and generic enough to assess the transferability of pre-trained models, making it the first model selection method in the multi-task scenario.
arXiv Detail & Related papers (2023-08-11T17:54:44Z)
- Text Alignment Is An Efficient Unified Model for Massive NLP Tasks [24.069447197357164]
Next-word prediction is often not an efficient formulation for many NLP tasks.
We propose text alignment as an efficient unified model for a wide range of crucial tasks.
Our model delivers on par or even superior performance with much smaller model sizes.
arXiv Detail & Related papers (2023-07-06T02:28:31Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- Label-Efficient Multi-Task Segmentation using Contrastive Learning [0.966840768820136]
We propose a multi-task segmentation model with a contrastive learning based subtask and compare its performance with other multi-task models.
We experimentally show that our proposed method outperforms other multi-task methods including the state-of-the-art fully supervised model when the amount of annotated data is limited.
arXiv Detail & Related papers (2020-09-23T14:12:17Z)
- Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data [5.689320790746046]
Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks.
However, MTL must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer.
We propose a novel Transformer architecture consisting of a new conditional attention mechanism and a set of task-conditioned modules.
arXiv Detail & Related papers (2020-09-19T02:04:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.