What to Pre-Train on? Efficient Intermediate Task Selection
- URL: http://arxiv.org/abs/2104.08247v1
- Date: Fri, 16 Apr 2021 17:31:18 GMT
- Title: What to Pre-Train on? Efficient Intermediate Task Selection
- Authors: Clifton Poth, Jonas Pfeiffer, Andreas R\"uckl\'e and Iryna Gurevych
- Abstract summary: Intermediate task fine-tuning has been shown to culminate in large transfer gains across many NLP tasks.
In this work we first establish that similar sequential fine-tuning gains can be achieved in adapter settings.
We then consolidate previously proposed methods that efficiently identify beneficial tasks for intermediate transfer learning.
- Score: 46.15624815492324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intermediate task fine-tuning has been shown to culminate in large transfer
gains across many NLP tasks. With an abundance of candidate datasets as well as
pre-trained language models, it has become infeasible to run the cross-product
of all combinations to find the best transfer setting. In this work we first
establish that similar sequential fine-tuning gains can be achieved in adapter
settings, and subsequently consolidate previously proposed methods that
efficiently identify beneficial tasks for intermediate transfer learning. We
experiment with a diverse set of 42 intermediate and 11 target English
classification, multiple choice, question answering, and sequence tagging
tasks. Our results show that efficient embedding based methods that rely solely
on the respective datasets outperform computational expensive few-shot
fine-tuning approaches. Our best methods achieve an average Regret@3 of less
than 1% across all target tasks, demonstrating that we are able to efficiently
identify the best datasets for intermediate training.
Related papers
- Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning [5.119396962985841]
Intermediate task transfer learning can greatly improve model performance.
We conduct the largest study on NLP task transferability and task selection with 12k source-target pairs.
Applying ESMs on a prior method reduces execution time and disk space usage by factors of 10 and 278, respectively.
arXiv Detail & Related papers (2024-10-19T16:22:04Z) - Scalable Fine-tuning from Multiple Data Sources: A First-Order Approximation Approach [17.79010397902909]
We study the problem of fine-tuning a language model (LM) for a target task by optimally using the information from $n$ auxiliary tasks.
This problem has broad applications in NLP, such as targeted instruction tuning and data selection in chain-of-thought fine-tuning.
We introduce a new algorithm to estimate model fine-tuning performances without repeated training.
arXiv Detail & Related papers (2024-09-28T21:26:50Z) - Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning [21.652389166495407]
We show that the transfer performance exhibits severe variance across different source tasks and training seeds.
Compared to embedding-free methods and text embeddings, task embeddings constructed from fine-tuned weights can better estimate task transferability.
We introduce a novel method that measures pairwise token similarity using maximum inner product search, leading to the highest performance in task prediction.
arXiv Detail & Related papers (2024-07-23T07:31:43Z) - TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data [29.45013725650798]
It is essential to extract a subset of instruction datasets that achieves comparable performance to the full dataset.
We propose Task-Agnostic Gradient Clustered COreset Selection (TAGCOS)
Specifically, we leverage sample gradients as the data representations, perform clustering to group similar data, and apply an efficient greedy algorithm for coreset selection.
arXiv Detail & Related papers (2024-07-21T17:59:20Z) - Efficient Grammatical Error Correction Via Multi-Task Training and
Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z) - Selecting task with optimal transport self-supervised learning for
few-shot classification [15.088213168796772]
Few-Shot classification aims at solving problems that only a few samples are available in the training process.
We propose a novel task selecting algorithm, named Optimal Transport Task Selecting (OTTS), to construct a training set by selecting similar tasks for Few-Shot learning.
OTTS measures the task similarity by calculating the optimal transport distance and completes the model training via a self-supervised strategy.
arXiv Detail & Related papers (2022-04-01T08:45:29Z) - Identifying Suitable Tasks for Inductive Transfer Through the Analysis
of Feature Attributions [78.55044112903148]
We use explainability techniques to predict whether task pairs will be complementary, through comparison of neural network activation between single-task models.
Our results show that, through this approach, it is possible to reduce training time by up to 83.5% at a cost of only 0.034 reduction in positive-class F1 on the TREC-IS 2020-A dataset.
arXiv Detail & Related papers (2022-02-02T15:51:07Z) - Efficient Conditional Pre-training for Transfer Learning [71.01129334495553]
We propose efficient filtering methods to select relevant subsets from the pre-training dataset.
We validate our techniques by pre-training on ImageNet in both the unsupervised and supervised settings.
We improve standard ImageNet pre-training by 1-3% by tuning available models on our subsets and pre-training on a dataset filtered from a larger scale dataset.
arXiv Detail & Related papers (2020-11-20T06:16:15Z) - Exploring and Predicting Transferability across NLP Tasks [115.6278033699853]
We study the transferability between 33 NLP tasks across three broad classes of problems.
Our results show that transfer learning is more beneficial than previously thought.
We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task.
arXiv Detail & Related papers (2020-05-02T09:39:36Z) - Improving Multi-Turn Response Selection Models with Complementary
Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments in two public datasets and obtain significant improvement in both datasets.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.