Divergence-Based Domain Transferability for Zero-Shot Classification
- URL: http://arxiv.org/abs/2302.05735v1
- Date: Sat, 11 Feb 2023 16:04:38 GMT
- Title: Divergence-Based Domain Transferability for Zero-Shot Classification
- Authors: Alexander Pugantsov, Richard McCreadie
- Abstract summary: Transferring learned patterns from pretrained neural language models has been shown to significantly improve effectiveness across a variety of language-based tasks.
Further tuning on intermediate tasks has been demonstrated to provide additional performance benefits, provided the intermediate task is sufficiently related to the target task.
However, how to identify related tasks is an open problem, and brute-force searching for effective task combinations is prohibitively expensive.
- Score: 78.55044112903148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transferring learned patterns from pretrained neural language models has been
shown to significantly improve effectiveness across a variety of language-based
tasks, while further tuning on intermediate tasks has been demonstrated to
provide additional performance benefits, provided the intermediate task is
sufficiently related to the target task. However, how to identify related tasks
is an open problem, and brute-force searching for effective task combinations is
prohibitively expensive. Hence, the question arises: can we improve
effectiveness and efficiency on tasks with no training examples through
selective fine-tuning? In this paper, we explore statistical measures that
approximate the divergence between domain representations as a means to
estimate whether tuning using one task pair will exhibit performance benefits
over tuning another. This estimation can then be used to reduce the number of
task pairs that need to be tested, by eliminating pairs that are unlikely to
provide benefits. Through experimentation over 58 tasks and over 6,600 task
pair combinations, we demonstrate that statistical measures can distinguish
effective task pairs and that the resulting estimates can reduce end-to-end
runtime by up to 40%.
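As a concrete illustration of the screening idea above, the sketch below ranks task pairs by Jensen-Shannon divergence between unigram term distributions. This is a minimal sketch under stated assumptions: the paper explores several statistical divergence measures over domain representations, so the specific measure, tokenization, and representation here are illustrative choices, not the authors' exact pipeline.

```python
# Minimal sketch of divergence-based task-pair screening. Assumptions:
# tasks are represented as unigram term distributions over a shared
# vocabulary, and Jensen-Shannon divergence is the statistical measure;
# the paper evaluates several measures, so treat this as illustrative.
from collections import Counter
from itertools import combinations
import math


def term_distribution(corpus, vocab):
    """Unigram probability distribution of a task corpus over a shared vocab."""
    counts = Counter(tok for doc in corpus for tok in doc.split())
    total = sum(counts[w] for w in vocab) or 1
    return [counts[w] / total for w in vocab]


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two probability distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        return sum(ai * math.log((ai + eps) / (bi + eps)) for ai, bi in zip(a, b))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_task_pairs(task_corpora):
    """Rank task pairs by divergence; low-divergence pairs are tuned first."""
    vocab = sorted({tok for corpus in task_corpora.values()
                    for doc in corpus for tok in doc.split()})
    dists = {name: term_distribution(c, vocab) for name, c in task_corpora.items()}
    return sorted((js_divergence(dists[a], dists[b]), a, b)
                  for a, b in combinations(task_corpora, 2))


# Toy usage with hypothetical task names: the two weather-related tasks
# should pair best (lowest divergence).
tasks = {
    "storm_reports": ["flood warning issued", "storm damage reported"],
    "weather_qa": ["will the storm flood the road", "storm warning today"],
    "movie_reviews": ["great film and acting", "boring plot weak acting"],
}
for score, a, b in rank_task_pairs(tasks):
    print(f"{a} <-> {b}: JSD = {score:.3f}")
```

Task pairs whose divergence exceeds a chosen threshold would be dropped from the brute-force search; pruning the candidate set this way is what would yield the reported up-to-40% reduction in end-to-end runtime.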
Related papers
- Does the Order of Fine-tuning Matter and Why? [11.975836356680855]
We study the effect of fine-tuning multiple intermediate tasks and their ordering on target task performance.
Experimental results show that task ordering affects target task performance, yielding up to a 6% performance gain or up to a 4% performance loss.
arXiv Detail & Related papers (2024-10-03T19:07:14Z)
- Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning [21.652389166495407]
We show that the transfer performance exhibits severe variance across different source tasks and training seeds.
Compared to embedding-free methods and text embeddings, task embeddings constructed from fine-tuned weights can better estimate task transferability.
We introduce a novel method that measures pairwise token similarity using maximum inner product search, leading to the highest performance in task prediction (see the sketch after this list).
arXiv Detail & Related papers (2024-07-23T07:31:43Z)
- Localizing Task Information for Improved Model Merging and Compression [61.16012721460561]
We show that the information required to solve each task is still preserved after merging, as different tasks mostly use non-overlapping sets of weights.
We propose Consensus Merging, an algorithm that eliminates such weights and improves the general performance of existing model merging approaches.
arXiv Detail & Related papers (2024-05-13T14:54:37Z)
- Efficiently Tuned Parameters are Task Embeddings [26.587153525003636]
Intermediate-task transfer can benefit a wide range of NLP tasks with properly selected source datasets.
It is computationally infeasible to experiment with all intermediate transfer combinations.
We propose to exploit these efficiently tuned parameters as off-the-shelf task embeddings.
arXiv Detail & Related papers (2022-10-21T03:19:54Z)
- Composite Learning for Robust and Effective Dense Predictions [81.2055761433725]
Multi-task learning promises better model generalization on a target task by jointly optimizing it with an auxiliary task.
We find that jointly training a dense prediction (target) task with a self-supervised (auxiliary) task can consistently improve the performance of the target task, while eliminating the need for labeling auxiliary tasks.
arXiv Detail & Related papers (2022-10-13T17:59:16Z)
- Identifying Suitable Tasks for Inductive Transfer Through the Analysis of Feature Attributions [78.55044112903148]
We use explainability techniques to predict whether task pairs will be complementary, through comparison of neural network activations between single-task models.
Our results show that, through this approach, it is possible to reduce training time by up to 83.5% at a cost of only a 0.034 reduction in positive-class F1 on the TREC-IS 2020-A dataset.
arXiv Detail & Related papers (2022-02-02T15:51:07Z)
- Weighted Training for Cross-Task Learning [71.94908559469475]
We introduce Target-Aware Weighted Training (TAWT), a weighted training algorithm for cross-task learning.
We show that TAWT is easy to implement, is computationally efficient, requires little hyperparameter tuning, and enjoys non-asymptotic learning-theoretic guarantees.
As a byproduct, the proposed representation-based task distance allows one to reason in a theoretically principled way about several critical aspects of cross-task learning.
arXiv Detail & Related papers (2021-05-28T20:27:02Z)
- Task Uncertainty Loss Reduce Negative Transfer in Asymmetric Multi-task Feature Learning [0.0]
Multi-task learning (MTL) can improve overall task performance relative to single-task learning (STL), but can hide negative transfer (NT).
Asymmetric multi-task feature learning (AMTFL) is an approach that tries to address this by allowing tasks with higher loss values to have a smaller influence on the feature representations used for learning other tasks.
We present examples of NT in two datasets (image recognition and pharmacogenomics) and tackle this challenge by using aleatoric homoscedastic uncertainty to capture the relative confidence between tasks and to set weights for each task's loss.
arXiv Detail & Related papers (2020-12-17T13:30:45Z)
- Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions so that they learn in their task-specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected across different domains help improve learning performance on each task.
arXiv Detail & Related papers (2020-10-24T21:35:57Z)
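The "Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning" entry above mentions measuring pairwise token similarity via maximum inner product search; the sketch below shows one plausible reading of that idea. The embedding source, normalization, and brute-force search are assumptions for illustration, not the authors' published implementation; at scale, a real MIPS index (e.g. faiss.IndexFlatIP) would replace the dense matrix product.

```python
# Hedged sketch: score a source task against a target task by taking,
# for every target-token embedding, its maximum inner product over all
# source-token embeddings, then averaging the maxima.
import numpy as np


def mips_task_similarity(source_tokens: np.ndarray, target_tokens: np.ndarray) -> float:
    """Mean maximum inner product of target token embeddings against source ones.

    source_tokens: (n_src, d) array of source-task token embeddings.
    target_tokens: (n_tgt, d) array of target-task token embeddings.
    """
    scores = target_tokens @ source_tokens.T  # (n_tgt, n_src) inner products
    return float(scores.max(axis=1).mean())


# Toy usage with synthetic embeddings: a perturbed copy of the target
# should outrank an unrelated random task.
rng = np.random.default_rng(0)
target = rng.normal(size=(50, 16))
candidates = {
    "src_related": target + 0.1 * rng.normal(size=(50, 16)),
    "src_unrelated": rng.normal(size=(40, 16)),
}
ranking = sorted(candidates,
                 key=lambda name: mips_task_similarity(candidates[name], target),
                 reverse=True)
print(ranking)  # expected: ['src_related', 'src_unrelated']
```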
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.