The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with
Transformer Encoders
- URL: http://arxiv.org/abs/2109.06939v1
- Date: Tue, 14 Sep 2021 19:32:11 GMT
- Title: The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with
Transformer Encoders
- Authors: Han He and Jinho D. Choi
- Abstract summary: Multi-task learning (MTL) with transformer encoders has emerged as a powerful technique to improve performance on closely related tasks.
We first present MTL results on five NLP tasks: POS, NER, DEP, CON, and SRL.
We then conduct an extensive pruning analysis to show that a certain set of attention heads gets claimed by most tasks during MTL.
- Score: 17.74208462902158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-task learning (MTL) with transformer encoders has emerged as a powerful
technique to improve both accuracy and efficiency on closely related tasks, yet it
remains unclear whether it performs as well on tasks that are distinct in nature. We
first present MTL results on five NLP tasks, POS, NER, DEP, CON, and SRL, and show its
deficiency relative to single-task learning. We then conduct an extensive pruning
analysis showing that a certain set of attention heads gets claimed by most tasks
during MTL, and that these tasks interfere with one another as each fine-tunes those
heads for its own objective. Based on this finding, we propose the Stem Cell
Hypothesis: attention heads naturally talented for many tasks exist, but they cannot
be jointly trained to create adequate embeddings for all of those tasks. Finally, we
design novel parameter-free probes to support our hypothesis, and we use label
analysis to demonstrate how attention heads are transformed across the five tasks
during MTL.
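A minimal hard-parameter-sharing sketch of the setup the abstract describes: one shared transformer encoder trained jointly with a separate classification head per task, so every task's gradient flows through the same attention heads. The encoder size, label counts, and per-token heads below are illustrative PyTorch stand-ins, not the authors' exact architecture.

```python
# Sketch of multi-task learning over a shared transformer encoder.
# Label counts and model sizes are hypothetical placeholders.
import torch
import torch.nn as nn

TASKS = {"POS": 17, "NER": 9, "DEP": 40, "CON": 30, "SRL": 60}  # assumed tag-set sizes

class SharedEncoderMTL(nn.Module):
    def __init__(self, d_model=256, n_heads=8, n_layers=4, vocab_size=30000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)   # shared by all tasks
        self.heads = nn.ModuleDict({t: nn.Linear(d_model, n) for t, n in TASKS.items()})

    def forward(self, token_ids, task):
        h = self.encoder(self.embed(token_ids))                  # (batch, seq, d_model)
        return self.heads[task](h)                               # per-token task logits

model = SharedEncoderMTL()
loss_fn = nn.CrossEntropyLoss()
tokens = torch.randint(0, 30000, (2, 12))                        # dummy batch
# Joint step: all five losses back-propagate through the same encoder.
total_loss = 0.0
for task, n_labels in TASKS.items():
    gold = torch.randint(0, n_labels, (2, 12))                   # dummy gold labels
    logits = model(tokens, task)
    total_loss = total_loss + loss_fn(logits.reshape(-1, n_labels), gold.reshape(-1))
total_loss.backward()
```

Because every task updates the same shared encoder, the heads that are useful to many tasks receive conflicting gradients, which is the interference the paper's pruning analysis isolates.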
Related papers
- Equitable Multi-task Learning [18.65048321820911]
Multi-task learning (MTL) has achieved great success in various research domains, such as CV, NLP and IR.
We propose a novel multi-task optimization method, named EMTL, to achieve equitable MTL.
Our method stably outperforms state-of-the-art methods on the public benchmark datasets of two different research domains.
arXiv Detail & Related papers (2023-06-15T03:37:23Z)
- Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [52.723297744257536]
Pre-trained language models (LMs) have shown effectiveness in scientific literature understanding tasks.
We propose a multi-task contrastive learning framework, SciMult, to facilitate common knowledge sharing across different literature understanding tasks.
arXiv Detail & Related papers (2023-05-23T16:47:22Z)
- Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges [27.474011433615317]
Continual learning (CL) enables the development of models and agents that learn from a sequence of tasks.
We investigate the factors that contribute to the performance differences between task-agnostic CL and multi-task (MTL) agents.
arXiv Detail & Related papers (2022-05-28T17:59:00Z)
- ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning [56.54359715403561]
This paper introduces ExMix, a massive collection of 107 supervised NLP tasks across diverse domains and task-families.
Using ExMix, we study the effect of multi-task pre-training at the largest scale to date, and analyze co-training transfer amongst common families of tasks.
We propose ExT5, a model pre-trained using a multi-task objective of self-supervised span denoising and supervised ExMix.
arXiv Detail & Related papers (2021-11-22T02:34:46Z)
- Variational Multi-Task Learning with Gumbel-Softmax Priors [105.22406384964144]
Multi-task learning aims to explore task relatedness to improve individual tasks.
We propose variational multi-task learning (VMTL), a general probabilistic inference framework for learning multiple related tasks.
arXiv Detail & Related papers (2021-11-09T18:49:45Z)
- Towards More Generalizable One-shot Visual Imitation Learning [81.09074706236858]
A general-purpose robot should be able to master a wide range of tasks and quickly learn a novel one by leveraging past experiences.
One-shot imitation learning (OSIL) approaches this goal by training an agent with (pairs of) expert demonstrations.
We push for a higher level of generalization ability by investigating a more ambitious multi-task setup.
arXiv Detail & Related papers (2021-10-26T05:49:46Z)
- Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study [75.42182503265056]
Multi-Task Learning has emerged as a methodology in which multiple tasks are jointly learned by a shared learning algorithm.
We deal with heterogeneous MTL, simultaneously addressing detection, classification & regression problems.
We build FaceBehaviorNet, the first framework for large-scale face analysis, by jointly learning all facial behavior tasks.
arXiv Detail & Related papers (2021-05-08T22:26:52Z)
- Task Uncertainty Loss Reduce Negative Transfer in Asymmetric Multi-task Feature Learning [0.0]
Multi-task learning (MTL) can improve overall task performance relative to single-task learning (STL), but it can also hide negative transfer (NT).
Asymmetric multitask feature learning (AMTFL) is an approach that tries to address this by allowing tasks with higher loss values to have smaller influence on feature representations for learning other tasks.
We present examples of NT on two datasets (image recognition and pharmacogenomics) and tackle this challenge by using aleatoric homoscedastic uncertainty to capture the relative confidence between tasks and to set the task loss weights; a hedged sketch of this style of weighting appears after this list.
arXiv Detail & Related papers (2020-12-17T13:30:45Z)
- A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension [9.446041739364135]
We propose a pairwise probe to understand BERT fine-tuning on the machine reading comprehension (MRC) task.
According to pairwise probing tasks, we compare the performance of each layer's hidden representation of pre-trained and fine-tuned BERT.
Our experimental analysis leads to highly confident conclusions.
arXiv Detail & Related papers (2020-06-02T02:12:19Z)
- Exploring and Predicting Transferability across NLP Tasks [115.6278033699853]
We study the transferability between 33 NLP tasks across three broad classes of problems.
Our results show that transfer learning is more beneficial than previously thought.
We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task.
arXiv Detail & Related papers (2020-05-02T09:39:36Z)
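The Task Uncertainty Loss entry above sets per-task loss weights from aleatoric homoscedastic uncertainty. The sketch below shows the common log-variance parameterization of that idea (in the style of Kendall et al.'s uncertainty weighting); the exact formulation in that paper may differ.

```python
# Hedged sketch of homoscedastic-uncertainty task weighting, not that paper's exact method.
# Each task owns a learnable log-variance s_t; its loss is scaled by exp(-s_t) and
# regularized by the additive s_t term, so less confident tasks get smaller weights.
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    def __init__(self, task_names):
        super().__init__()
        self.log_vars = nn.ParameterDict(
            {t: nn.Parameter(torch.zeros(())) for t in task_names}
        )

    def forward(self, task_losses):
        total = 0.0
        for task, loss in task_losses.items():
            s = self.log_vars[task]
            total = total + torch.exp(-s) * loss + s
        return total

weighting = UncertaintyWeightedLoss(["POS", "NER", "DEP", "CON", "SRL"])
dummy_losses = {t: torch.rand(()) for t in weighting.log_vars}   # stand-in per-task losses
print(weighting(dummy_losses))
```

The exp(-s_t) factor downweights noisy tasks automatically, while the additive s_t term keeps the learned weights from collapsing to zero.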
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.