Eliciting Transferability in Multi-task Learning with Task-level
Mixture-of-Experts
- URL: http://arxiv.org/abs/2205.12701v1
- Date: Wed, 25 May 2022 11:59:05 GMT
- Title: Eliciting Transferability in Multi-task Learning with Task-level
Mixture-of-Experts
- Authors: Qinyuan Ye, Juan Zha, Xiang Ren
- Abstract summary: Transformer models are capable of multi-task learning on diverse NLP tasks.
Humans tackle tasks in a more flexible way, by making proper presumptions on what skills and knowledge are relevant.
We show that the learned routing decisions and experts partially rediscover human categorization of NLP tasks.
- Score: 29.34065746373841
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work suggests that transformer models are capable of multi-task
learning on diverse NLP tasks. However, the potential of these models may be
limited as they use the same set of parameters for all tasks. In contrast,
humans tackle tasks in a more flexible way, by making proper presumptions on
what skills and knowledge are relevant and executing only the necessary
computations. Inspired by this, we propose to use task-level mixture-of-experts
models, which have a collection of transformer layers (i.e., experts) and a
router component that chooses among these experts dynamically and flexibly. We
show that the learned routing decisions and experts partially rediscover human
categorization of NLP tasks -- certain experts are strongly associated with
extractive tasks, some with classification tasks, and some with tasks requiring
world knowledge.
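To make the architecture described in the abstract concrete, below is a minimal PyTorch-style sketch of task-level mixture-of-experts routing. It is an illustrative reconstruction, not the authors' released implementation: the learned task embedding, layer sizes, and soft mixing over experts are assumptions.

```python
# Minimal sketch of a task-level mixture-of-experts layer (illustrative only;
# the task embedding, layer sizes, and soft mixing are assumptions).
import torch
import torch.nn as nn

class TaskLevelMoELayer(nn.Module):
    def __init__(self, d_model: int, n_heads: int, n_experts: int, n_tasks: int):
        super().__init__()
        # Each expert is a full transformer layer.
        self.experts = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_experts)]
        )
        # The router scores experts from a learned task embedding, so the routing
        # decision is made once per task rather than per token.
        self.task_embedding = nn.Embedding(n_tasks, d_model)
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, hidden: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model); task_id: (batch,) of task indices.
        probs = torch.softmax(self.router(self.task_embedding(task_id)), dim=-1)
        # Soft mixture of expert outputs; a hard variant would keep only probs.argmax(-1).
        expert_out = torch.stack([expert(hidden) for expert in self.experts], dim=1)
        return torch.einsum("be,besd->bsd", probs, expert_out)
```

Under a scheme like this, inspecting which experts each task's routing weights concentrate on is what allows the comparison, reported in the abstract, between the learned expert assignments and human categorization of NLP tasks.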
Related papers
- Harder Tasks Need More Experts: Dynamic Routing in MoE Models [58.18526590138739]
We introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models.
Our method dynamically selects experts based on the confidence level in expert selection for each input.
arXiv Detail & Related papers (2024-03-12T13:41:15Z)
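Since the entry above describes selecting experts based on routing confidence, here is a hedged sketch of one common way such dynamic selection can be realized: a top-p style rule over router probabilities. The threshold value and per-input granularity are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of confidence-based dynamic expert selection (top-p style rule;
# the threshold and per-input granularity are illustrative assumptions).
import torch
import torch.nn.functional as F

def dynamic_expert_mask(router_logits: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """Return a boolean mask of selected experts per input.

    router_logits: (n_inputs, n_experts) raw router scores. Confident inputs,
    where one expert holds most of the probability mass, activate few experts;
    harder, less confident inputs activate more.
    """
    probs = torch.softmax(router_logits, dim=-1)
    sorted_probs, order = probs.sort(dim=-1, descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Keep the smallest prefix of experts whose cumulative probability reaches p;
    # shifting by one position keeps the expert that crosses the threshold.
    keep_sorted = F.pad(cumulative[:, :-1], (1, 0)) < p
    mask = torch.zeros_like(probs, dtype=torch.bool)
    return mask.scatter(-1, order, keep_sorted)
```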
- PEMT: Multi-Task Correlation Guided Mixture-of-Experts Enables Parameter-Efficient Transfer Learning [28.353530290015794]
We propose PEMT, a novel parameter-efficient fine-tuning framework based on multi-task transfer learning.
We conduct experiments on a broad range of tasks over 17 datasets.
arXiv Detail & Related papers (2024-02-23T03:59:18Z)
- Customizable Combination of Parameter-Efficient Modules for Multi-Task Learning [11.260650180067278]
We introduce a novel approach that combines task-common skills and task-specific skills.
A skill assignment matrix is jointly learned.
Our findings demonstrate that C-Poly outperforms fully-shared, task-specific, and skill-indistinguishable baselines.
arXiv Detail & Related papers (2023-12-06T02:47:56Z)
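The C-Poly entry above centres on a jointly learned assignment between tasks and skill modules. The following is a minimal sketch of that general idea, assuming linear adapters as skill modules and a soft sigmoid gate; both choices are illustrative and not taken from the paper.

```python
# Hedged sketch of a jointly learned task-to-skill assignment matrix (the adapter
# parameterization and the soft sigmoid gating are illustrative assumptions).
import torch
import torch.nn as nn

class SkillAssignmentLayer(nn.Module):
    def __init__(self, d_model: int, n_tasks: int, n_shared_skills: int):
        super().__init__()
        # Task-common skills: a small inventory of adapter modules shared by all tasks.
        self.shared_skills = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_shared_skills)]
        )
        # Task-specific skills: one private adapter per task.
        self.private_skills = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_tasks)]
        )
        # Learnable assignment logits: which shared skills each task should use.
        self.assignment = nn.Parameter(torch.zeros(n_tasks, n_shared_skills))

    def forward(self, hidden: torch.Tensor, task_id: int) -> torch.Tensor:
        # Soft task-to-skill assignment row for this task.
        weights = torch.sigmoid(self.assignment[task_id])  # (n_shared_skills,)
        shared = sum(w * skill(hidden) for w, skill in zip(weights, self.shared_skills))
        return shared + self.private_skills[task_id](hidden)
```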
- Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [52.723297744257536]
Pre-trained language models (LMs) have shown effectiveness in scientific literature understanding tasks.
We propose a multi-task contrastive learning framework, SciMult, to facilitate common knowledge sharing across different literature understanding tasks.
arXiv Detail & Related papers (2023-05-23T16:47:22Z)
- Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners [74.92558307689265]
We propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').
We optimize the matching between experts and tasks during the training of a single model.
Experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
arXiv Detail & Related papers (2022-12-15T18:59:52Z)
- EEML: Ensemble Embedded Meta-learning [5.9514420658483935]
We propose an ensemble embedded meta-learning algorithm (EEML) that explicitly utilizes multi-model-ensemble to organize prior knowledge into diverse specific experts.
We rely on a task embedding cluster mechanism to deliver diverse tasks to matching experts during training and to instruct how experts collaborate at test time.
The experimental results show that the proposed method clearly outperforms recent state-of-the-art methods on few-shot learning problems.
arXiv Detail & Related papers (2022-06-18T12:37:17Z)
- Modular Adaptive Policy Selection for Multi-Task Imitation Learning through Task Division [60.232542918414985]
Multi-task learning often suffers from negative transfer, sharing information that should be task-specific.
The proposed method mitigates this by using proto-policies as modules to divide the tasks into simple sub-behaviours that can be shared.
We also demonstrate its ability to autonomously divide the tasks into both shared and task-specific sub-behaviours.
arXiv Detail & Related papers (2022-03-28T15:53:17Z)
- Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z)
- Variational Multi-Task Learning with Gumbel-Softmax Priors [105.22406384964144]
Multi-task learning aims to explore task relatedness to improve individual tasks.
We propose variational multi-task learning (VMTL), a general probabilistic inference framework for learning multiple related tasks.
arXiv Detail & Related papers (2021-11-09T18:49:45Z)