Combining Modular Skills in Multitask Learning
- URL: http://arxiv.org/abs/2202.13914v2
- Date: Tue, 1 Mar 2022 10:50:30 GMT
- Title: Combining Modular Skills in Multitask Learning
- Authors: Edoardo M. Ponti, Alessandro Sordoni, Yoshua Bengio and Siva Reddy
- Abstract summary: A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
- Score: 149.8001096811708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A modular design encourages neural models to disentangle and recombine
different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume that each task is associated with a subset of latent
discrete skills from a (potentially small) inventory. In turn, skills
correspond to parameter-efficient (sparse / low-rank) model parameterisations.
By jointly learning these and a task-skill allocation matrix, the network for
each task is instantiated as the average of the parameters of active skills. To
favour non-trivial soft partitions of skills across tasks, we experiment with a
series of inductive biases, such as an Indian Buffet Process prior and a
two-speed learning rate. We evaluate our latent-skill model on two main
settings: 1) multitask reinforcement learning for grounded instruction
following on 8 levels of the BabyAI platform; and 2) few-shot adaptation of
pre-trained text-to-text generative models on CrossFit, a benchmark comprising
160 NLP tasks. We find that the modular design of a network significantly
increases sample efficiency in reinforcement learning and few-shot
generalisation in supervised learning, compared to baselines with fully shared,
task-specific, or conditionally generated parameters where knowledge is
entangled across tasks. In addition, we show how discrete skills help
interpretability, as they yield an explicit hierarchy of tasks.
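To make the mechanism described in the abstract concrete, the sketch below is a minimal, hypothetical PyTorch rendering of the core idea: each skill is a low-rank parameter delta, a learnable task-skill allocation matrix (relaxed to soft values here) selects a subset of skills per task, and the task's layer is built from the average of the active skills' parameters. All names, shapes, and the shared base weight are illustrative additions; the Indian Buffet Process prior and two-speed learning rate are omitted.

```python
# Hypothetical sketch of the latent-skill idea: low-rank skill deltas combined
# through a learnable task-skill allocation matrix. Not the authors' code.
import torch
import torch.nn as nn


class SkillAllocatedLinear(nn.Module):
    def __init__(self, n_tasks, n_skills, d_in, d_out, rank=4):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)                            # shared base (toy detail)
        self.A = nn.Parameter(torch.randn(n_skills, d_out, rank) * 0.02)
        self.B = nn.Parameter(torch.randn(n_skills, rank, d_in) * 0.02)
        # Task-skill allocation logits; sigmoid gives a soft relaxation of the binary matrix.
        self.alloc_logits = nn.Parameter(torch.zeros(n_tasks, n_skills))

    def forward(self, x, task_id):
        alloc = torch.sigmoid(self.alloc_logits[task_id])             # (n_skills,)
        deltas = torch.einsum("sor,sri->soi", self.A, self.B)         # low-rank skill parameters
        # Average of the (softly) active skills' parameters.
        delta = torch.einsum("s,soi->oi", alloc, deltas) / (alloc.sum() + 1e-6)
        return self.base(x) + nn.functional.linear(x, delta)


# Usage: run a batch through the parameterisation instantiated for task 3.
layer = SkillAllocatedLinear(n_tasks=8, n_skills=4, d_in=16, d_out=16)
y = layer(torch.randn(2, 16), task_id=3)
```

In the paper the allocations are discrete and regularised towards non-trivial partitions; the soft sigmoid above is only the simplest stand-in.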
Related papers
- Customizable Combination of Parameter-Efficient Modules for Multi-Task
Learning [11.260650180067278]
We introduce C-Poly, a novel approach that combines task-common skills and task-specific skills.
A skill assignment matrix is jointly learned.
Our findings demonstrate that C-Poly outperforms fully-shared, task-specific, and skill-indistinguishable baselines.
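As a loose illustration of that idea (not the C-Poly implementation), the snippet below blends a pool of task-common skills through a jointly learned assignment matrix and adds one private skill per task; all names and shapes are invented for the example.

```python
# Hypothetical sketch: task-common skills selected by a learned assignment
# matrix, plus one task-specific skill per task. Not taken from the paper.
import torch
import torch.nn as nn


class CommonPlusSpecificSkills(nn.Module):
    def __init__(self, n_tasks, n_common, d_in, d_out):
        super().__init__()
        self.common = nn.Parameter(torch.randn(n_common, d_out, d_in) * 0.02)   # shared skill pool
        self.assign = nn.Parameter(torch.zeros(n_tasks, n_common))              # assignment matrix
        self.specific = nn.Parameter(torch.randn(n_tasks, d_out, d_in) * 0.02)  # private skills

    def forward(self, x, task_id):
        weights = torch.softmax(self.assign[task_id], dim=-1)       # how task_id uses the pool
        shared = torch.einsum("c,coi->oi", weights, self.common)    # blended common skills
        return nn.functional.linear(x, shared + self.specific[task_id])
```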
arXiv Detail & Related papers (2023-12-06T02:47:56Z) - Saliency-Regularized Deep Multi-Task Learning [7.3810864598379755]
Multitask learning forces multiple learning tasks to share knowledge to improve their generalization abilities.
Modern deep multitask learning can jointly learn latent features and task sharing, but the task relations it captures remain opaque.
This paper proposes a new multitask learning framework that jointly learns latent features and explicit task relations.
arXiv Detail & Related papers (2022-07-03T20:26:44Z) - An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale
Multitask Learning Systems [4.675744559395732]
Multitask learning assumes that models capable of learning from multiple tasks can achieve better quality and efficiency via knowledge transfer.
State-of-the-art ML models rely on high customization for each task and leverage size and data scale rather than scaling the number of tasks.
We propose an evolutionary method that can generate a large-scale multitask model and can support the dynamic and continuous addition of new tasks.
arXiv Detail & Related papers (2022-05-25T13:10:47Z) - Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learning.
We devise task-aware gating functions to route examples from different tasks to specialized experts.
This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model.
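As a rough illustration of task-aware gating (not the authors' implementation), the sketch below routes every example to the top-k experts chosen by a per-task gate, so only k expert feed-forward blocks run per example even though the total parameter count is large; the gate here depends only on a task id, which is a simplification.

```python
# Hypothetical sketch of a sparsely activated, task-aware Mixture-of-Experts layer.
import torch
import torch.nn as nn


class TaskAwareMoE(nn.Module):
    def __init__(self, n_tasks, n_experts, d_model, k=2):
        super().__init__()
        self.k = k
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                           nn.Linear(4 * d_model, d_model))
             for _ in range(n_experts)]
        )
        self.gate = nn.Embedding(n_tasks, n_experts)   # task-aware routing scores

    def forward(self, x, task_id):
        logits = self.gate.weight[task_id]             # (n_experts,)
        top_vals, top_idx = logits.topk(self.k)
        weights = torch.softmax(top_vals, dim=-1)
        # Sparse activation: only the k selected experts are evaluated.
        return sum(w * self.experts[i](x) for w, i in zip(weights, top_idx.tolist()))


moe = TaskAwareMoE(n_tasks=6, n_experts=8, d_model=32)
out = moe(torch.randn(4, 32), task_id=1)
```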
arXiv Detail & Related papers (2022-04-16T00:56:12Z) - Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
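A simplified, hypothetical sketch of the general idea of specialising only a small subset of layers per task (this is not the TAPS algorithm itself): every layer carries a learnable gate and a task-specific weight delta, and a sparsity penalty on the gates keeps most layers shared with the base model.

```python
# Hypothetical sketch: per-layer gates decide which layers receive a task-specific delta.
import torch
import torch.nn as nn


class GatedTaskLayer(nn.Module):
    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                    # shared weights stay frozen
        self.delta = nn.Parameter(torch.zeros_like(base.weight))
        self.gate = nn.Parameter(torch.tensor(0.0))    # one gate per layer

    def forward(self, x):
        g = torch.sigmoid(self.gate)                   # gate near 0 => layer remains shared
        return nn.functional.linear(x, self.base.weight + g * self.delta, self.base.bias)


layers = nn.ModuleList([GatedTaskLayer(nn.Linear(16, 16)) for _ in range(4)])
sparsity_penalty = sum(torch.sigmoid(l.gate) for l in layers)  # add to the task loss
```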
arXiv Detail & Related papers (2022-03-30T23:16:07Z) - One Model, Multiple Tasks: Pathways for Natural Language Understanding [34.58880663537492]
This paper presents a Pathways approach to handle many tasks at once.
Unlike prevailing single-purpose models that overspecialize on individual tasks and learn from scratch when extended to new tasks, our approach is general-purpose, with the ability to stitch together existing skills to learn new tasks more effectively.
arXiv Detail & Related papers (2022-03-07T11:48:09Z) - Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
First, we propose a new benchmark suite aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z) - Reparameterizing Convolutions for Incremental Multi-Task Learning
without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in the literature.
First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting previously learned ones (incremental learning).
Second, eliminating adverse interactions amongst tasks, which have been shown to significantly degrade single-task performance in a multi-task setup (task interference).
arXiv Detail & Related papers (2020-07-24T14:44:46Z) - Adversarial Continual Learning [99.56738010842301]
We propose a hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features.
Our model combines architecture growth to prevent forgetting of task-specific skills and an experience replay approach to preserve shared skills.
arXiv Detail & Related papers (2020-03-21T02:08:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.