One Model, Multiple Tasks: Pathways for Natural Language Understanding
- URL: http://arxiv.org/abs/2203.03312v1
- Date: Mon, 7 Mar 2022 11:48:09 GMT
- Title: One Model, Multiple Tasks: Pathways for Natural Language Understanding
- Authors: Duyu Tang, Fan Zhang, Yong Dai, Cong Zhou, Shuangzhi Wu and Shuming Shi
- Abstract summary: This paper presents a Pathways approach to handle many tasks at once.
Unlike prevailing single-purpose models that overspecialize in individual tasks and learn from scratch when extended to new tasks, our approach is general-purpose, with the ability to stitch together existing skills to learn new tasks more effectively.
- Score: 34.58880663537492
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a Pathways approach to handle many tasks at once. Our
approach is general-purpose and sparse. Unlike prevailing single-purpose models
that overspecialize in individual tasks and learn from scratch when extended to
new tasks, our approach is general-purpose, with the ability to stitch together
existing skills to learn new tasks more effectively.
Different from traditional dense models that always activate all model
parameters, our approach is sparsely activated: only the relevant parts of the
model (like pathways through the network) are activated.
We take natural language understanding as a case study and define a set of
skills such as "the skill of understanding the sentiment of text" and "the
skill of understanding natural language questions". These skills can be reused
and combined to support many different tasks and situations. We develop our
system with a Transformer as the backbone. For each skill, we implement
skill-specific feed-forward networks, which are activated only if the skill is
relevant to the task. An appealing feature of our model is that it not only
supports sparsely activated fine-tuning but also allows us to pretrain skills
in the same sparse way with masked language modeling and next sentence
prediction. We call this model SkillNet.
We have three major findings. First, with only one model checkpoint, SkillNet
performs better than task-specific fine-tuning and two multi-task learning
baselines (i.e., a dense model and a Mixture-of-Experts model) on six tasks.
Second, sparsely activated pre-training further improves the overall
performance. Third, SkillNet significantly outperforms the baseline systems
when extended to new tasks.
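The abstract describes the core mechanism: per-skill feed-forward networks inside a Transformer, with only the skills relevant to the current task activated. The paper's code is not reproduced here, so the snippet below is a minimal, hypothetical PyTorch sketch of that sparse skill routing; the skill inventory, module sizes, the averaging of skill outputs, and the way a task declares its relevant skills are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of SkillNet-style sparse skill activation (not the
# authors' code): each skill owns its own feed-forward network, and only
# the skills tagged as relevant to the current task are executed.
import torch
import torch.nn as nn


class SkillSparseFFN(nn.Module):
    """Stands in for the dense FFN of a Transformer block with per-skill FFNs."""

    def __init__(self, d_model: int, d_ff: int, skills: list[str]):
        super().__init__()
        # One feed-forward sub-network per skill in the (assumed) inventory.
        self.skill_ffns = nn.ModuleDict({
            name: nn.Sequential(
                nn.Linear(d_model, d_ff),
                nn.GELU(),
                nn.Linear(d_ff, d_model),
            )
            for name in skills
        })

    def forward(self, x: torch.Tensor, active_skills: list[str]) -> torch.Tensor:
        # Sparse activation: run only the relevant skill modules and average
        # their outputs; all other skill parameters stay untouched.
        outputs = [self.skill_ffns[name](x) for name in active_skills]
        return torch.stack(outputs, dim=0).mean(dim=0)


# Illustrative usage: a sentiment task activates only the skills it is
# tagged with (the skill names here are made up for the example).
skills = ["sentiment", "question", "generic"]
layer = SkillSparseFFN(d_model=768, d_ff=3072, skills=skills)
hidden = torch.randn(2, 16, 768)            # (batch, seq_len, d_model)
out = layer(hidden, active_skills=["sentiment", "generic"])
print(out.shape)                            # torch.Size([2, 16, 768])
```

The abstract notes that the same sparse routing is used during pre-training with masked language modeling and next sentence prediction; how the outputs of multiple active skills are combined is an assumption in this sketch.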
Related papers
- LIMT: Language-Informed Multi-Task Visual World Models [6.128332310539627]
Multi-task reinforcement learning can be very challenging due to the increased sample complexity and the potentially conflicting task objectives.
We propose a method for learning multi-task visual world models, leveraging pre-trained language models to extract semantically meaningful task representations.
Our results highlight the benefits of using language-driven task representations for world models and a clear advantage of model-based multi-task learning over the more common model-free paradigm.
arXiv Detail & Related papers (2024-07-18T12:40:58Z)
- SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills [51.74947795895178]
This paper proposes a general multilingual multitask model, named SkillNet-X.
We define several language-specific skills and task-specific skills, each of which corresponds to a skill module.
We evaluate SkillNet-X on eleven natural language understanding datasets in four languages.
arXiv Detail & Related papers (2023-06-28T12:53:30Z)
- Coarse-to-Fine: Hierarchical Multi-task Learning for Natural Language Understanding [51.31622274823167]
We propose a hierarchical framework with a coarse-to-fine paradigm, with the bottom level shared across all tasks, the mid level divided into different groups, and the top level assigned to individual tasks.
This allows our model to learn basic language properties from all tasks, boost performance on relevant tasks, and reduce the negative impact from irrelevant tasks.
arXiv Detail & Related papers (2022-08-19T02:46:20Z)
- One Model, Multiple Modalities: A Sparsely Activated Approach for Text, Sound, Image, Video and Code [26.40920402395547]
This paper presents an approach that excels at handling multiple modalities of information with a single model.
We develop our model for five modalities including text, image, sound, video and code.
Our model supports self-supervised pretraining in the same sparsely activated way, resulting in better parameters for the different modalities.
arXiv Detail & Related papers (2022-05-12T14:39:21Z)
- SkillNet-NLG: General-Purpose Natural Language Generation with a Sparsely Activated Approach [32.79493780508332]
SkillNet-NLG is a sparsely activated approach that handles many natural language generation tasks with one model.
We evaluate it on Chinese natural language generation tasks.
arXiv Detail & Related papers (2022-04-26T09:37:01Z)
- Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z)
- Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation [86.26522210882699]
We propose Unified multimodal pre-training for both Vision-Language understanding and generation.
The proposed UniVL is capable of handling both understanding tasks and generative tasks.
Our experiments show that there is a trade-off between understanding tasks and generation tasks while using the same model.
arXiv Detail & Related papers (2021-12-10T14:59:06Z)
- Multitask Prompted Training Enables Zero-Shot Task Generalization [70.12770442071657]
We develop a system for mapping general natural language tasks into a human-readable prompted form.
We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.
The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size.
arXiv Detail & Related papers (2021-10-15T17:08:57Z)
- HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections [96.64246471034195]
We propose HyperGrid, a new approach for highly effective multi-task learning.
Our method helps bridge the gap between fine-tuning and multi-task learning approaches.
arXiv Detail & Related papers (2020-07-12T02:49:16Z)