One Model, Multiple Tasks: Pathways for Natural Language Understanding
- URL: http://arxiv.org/abs/2203.03312v1
- Date: Mon, 7 Mar 2022 11:48:09 GMT
- Title: One Model, Multiple Tasks: Pathways for Natural Language Understanding
- Authors: Duyu Tang, Fan Zhang, Yong Dai, Cong Zhou, Shuangzhi Wu and Shuming Shi
- Abstract summary: This paper presents a Pathways approach to handle many tasks at once.
Unlike prevailing single-purpose models that overspecialize in individual tasks and learn from scratch when extended to new tasks, our approach is general-purpose, with the ability to stitch together existing skills to learn new tasks more effectively.
- Score: 34.58880663537492
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a Pathways approach to handle many tasks at once. Our
approach is general-purpose and sparse. Unlike prevailing single-purpose models
that overspecialize in individual tasks and learn from scratch when extended to
new tasks, our approach is general-purpose, with the ability to stitch together
existing skills to learn new tasks more effectively.
Different from traditional dense models that always activate all model
parameters, our approach is sparsely activated: only the relevant parts of the
model (like pathways through the network) are activated.
We take natural language understanding as a case study and define a set of
skills such as "the skill of understanding the sentiment of text" and "the
skill of understanding natural language questions". These skills can be reused
and combined to support many different tasks and situations. We develop our
system with a Transformer as the backbone. For each skill, we implement
skill-specific feed-forward networks, which are activated only if the skill is
relevant to the task. An appealing feature of our model is that it not only
supports sparsely activated fine-tuning but also allows us to pretrain skills
in the same sparse way with masked language modeling and next sentence
prediction. We call this model SkillNet.
We have three major findings. First, with only one model checkpoint, SkillNet
performs better than task-specific fine-tuning and two multi-task learning
baselines (i.e., a dense model and a Mixture-of-Experts model) on six tasks.
Second, sparsely activated pre-training further improves the overall
performance. Third, SkillNet significantly outperforms the baseline systems
when extended to new tasks.
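The abstract describes the core mechanism: per-skill feed-forward networks inside a Transformer, with only the skills relevant to the current task activated. The paper's code is not reproduced here, so the snippet below is a minimal, hypothetical PyTorch sketch of that sparse skill routing; the skill inventory, module sizes, the averaging of skill outputs, and the way a task declares its relevant skills are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of SkillNet-style sparse skill activation (not the
# authors' code): each skill owns its own feed-forward network, and only
# the skills tagged as relevant to the current task are executed.
import torch
import torch.nn as nn


class SkillSparseFFN(nn.Module):
    """Stands in for the dense FFN of a Transformer block with per-skill FFNs."""

    def __init__(self, d_model: int, d_ff: int, skills: list[str]):
        super().__init__()
        # One feed-forward sub-network per skill in the (assumed) inventory.
        self.skill_ffns = nn.ModuleDict({
            name: nn.Sequential(
                nn.Linear(d_model, d_ff),
                nn.GELU(),
                nn.Linear(d_ff, d_model),
            )
            for name in skills
        })

    def forward(self, x: torch.Tensor, active_skills: list[str]) -> torch.Tensor:
        # Sparse activation: run only the relevant skill modules and average
        # their outputs; all other skill parameters stay untouched.
        outputs = [self.skill_ffns[name](x) for name in active_skills]
        return torch.stack(outputs, dim=0).mean(dim=0)


# Illustrative usage: a sentiment task activates only the skills it is
# tagged with (the skill names here are made up for the example).
skills = ["sentiment", "question", "generic"]
layer = SkillSparseFFN(d_model=768, d_ff=3072, skills=skills)
hidden = torch.randn(2, 16, 768)            # (batch, seq_len, d_model)
out = layer(hidden, active_skills=["sentiment", "generic"])
print(out.shape)                            # torch.Size([2, 16, 768])
```

The abstract notes that the same sparse routing is used during pre-training with masked language modeling and next sentence prediction; how the outputs of multiple active skills are combined is an assumption in this sketch.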
Related papers
- LIMT: Language-Informed Multi-Task Visual World Models [6.128332310539627]
Multi-task reinforcement learning can be very challenging due to the increased sample complexity and the potentially conflicting task objectives.
We propose a method for learning multi-task visual world models, leveraging pre-trained language models to extract semantically meaningful task representations.
Our results highlight the benefits of using language-driven task representations for world models and a clear advantage of model-based multi-task learning over the more common model-free paradigm.
arXiv Detail & Related papers (2024-07-18T12:40:58Z)
- SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills [51.74947795895178]
This paper proposes a general multilingual multitask model, named SkillNet-X.
We define several language-specific skills and task-specific skills, each of which corresponds to a skill module.
We evaluate SkillNet-X on eleven natural language understanding datasets in four languages.
arXiv Detail & Related papers (2023-06-28T12:53:30Z)
- Coarse-to-Fine: Hierarchical Multi-task Learning for Natural Language Understanding [51.31622274823167]
We propose a hierarchical framework with a coarse-to-fine paradigm, with the bottom level shared across all tasks, the mid level divided into different groups, and the top level assigned to individual tasks.
This allows our model to learn basic language properties from all tasks, boost performance on relevant tasks, and reduce the negative impact from irrelevant tasks.
arXiv Detail & Related papers (2022-08-19T02:46:20Z)
- One Model, Multiple Modalities: A Sparsely Activated Approach for Text, Sound, Image, Video and Code [26.40920402395547]
This paper presents an approach that excels at handling multiple modalities of information with a single model.
We develop our model for five modalities including text, image, sound, video and code.
Our model supports self-supervised pretraining in the same sparsely activated way, resulting in better parameters for the different modalities.
arXiv Detail & Related papers (2022-05-12T14:39:21Z)
- SkillNet-NLG: General-Purpose Natural Language Generation with a Sparsely Activated Approach [32.79493780508332]
SkillNet-NLG is a sparsely activated approach that handles many natural language generation tasks with one model.
We evaluate it on Chinese natural language generation tasks.
arXiv Detail & Related papers (2022-04-26T09:37:01Z)
- Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z)
- Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation [86.26522210882699]
We propose Unified multimodal pre-training for both Vision-Language understanding and generation.
The proposed UniVL is capable of handling both understanding tasks and generative tasks.
Our experiments show that there is a trade-off between understanding tasks and generation tasks while using the same model.
arXiv Detail & Related papers (2021-12-10T14:59:06Z)
- Multitask Prompted Training Enables Zero-Shot Task Generalization [70.12770442071657]
We develop a system for mapping general natural language tasks into a human-readable prompted form.
We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.
The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size.
arXiv Detail & Related papers (2021-10-15T17:08:57Z)
- HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections [96.64246471034195]
We propose HyperGrid, a new approach for highly effective multi-task learning.
Our method helps bridge the gap between fine-tuning and multi-task learning approaches.
arXiv Detail & Related papers (2020-07-12T02:49:16Z)