Coarse-to-Fine: Hierarchical Multi-task Learning for Natural Language
Understanding
- URL: http://arxiv.org/abs/2208.09129v1
- Date: Fri, 19 Aug 2022 02:46:20 GMT
- Title: Coarse-to-Fine: Hierarchical Multi-task Learning for Natural Language
Understanding
- Authors: Zhaoye Fei, Yu Tian, Yongkang Wu, Xinyu Zhang, Yutao Zhu, Zheng Liu,
Jiawen Wu, Dejiang Kong, Ruofei Lai, Zhao Cao, Zhicheng Dou and Xipeng Qiu
- Abstract summary: We propose a hierarchical framework with a coarse-to-fine paradigm, with the bottom level shared by all the tasks, the mid level divided into different groups, and the top level assigned to each task.
This allows our model to learn basic language properties from all tasks, boost performance on relevant tasks, and reduce the negative impact from irrelevant tasks.
- Score: 51.31622274823167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalized text representations are the foundation of many natural language
understanding tasks. To fully utilize different corpora, it is inevitable
that models need to understand the relevance among them. However, many methods
ignore this relevance and adopt a single-channel model (a coarse paradigm)
directly for all tasks, which lacks sufficient rationality and interpretability. In
addition, some existing works learn downstream tasks by stitching together skill blocks (a
fine paradigm), which might cause irrational results due to redundancy and
noise. In this work, we first analyze the task correlation through three
different perspectives, i.e., data property, manual design, and model-based
relevance, based on which similar tasks are grouped together. Then, we
propose a hierarchical framework with a coarse-to-fine paradigm, in which the
bottom level is shared by all the tasks, the mid level is divided into different
groups, and the top level is assigned to each task. This allows our model
to learn basic language properties from all tasks, boost performance on
relevant tasks, and reduce the negative impact from irrelevant tasks. Our
experiments on 13 benchmark datasets across five natural language understanding
tasks demonstrate the superiority of our method.
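To make the coarse-to-fine hierarchy concrete, here is a minimal PyTorch-style sketch (not the authors' implementation): a bottom encoder shared by all tasks, a mid-level module per task group, and a top-level head per task. The task names, grouping, hidden size, and label counts below are hypothetical placeholders.

```python
import torch
import torch.nn as nn

# Hypothetical grouping of similar tasks; the paper derives groups from
# data properties, manual design, and model-based relevance.
TASK_TO_GROUP = {
    "nli": "inference", "paraphrase": "inference",
    "ner": "extraction", "pos_tagging": "extraction",
}

class CoarseToFineModel(nn.Module):
    def __init__(self, hidden=256, num_labels=2):
        super().__init__()
        # Bottom level (coarse): shared by every task.
        self.shared = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        # Mid level: one module per group of related tasks.
        self.group_modules = nn.ModuleDict({
            g: nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
            for g in set(TASK_TO_GROUP.values())
        })
        # Top level (fine): one head per task.
        self.task_heads = nn.ModuleDict({
            t: nn.Linear(hidden, num_labels) for t in TASK_TO_GROUP
        })

    def forward(self, x, task):
        h = self.shared(x)                              # basic language properties
        h = self.group_modules[TASK_TO_GROUP[task]](h)  # knowledge shared within the group
        return self.task_heads[task](h)                 # task-specific prediction

model = CoarseToFineModel()
logits = model(torch.randn(4, 256), task="nli")  # batch of 4 pooled sentence vectors
```

In the paper the shared and group levels would be trained Transformer layers rather than toy linear layers; the point of the sketch is only the routing: every example passes through the shared bottom, then only its group's mid-level module, then only its own task head.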
Related papers
- Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts [75.75548749888029]
We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks.
With a single model, Musketeer achieves results comparable to or better than strong baselines trained on single tasks, almost uniformly across multiple tasks.
arXiv Detail & Related papers (2023-05-11T17:57:49Z)
- Saliency-Regularized Deep Multi-Task Learning [7.3810864598379755]
Multitask learning enforces multiple learning tasks to share knowledge to improve their generalization abilities.
Modern deep multitask learning can jointly learn latent features and task sharing, but the learned task relations remain obscure.
This paper proposes a new multitask learning framework that jointly learns latent features and explicit task relations.
arXiv Detail & Related papers (2022-07-03T20:26:44Z)
- InstructionNER: A Multi-Task Instruction-Based Generative Framework for Few-shot NER [31.32381919473188]
We propose a multi-task instruction-based generative framework, named InstructionNER, for low-resource named entity recognition.
Specifically, we reformulate the NER task as a generation problem, enriching source sentences with task-specific instructions and answer options, and then inferring the entities and their types in natural language.
Experimental results show that our method consistently outperforms other baselines on five datasets in few-shot settings.
arXiv Detail & Related papers (2022-03-08T07:56:36Z)
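As an illustration of the instruction-based reformulation described in the InstructionNER summary above, the snippet below recasts a hypothetical NER instance as a source/target pair for a generative model. The template wording, answer options, and target format are placeholders; the paper's exact templates may differ.

```python
# Hypothetical example: recasting NER as instruction-based generation.
sentence = "Barack Obama visited Berlin in 2016."
entity_types = ["person", "location", "organization", "date"]

source = (
    f"Sentence: {sentence}\n"
    "Instruction: please extract the entities and their types from the sentence.\n"
    f"Options: {', '.join(entity_types)}"
)
# The generative model is trained to produce the answer in natural language.
target = "Barack Obama is a person, Berlin is a location, 2016 is a date."

print(source)
print(target)
```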
- Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z)
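The following toy sketch shows the kind of skill-subset routing described in the "Combining Modular Skills in Multitask Learning" summary above: each task selects a subset of modules from a shared inventory and combines their outputs. The binary allocation matrix here is hand-written for illustration; in the paper the task-skill allocation is learned.

```python
import torch
import torch.nn as nn

NUM_SKILLS, HIDDEN = 4, 64
# A shared inventory of skill modules (linear layers as stand-ins).
skills = nn.ModuleList([nn.Linear(HIDDEN, HIDDEN) for _ in range(NUM_SKILLS)])

# Hypothetical binary task-to-skill allocation (rows: tasks, columns: skills).
allocation = torch.tensor([[1., 0., 1., 0.],   # task 0 uses skills 0 and 2
                           [0., 1., 1., 1.]])  # task 1 uses skills 1, 2 and 3

def task_forward(x, task_id):
    mask = allocation[task_id]                             # (num_skills,)
    outputs = torch.stack([m(x) for m in skills], dim=0)   # (num_skills, batch, hidden)
    # Average the outputs of the skills selected for this task.
    return (mask[:, None, None] * outputs).sum(dim=0) / mask.sum()

y = task_forward(torch.randn(8, HIDDEN), task_id=0)  # shape (8, 64)
```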
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
- Structured Prediction as Translation between Augmented Natural Languages [109.50236248762877]
We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks.
Instead of tackling the problem by training task-specific discriminative models, we frame it as a translation task between augmented natural languages.
Our approach can match or outperform task-specific models on all tasks, and in particular, achieves new state-of-the-art results on joint entity and relation extraction.
arXiv Detail & Related papers (2021-01-14T18:32:21Z)
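To illustrate the "translation between augmented natural languages" idea from the TANL summary above, the example below shows a possible input/output pair for joint entity and relation extraction, where the output copies the input sentence and augments it with inline annotations. The bracket markup is a simplified placeholder; the exact format used in the paper may differ.

```python
# Hypothetical example: structured prediction as text-to-text "translation".
# The model reads the plain sentence and generates an augmented copy of it.
source = "Tolkien wrote The Hobbit in 1937."

# Entities are tagged with a type, and a relation links an entity back to
# another entity mentioned in the sentence (simplified markup).
target = ("[ Tolkien | person ] wrote "
          "[ The Hobbit | book | author = Tolkien ] in [ 1937 | date ].")

# Entities and relations are then recovered by parsing the bracketed spans.
print(source)
print(target)
```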
- ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning [97.10875695679499]
We propose a novel contrastive learning framework named ERICA in the pre-training phase to obtain a deeper understanding of the entities and their relations in text.
Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z)
- Exploring Neural Entity Representations for Semantic Information [4.925619556605419]
We evaluate eight neural entity embedding methods on a set of simple probing tasks.
We show which methods are able to remember words used to describe entities, learn type, relationship and factual information, and identify how frequently an entity is mentioned.
We also compare these methods in a unified framework on two entity linking tasks and discuss how they generalize to different model architectures and datasets.
arXiv Detail & Related papers (2020-11-17T21:21:37Z)
- Modelling Latent Skills for Multitask Language Generation [15.126163032403811]
We present a generative model for multitask conditional language generation.
Our guiding hypothesis is that a shared set of latent skills underlies many disparate language generation tasks.
We instantiate this task embedding space as a latent variable in a latent variable sequence-to-sequence model.
arXiv Detail & Related papers (2020-02-21T20:39:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.