Structural analysis of an all-purpose question answering model
- URL: http://arxiv.org/abs/2104.06045v1
- Date: Tue, 13 Apr 2021 09:20:44 GMT
- Title: Structural analysis of an all-purpose question answering model
- Authors: Vincent Micheli, Quentin Heinrich, François Fleuret, Wacim
Belblidia
- Abstract summary: We conduct a structural analysis of a new all-purpose question answering model that we introduce.
Surprisingly, this model retains single-task performance even in the absence of a strong transfer effect between tasks.
We observe that attention heads specialize in a particular task and that some heads are more conducive to learning than others in both the multi-task and single-task settings.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Attention is a key component of the now ubiquitous pre-trained language
models. By learning to focus on relevant pieces of information, these
Transformer-based architectures have proven capable of tackling several tasks
at once and sometimes even surpass their single-task counterparts. To better
understand this phenomenon, we conduct a structural analysis of a new
all-purpose question answering model that we introduce. Surprisingly, this
model retains single-task performance even in the absence of a strong transfer
effect between tasks. Through attention head importance scoring, we observe
that attention heads specialize in a particular task and that some heads are
more conducive to learning than others in both the multi-task and single-task
settings.
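The abstract's "attention head importance scoring" can be illustrated with a minimal sketch: score each head by how sensitive the loss is to a per-head gate, then treat low-scoring heads as candidates for pruning. Everything below (the toy head outputs, dimensions, and the finite-difference estimator) is an illustrative assumption, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

H, D = 4, 8                                # toy: number of heads, head output dim
head_outputs = rng.normal(size=(H, D))     # hypothetical per-head outputs
target = rng.normal(size=D)

def loss(gates):
    # Model output = gate-weighted sum of head outputs; squared-error loss.
    out = (gates[:, None] * head_outputs).sum(axis=0)
    return float(((out - target) ** 2).mean())

def head_importance(eps=1e-4):
    # Importance of head h ~ |dL/dg_h| at g = 1 (all heads active),
    # estimated here by central finite differences.
    gates = np.ones(H)
    scores = np.empty(H)
    for h in range(H):
        up, down = gates.copy(), gates.copy()
        up[h] += eps
        down[h] -= eps
        scores[h] = abs(loss(up) - loss(down)) / (2 * eps)
    return scores

scores = head_importance()
least_important = int(np.argmin(scores))   # candidate head to prune
```

In practice such scores are computed from gradients accumulated over a validation set rather than finite differences, but the idea is the same: a head whose gate barely affects the loss contributes little to the task.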
Related papers
- Effects of Prompt Length on Domain-specific Tasks for Large Language Models
Large Language Models have garnered significant attention for their strong performance in various natural language tasks.
This paper aims to bridge the research gap on how prompt design impacts models' ability to perform domain-specific tasks.
arXiv Detail & Related papers (2025-02-20T04:42:06Z)
- Object-Centric Multi-Task Learning for Human Instances
We explore a compact multi-task network architecture that maximally shares the parameters of the multiple tasks via object-centric learning.
We propose a novel query design, called the human-centric query (HCQ), to effectively encode human instance information.
Experimental results show that the proposed multi-task network achieves accuracy comparable to state-of-the-art task-specific models.
arXiv Detail & Related papers (2023-03-13T01:10:50Z)
- Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z)
- Coarse-to-Fine: Hierarchical Multi-task Learning for Natural Language Understanding
We propose a hierarchical framework with a coarse-to-fine paradigm: the bottom level is shared by all tasks, the mid-level is divided into groups, and the top level is assigned to individual tasks.
This allows our model to learn basic language properties from all tasks, boost performance on relevant tasks, and reduce the negative impact of irrelevant tasks.
arXiv Detail & Related papers (2022-08-19T02:46:20Z)
- Saliency-Regularized Deep Multi-Task Learning
Multitask learning enforces multiple learning tasks to share knowledge to improve their generalization abilities.
Modern deep multitask learning can jointly learn latent features and task sharing, but the task relations it learns remain obscure.
This paper proposes a new multitask learning framework that jointly learns latent features and explicit task relations.
arXiv Detail & Related papers (2022-07-03T20:26:44Z)
- Combining Modular Skills in Multitask Learning
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z)
- Towards More Generalizable One-shot Visual Imitation Learning
A general-purpose robot should be able to master a wide range of tasks and quickly learn a novel one by leveraging past experiences.
One-shot imitation learning (OSIL) approaches this goal by training an agent with (pairs of) expert demonstrations.
We push for a higher level of generalization ability by investigating a more ambitious multi-task setup.
arXiv Detail & Related papers (2021-10-26T05:49:46Z)
- On the relationship between disentanglement and multi-task learning
We take a closer look at the relationship between disentanglement and multi-task learning based on hard parameter sharing.
We show that disentanglement appears naturally during the process of multi-task neural network training.
arXiv Detail & Related papers (2021-10-07T14:35:34Z)
- Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference
Two common challenges in developing multi-task models are often overlooked in the literature.
First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting previously learned ones (incremental learning).
Second, eliminating adverse interactions amongst tasks, which have been shown to significantly degrade single-task performance in a multi-task setup (task interference).
arXiv Detail & Related papers (2020-07-24T14:44:46Z)
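Several of the papers above build on hard parameter sharing: one shared trunk feeds multiple task-specific heads, so the trunk is trained by every task while each head is trained only by its own. A minimal sketch, with all dimensions and names chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: input dim, shared representation dim,
# and per-task output dims for two illustrative tasks.
D_in, D_shared = 5, 3
task_dims = {"qa": 2, "cls": 4}

W_shared = rng.normal(size=(D_in, D_shared))                 # shared trunk weights
heads = {t: rng.normal(size=(D_shared, d)) for t, d in task_dims.items()}

def forward(x, task):
    # One trunk, per-task head: in training, gradients from every task
    # flow into W_shared, while each head sees only its own task's loss.
    h = np.tanh(x @ W_shared)
    return h @ heads[task]

x = rng.normal(size=(1, D_in))
qa_out = forward(x, "qa")
cls_out = forward(x, "cls")
```

The "transfer effect" studied in the paper above corresponds to how much the shared trunk learned from one task helps the others; the cited work on task interference addresses the case where this sharing hurts instead.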
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.