Understanding and Improving Information Transfer in Multi-Task Learning
- URL: http://arxiv.org/abs/2005.00944v1
- Date: Sat, 2 May 2020 23:43:52 GMT
- Title: Understanding and Improving Information Transfer in Multi-Task Learning
- Authors: Sen Wu, Hongyang R. Zhang, Christopher Ré
- Abstract summary: We study an architecture with a shared module for all tasks and a separate output module for each task.
We show that misalignment between task data can cause negative transfer (i.e., hurt performance) and provide sufficient conditions for positive transfer.
Inspired by the theoretical insights, we show that aligning tasks' embedding layers leads to performance gains for multi-task training and transfer learning.
- Score: 14.43111978531182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate multi-task learning approaches that use a shared feature
representation for all tasks. To better understand the transfer of task
information, we study an architecture with a shared module for all tasks and a
separate output module for each task. We study the theory of this setting on
linear and ReLU-activated models. Our key observation is that whether or not
tasks' data are well-aligned can significantly affect the performance of
multi-task learning. We show that misalignment between task data can cause
negative transfer (i.e., hurt performance) and provide sufficient conditions for
positive transfer. Inspired by the theoretical insights, we show that aligning
tasks' embedding layers leads to performance gains for multi-task training and
transfer learning on the GLUE benchmark and sentiment analysis tasks; for
example, we obtain a 2.35% GLUE score average improvement on 5 GLUE tasks over
BERT-LARGE using our alignment method. We also design an SVD-based task
reweighting scheme and show that it improves the robustness of multi-task
training on a multi-label image dataset.
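The architecture analyzed here is standard hard parameter sharing: one shared module maps every task's input to a common representation, and each task gets its own output module on top. Below is a minimal PyTorch sketch of that setup (class and parameter names are illustrative, not taken from the paper's code):

```python
import torch
import torch.nn as nn

class SharedMTL(nn.Module):
    """Hard parameter sharing: one shared module, a separate head per task."""

    def __init__(self, in_dim: int, hidden_dim: int, task_out_dims: list[int]):
        super().__init__()
        # Shared module: maps inputs to a common representation.
        # The paper studies both linear and ReLU-activated versions.
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        # Separate output module for each task.
        self.heads = nn.ModuleList(nn.Linear(hidden_dim, d) for d in task_out_dims)

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        return self.heads[task_id](self.shared(x))

# Joint training sums per-task losses; all gradients flow through the shared
# module, which is where positive or negative transfer between tasks arises.
model = SharedMTL(in_dim=128, hidden_dim=64, task_out_dims=[2, 2, 5])
logits = model(torch.randn(8, 128), task_id=1)  # shape (8, 2)
```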
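The abstract attributes part of the gains to aligning tasks' embedding layers but does not spell out the procedure here. One plausible reading, sketched below under that assumption, is to whiten each task's embeddings to zero mean and identity covariance so that all tasks feed the shared module on a common footing; this should not be read as the paper's exact method.

```python
import torch

def whiten_embeddings(z: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Map a batch of task embeddings to zero mean and ~identity covariance.
    One hypothetical instantiation of "aligning tasks' embedding layers",
    not the procedure from the paper."""
    z = z - z.mean(dim=0, keepdim=True)            # center the batch
    cov = z.T @ z / (z.shape[0] - 1)               # empirical covariance
    evals, evecs = torch.linalg.eigh(cov)          # symmetric eigendecomposition
    inv_sqrt = evecs @ torch.diag((evals + eps).rsqrt()) @ evecs.T
    return z @ inv_sqrt                            # covariance ~ identity

aligned = whiten_embeddings(torch.randn(256, 32))  # e.g., one task's batch
```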
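The SVD-based task reweighting scheme is likewise only named in the abstract. As a hedged illustration of the general idea, the sketch below derives per-task loss weights from the singular-value spectrum of each task's centered data matrix; the specific rule (damping tasks with a large leading singular value so no task dominates the shared module) is an assumption for illustration, not the paper's formula.

```python
import numpy as np

def svd_task_weights(task_data: list[np.ndarray]) -> np.ndarray:
    """Illustrative reweighting from each task's singular-value spectrum.
    The 1/sqrt(top singular value) rule is a stand-in, not the paper's."""
    weights = []
    for X in task_data:
        Xc = X - X.mean(axis=0)                      # center each task's data
        s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
        weights.append(1.0 / np.sqrt(s[0] + 1e-8))   # damp dominant tasks
    w = np.asarray(weights)
    return w / w.sum()                               # normalize to sum to one

rng = np.random.default_rng(0)
tasks = [rng.normal(size=(n, 16)) * scale
         for n, scale in [(200, 1.0), (50, 3.0), (400, 0.5)]]
print(svd_task_weights(tasks))  # larger-spectrum tasks receive smaller weights
```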
Related papers
- Less is More: High-value Data Selection for Visual Instruction Tuning [127.38740043393527]
We propose TIVE, a high-value data selection approach that eliminates redundancy within visual instruction data and reduces training cost.
Using only about 15% of the data, our approach achieves average performance comparable to the full-data fine-tuned model across eight benchmarks.
arXiv Detail & Related papers (2024-03-14T16:47:25Z)
- CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning [20.58878416527427]
We propose a novel Comprehensive Task Balancing algorithm for multi-task visual instruction tuning of LMMs.
Our CoTBal leads to superior overall performance in multi-task visual instruction tuning.
arXiv Detail & Related papers (2024-03-07T09:11:16Z)
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework in which multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can succeed on classification tasks with little or even non-overlapping annotation.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model not only serves as a strong foundation backbone for a wide range of tasks but can also be used as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
- TaskMix: Data Augmentation for Meta-Learning of Spoken Intent Understanding [0.0]
We show that a state-of-the-art data augmentation method worsens overfitting when task diversity is low.
We propose a simple method, TaskMix, which synthesizes new tasks by linearly interpolating existing tasks.
We show that TaskMix outperforms baselines, alleviates overfitting when task diversity is low, and does not degrade performance even when it is high.
arXiv Detail & Related papers (2022-09-26T00:37:40Z)
- Explaining the Effectiveness of Multi-Task Learning for Efficient Knowledge Extraction from Spine MRI Reports [2.5953185061765884]
We show that a single multi-task model can match the performance of task-specific models.
We validate our observations on our internal radiologist-annotated datasets on the cervical and lumbar spine.
arXiv Detail & Related papers (2022-05-06T01:51:19Z)
- Variational Multi-Task Learning with Gumbel-Softmax Priors [105.22406384964144]
Multi-task learning aims to explore task relatedness to improve individual tasks.
We propose variational multi-task learning (VMTL), a general probabilistic inference framework for learning multiple related tasks.
arXiv Detail & Related papers (2021-11-09T18:49:45Z)
- Semi-supervised Multi-task Learning for Semantics and Depth [88.77716991603252]
Multi-Task Learning (MTL) aims to enhance the model generalization by sharing representations between related tasks for better performance.
We propose a semi-supervised multi-task learning method to leverage the available supervisory signals from different datasets.
We present a domain-aware discriminator structure with various alignment formulations to mitigate the domain discrepancy issue among datasets.
arXiv Detail & Related papers (2021-10-14T07:43:39Z)
- Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)