Multitask Learning with Single Gradient Step Update for Task Balancing
- URL: http://arxiv.org/abs/2005.09910v2
- Date: Tue, 2 Jun 2020 12:29:42 GMT
- Title: Multitask Learning with Single Gradient Step Update for Task Balancing
- Authors: Sungjae Lee, Youngdoo Son
- Abstract summary: We propose an algorithm to balance between tasks at the gradient level by applying gradient-based meta-learning to multitask learning.
We apply the proposed method to various multitask computer vision problems and achieve state-of-the-art performance.
- Score: 4.330814031477772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multitask learning is a methodology to boost generalization performance and
also reduce computational intensity and memory usage. However, learning
multiple tasks simultaneously can be more difficult than learning a single task
because it can cause imbalance among tasks. To address the imbalance problem,
we propose an algorithm to balance between tasks at the gradient level by
applying gradient-based meta-learning to multitask learning. The proposed
method trains shared layers and task-specific layers separately so that the two
layers with different roles in a multitask network can be fitted to their own
purposes. In particular, the shared layer that contains informative knowledge
shared among tasks is trained by employing single gradient step update and
inner/outer loop training to mitigate the imbalance problem at the gradient
level. We apply the proposed method to various multitask computer vision
problems and achieve state-of-the-art performance.
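Below is a minimal sketch of one way the abstract's "single gradient step update with inner/outer loop training" could look in code, assuming MAML-style adaptation: the inner loop takes a single gradient step on each task-specific head, and the outer loop updates the shared layers through those adapted heads. The architecture, losses, and learning rates (MultitaskNet, inner_lr, etc.) are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultitaskNet(nn.Module):
    def __init__(self, in_dim=32, hidden=64, task_dims=(10, 1)):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # plain weight matrices as task heads so the inner step can be written functionally
        self.heads = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(hidden, d)) for d in task_dims]
        )

def inner_outer_step(net, batches, loss_fns, outer_opt, inner_lr=0.1):
    """batches: one (x, y) pair per task; loss_fns: one loss per task."""
    outer_loss = 0.0
    for head, (x, y), loss_fn in zip(net.heads, batches, loss_fns):
        feat = net.shared(x)
        # inner loop: a single gradient step, here taken on the task-specific head
        inner_loss = loss_fn(feat @ head, y)
        (g,) = torch.autograd.grad(inner_loss, head, create_graph=True)
        adapted_head = head - inner_lr * g
        # outer loss: the same task evaluated with its adapted head, so the
        # shared-layer gradient is computed through the single inner step
        outer_loss = outer_loss + loss_fn(feat @ adapted_head, y)
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()
    return outer_loss.item()

net = MultitaskNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x = torch.randn(8, 32)
batches = [(x, torch.randint(0, 10, (8,))), (x, torch.randn(8, 1))]
print(inner_outer_step(net, batches, [F.cross_entropy, F.mse_loss], opt))
```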
Related papers
- Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learning.
We devise task-aware gating functions to route examples from different tasks to specialized experts.
This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model.
arXiv Detail & Related papers (2022-04-16T00:56:12Z)
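A hedged sketch of task-aware routing in a sparsely activated mixture-of-experts, as described in the entry above: a gate conditioned on both the example and a task embedding picks one expert per example, so capacity grows with the number of experts while per-example compute stays close to that of a dense layer. The module names, expert sizes, and top-1 routing rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TaskAwareMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, n_tasks=3):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(n_experts)]
        )
        self.task_emb = nn.Embedding(n_tasks, dim)
        self.gate = nn.Linear(2 * dim, n_experts)  # sees the example AND the task identity

    def forward(self, x, task_id):
        t = self.task_emb(task_id).expand(x.size(0), -1)
        probs = torch.softmax(self.gate(torch.cat([x, t], dim=-1)), dim=-1)
        top_p, top_i = probs.max(dim=-1)           # top-1: one expert per example
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):  # only the chosen experts run
            mask = top_i == e
            if mask.any():
                out[mask] = expert(x[mask]) * top_p[mask].unsqueeze(-1)
        return out

moe = TaskAwareMoE()
x = torch.randn(16, 64)
print(moe(x, torch.tensor(1)).shape)   # torch.Size([16, 64])
```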
- Leveraging convergence behavior to balance conflicting tasks in multi-task learning [3.6212652499950138]
Multi-Task Learning uses correlated tasks to improve performance generalization.
Tasks often conflict with each other, which makes it challenging to define how the gradients of multiple tasks should be combined.
We propose a method that takes into account the temporal behaviour of the gradients to create a dynamic bias that adjusts the importance of each task during backpropagation.
arXiv Detail & Related papers (2022-04-14T01:52:34Z)
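A hedged sketch of the dynamic-bias idea from the entry above, assuming one concrete choice of temporal statistic: an exponential moving average of each task's shared-parameter gradient norm, turned into per-task loss weights at every step. The statistic and the rescaling rule are illustrative, not the paper's exact formulation.

```python
import torch

class TemporalTaskWeighter:
    def __init__(self, n_tasks, momentum=0.9, eps=1e-8):
        self.ema = torch.zeros(n_tasks)   # running gradient-norm estimate per task
        self.m, self.eps = momentum, eps

    def weights(self, task_grads):
        """task_grads: list of flattened shared-parameter gradients, one per task."""
        norms = torch.stack([g.norm() for g in task_grads]).detach()
        self.ema = self.m * self.ema + (1 - self.m) * norms
        # tasks whose gradients have largely vanished (close to convergence)
        # contribute less; tasks still producing large gradients keep influence
        w = self.ema / (self.ema.sum() + self.eps)
        return w * len(task_grads)   # normalise so the weights average to 1

# usage inside a training step, with per-task losses on the shared parameters:
#   w = weighter.weights(per_task_shared_grads)
#   total_loss = sum(w_i * L_i for w_i, L_i in zip(w, task_losses))
```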
- Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the conflicting gradients.
We introduce Conflict-Averse Gradient descent (CAGrad) which minimizes the average loss function.
CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
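A compressed sketch of a conflict-averse update in the spirit of CAGrad: solve a small problem over the task-weight simplex, then move along the average gradient plus a correction that protects the worst-affected task. The simplex solver (scipy) and the constant c are illustrative choices rather than the paper's reference code.

```python
import numpy as np
from scipy.optimize import minimize

def cagrad_direction(grads, c=0.5):
    """grads: (n_tasks, dim) array of per-task gradients on shared parameters."""
    g0 = grads.mean(axis=0)              # gradient of the average loss
    G = grads @ grads.T                  # Gram matrix of task gradients
    n = grads.shape[0]
    phi = c * np.linalg.norm(g0)

    def dual(w):                         # minimised over the probability simplex
        gw_dot_g0 = w @ grads @ g0
        gw_norm = np.sqrt(w @ G @ w + 1e-12)
        return gw_dot_g0 + phi * gw_norm

    w0 = np.ones(n) / n
    res = minimize(dual, w0, bounds=[(0, 1)] * n,
                   constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
    gw = res.x @ grads
    lam = phi / (np.linalg.norm(gw) + 1e-12)
    return (g0 + lam * gw) / (1 + c ** 2)   # rescaled update direction

# toy example with two conflicting 2-D gradients
g = np.array([[1.0, 0.2], [-0.8, 1.0]])
print(cagrad_direction(g))
```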
- Efficiently Identifying Task Groupings for Multi-Task Learning [55.80489920205404]
Multi-task learning can leverage information learned by one task to benefit the training of other tasks.
We suggest an approach to select which tasks should train together in multi-task learning models.
Our method determines task groupings in a single training run by co-training all tasks together and quantifying the extent to which one task's gradient would affect another task's loss.
arXiv Detail & Related papers (2021-09-10T02:01:43Z)
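A hedged sketch of the lookahead measurement behind this task-grouping approach (closely related to the transference analysis in the next entry): apply a single gradient step from one task to the shared parameters and record how every other task's loss changes. The network interface, step size, and affinity normalisation are illustrative assumptions.

```python
import torch

def inter_task_affinity(shared, heads, x, ys, loss_fns, lr=0.01):
    """A[i][j] ~ how much a single lookahead step on task i changes task j's loss."""
    params = [p for p in shared.parameters() if p.requires_grad]
    base = [fn(head(shared(x)), y) for head, fn, y in zip(heads, loss_fns, ys)]
    grads = [torch.autograd.grad(b, params, retain_graph=True) for b in base]
    base_vals = [b.item() for b in base]
    n = len(heads)
    A = [[0.0] * n for _ in range(n)]
    with torch.no_grad():
        backup = [p.clone() for p in params]
        for i in range(n):
            for p, g in zip(params, grads[i]):        # lookahead step on task i
                p -= lr * g
            for j in range(n):
                after = loss_fns[j](heads[j](shared(x)), ys[j]).item()
                A[i][j] = 1.0 - after / base_vals[j]  # > 0 means the step helped task j
            for p, b in zip(params, backup):          # undo the lookahead step
                p.copy_(b)
    return A
```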
- Measuring and Harnessing Transference in Multi-Task Learning [58.48659733262734]
Multi-task learning can leverage information learned by one task to benefit the training of other tasks.
We analyze the dynamics of information transfer, or transference, across tasks throughout training.
arXiv Detail & Related papers (2020-10-29T08:25:43Z)
- HydaLearn: Highly Dynamic Task Weighting for Multi-task Learning with Auxiliary Tasks [4.095907708855597]
Multi-task learning (MTL) can improve performance on a task by sharing representations with one or more related auxiliary-tasks.
Usually, MTL-networks are trained on a composite loss function formed by a constant weighted combination of the separate task losses.
In practice, constant loss weights lead to poor results for two reasons: (i) for mini-batch based optimisation, the optimal task weights vary significantly from one update to the next depending on mini-batch sample composition.
We introduce HydaLearn, an intelligent weighting algorithm that connects main-task gain to the individual task gradients, in order to inform
arXiv Detail & Related papers (2020-08-26T16:04:02Z)
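A hedged sketch of mini-batch-level dynamic weighting in the spirit of HydaLearn: estimate, at every update, how much each auxiliary task's gradient would help the main task and re-weight the auxiliary losses accordingly. The gain estimate used here (cosine alignment with the main-task gradient) is an illustrative stand-in for the paper's estimator.

```python
import torch

def dynamic_aux_weights(main_loss, aux_losses, shared_params, eps=1e-8):
    def flat(gs):
        return torch.cat([g.reshape(-1) for g in gs])
    g_main = flat(torch.autograd.grad(main_loss, shared_params, retain_graph=True))
    weights = []
    for aux in aux_losses:
        g_aux = flat(torch.autograd.grad(aux, shared_params, retain_graph=True))
        # projected "gain" of the auxiliary gradient onto the main-task direction
        gain = torch.dot(g_aux, g_main) / (g_aux.norm() * g_main.norm() + eps)
        weights.append(torch.clamp(gain, min=0.0).item())  # ignore harmful tasks
    return weights

# usage, recomputed for every mini-batch:
#   w = dynamic_aux_weights(main_loss, aux_losses, shared_params)
#   total = main_loss + sum(w_i * l_i for w_i, l_i in zip(w, aux_losses))
```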
- Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in the literature.
First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting the previously learned ones (incremental learning).
Second, eliminating adverse interactions amongst tasks, which has been shown to significantly degrade single-task performance in a multi-task setup (task interference).
arXiv Detail & Related papers (2020-07-24T14:44:46Z)
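A hedged sketch of the reparameterisation idea from the entry above: a fixed, task-agnostic filter bank shared by every task, plus a small task-specific 1x1 modulator that is the only part created or trained per task, so new tasks can be added without touching (or interfering with) existing ones. Because convolution is linear, a 1x1 modulator on the output amounts to giving each task its own linear combination of the shared filters; the exact modulator form and layer sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ReparamConv(nn.Module):
    def __init__(self, in_ch, out_ch, n_tasks, k=3):
        super().__init__()
        self.bank = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.bank.weight.requires_grad_(False)   # shared, task-agnostic filters stay fixed
        self.modulators = nn.ModuleList(
            [nn.Conv2d(out_ch, out_ch, 1, bias=False) for _ in range(n_tasks)]
        )

    def add_task(self):
        """Incrementally add a task: only a new 1x1 modulator is created."""
        out_ch = self.bank.out_channels
        self.modulators.append(nn.Conv2d(out_ch, out_ch, 1, bias=False))

    def forward(self, x, task_id):
        return self.modulators[task_id](self.bank(x))

layer = ReparamConv(16, 32, n_tasks=2)
x = torch.randn(1, 16, 8, 8)
print(layer(x, task_id=0).shape)   # torch.Size([1, 32, 8, 8])
```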
- Knowledge Distillation for Multi-task Learning [38.20005345733544]
Multi-task learning (MTL) aims to learn a single model that performs multiple tasks, achieving good performance on all of them at a lower computational cost.
Learning such a model requires jointly optimizing the losses of a set of tasks with different difficulty levels, magnitudes, and characteristics.
We propose a knowledge distillation based method in this work to address the imbalance problem in multi-task learning.
arXiv Detail & Related papers (2020-07-14T08:02:42Z)
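A hedged sketch of the distillation idea from the entry above: frozen single-task teachers provide per-task targets that supplement each task's supervised loss in the multi-task student, which helps keep tasks of different magnitudes in balance. The feature-matching form and the weight alpha are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def mtl_distillation_loss(student_feats, student_preds, teacher_feats,
                          targets, loss_fns, alpha=1.0):
    """Per-task loss = supervised loss + distance to that task's frozen teacher."""
    total = 0.0
    for f_s, p_s, f_t, y, fn in zip(student_feats, student_preds,
                                    teacher_feats, targets, loss_fns):
        supervised = fn(p_s, y)
        distill = F.mse_loss(f_s, f_t.detach())   # teachers are frozen
        total = total + supervised + alpha * distill
    return total
```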
- Learned Weight Sharing for Deep Multi-Task Learning by Natural Evolution Strategy and Stochastic Gradient Descent [0.0]
We propose an algorithm to learn the assignment between a shared set of weights and task-specific layers.
Learning takes place via a combination of natural evolution strategy and gradient descent.
The end result is a set of task-specific networks that share weights but allow independent inference.
arXiv Detail & Related papers (2020-03-23T10:21:44Z)
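A hedged sketch of learning the weight-sharing assignment described above: shared weight tensors are trained by ordinary gradient descent through sampled assignments, while the discrete assignment distribution is trained with a score-function estimate in the spirit of natural evolution strategies. The layer pool, the single-sample estimator, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_tasks, depth, pool_size, dim = 2, 3, 4, 16
pool = nn.ModuleList([nn.Linear(dim, dim) for _ in range(pool_size)])   # shared weights
heads = nn.ModuleList([nn.Linear(dim, 1) for _ in range(n_tasks)])      # task-specific layers
logits = torch.zeros(n_tasks, depth, pool_size, requires_grad=True)     # assignment distribution
w_opt = torch.optim.SGD(list(pool.parameters()) + list(heads.parameters()), lr=0.05)
a_opt = torch.optim.SGD([logits], lr=0.1)

def task_forward(t, assignment, x):
    h = x
    for d in range(depth):
        h = torch.relu(pool[int(assignment[d])](h))   # sampled shared layer at depth d
    return heads[t](h)

x = torch.randn(32, dim)
ys = [torch.randn(32, 1) for _ in range(n_tasks)]
for step in range(200):
    w_opt.zero_grad()
    a_opt.zero_grad()
    score_loss = 0.0
    for t in range(n_tasks):
        dist = torch.distributions.Categorical(logits=logits[t])
        assignment = dist.sample()                    # which pooled layer each slot uses
        loss = F.mse_loss(task_forward(t, assignment, x), ys[t])
        loss.backward()                               # gradient descent on the shared weights
        # score-function (NES-style) signal: make low-loss assignments more likely
        score_loss = score_loss + loss.detach() * dist.log_prob(assignment).sum()
    score_loss.backward()
    w_opt.step()
    a_opt.step()
```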
- Gradient Surgery for Multi-Task Learning [119.675492088251]
Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks.
The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood.
We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient.
arXiv Detail & Related papers (2020-01-19T06:33:47Z)
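A minimal sketch of the gradient-surgery rule summarized above: whenever two task gradients conflict (negative inner product), the conflicting component of one is projected out along the other before the gradients are combined. The flat-vector interface is a simplification for illustration.

```python
import numpy as np

def pcgrad(grads, seed=None):
    """grads: (n_tasks, dim) per-task gradients; returns the combined update."""
    rng = np.random.default_rng(seed)
    projected = grads.astype(float)
    n = len(grads)
    for i in range(n):
        for j in rng.permutation(n):          # random task order, as in the paper
            if j == i:
                continue
            dot = projected[i] @ grads[j]
            if dot < 0:   # conflict: drop the component of g_i along g_j
                projected[i] -= dot / (grads[j] @ grads[j] + 1e-12) * grads[j]
    return projected.sum(axis=0)

g = np.array([[1.0, 0.5], [-1.0, 0.5]])   # two conflicting toy gradients
print(pcgrad(g))                          # conflicting parts cancel; the shared direction remains
```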
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.