Sample-Level Weighting for Multi-Task Learning with Auxiliary Tasks
- URL: http://arxiv.org/abs/2306.04519v1
- Date: Wed, 7 Jun 2023 15:29:46 GMT
- Title: Sample-Level Weighting for Multi-Task Learning with Auxiliary Tasks
- Authors: Emilie Grégoire, Hafeez Chaudhary and Sam Verboven
- Abstract summary: Multi-task learning (MTL) can improve the generalization performance of neural networks by sharing representations with related tasks.
MTL can also degrade performance through harmful interference between tasks.
We propose SLGrad, a sample-level weighting algorithm for multi-task learning with auxiliary tasks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-task learning (MTL) can improve the generalization performance of
neural networks by sharing representations with related tasks. Nonetheless, MTL
can also degrade performance through harmful interference between tasks. Recent
work has pursued task-specific loss weighting as a solution for this
interference. However, existing algorithms treat tasks as atomic, lacking the
ability to explicitly separate harmful and helpful signals beyond the task
level. To this end, we propose SLGrad, a sample-level weighting algorithm for
multi-task learning with auxiliary tasks. Through sample-specific task weights,
SLGrad reshapes the task distributions during training to eliminate harmful
auxiliary signals and augment useful task signals. Substantial generalization
performance gains are observed on (semi-) synthetic datasets and common
supervised multi-task problems.
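The abstract does not state how the sample-specific task weights are computed. As a rough illustration only, the sketch below assumes one plausible rule: each training sample is weighted by the non-negative cosine similarity between its own gradient and a main-task gradient, so samples whose gradients point against the main task get weight zero. The function names `sample_weights` and `weighted_loss`, the clipping, and the normalization are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def sample_weights(sample_grads, main_grad, eps=1e-12):
    """Hypothetical weighting rule (an assumption, not the paper's definition):
    weight_i = max(0, cos(sample_grad_i, main_grad)), normalized to sum to 1."""
    sample_grads = np.asarray(sample_grads, dtype=float)  # shape (n_samples, n_params)
    main_grad = np.asarray(main_grad, dtype=float)        # shape (n_params,)
    dots = sample_grads @ main_grad
    norms = np.linalg.norm(sample_grads, axis=1) * np.linalg.norm(main_grad)
    cos = dots / np.maximum(norms, eps)
    weights = np.clip(cos, 0.0, None)  # drop samples that conflict with the main task
    total = weights.sum()
    # If every sample conflicts, return all zeros instead of dividing by zero.
    return weights / total if total > 0 else weights

def weighted_loss(per_sample_losses, weights):
    """Weighted sum of per-sample losses under the weights above."""
    return float(np.dot(weights, np.asarray(per_sample_losses, dtype=float)))

# Three samples: aligned, opposed, and orthogonal to the main-task gradient.
grads = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
w = sample_weights(grads, main_grad=[1.0, 0.0])
print(w, weighted_loss([0.5, 2.0, 1.0], w))
```

Under this assumed rule, only the aligned sample contributes to the update; the opposed and orthogonal samples are filtered out.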
Related papers
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all the multi-task data for training.
At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Deep Task-specific Bottom Representation Network for Multi-Task Recommendation [36.128708266100645]
We propose the Deep Task-specific Bottom Representation Network (DTRN) to alleviate the negative transfer problem.
The two proposed modules yield task-specific bottom representations that relieve mutual interference between tasks.
arXiv Detail & Related papers (2023-08-11T08:04:43Z)
- Mitigating Task Interference in Multi-Task Learning via Explicit Task Routing with Non-Learnable Primitives [19.90788777476128]
Multi-task learning (MTL) seeks to learn a single model to accomplish multiple tasks by leveraging shared information among the tasks.
Existing MTL models have been known to suffer from negative interference among tasks.
We propose ETR-NLP to mitigate task interference through a synergistic combination of non-learnable primitives and explicit task routing.
arXiv Detail & Related papers (2023-08-03T22:34:16Z)
- ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks.
Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, known as negative transfer.
ForkMerge is a novel approach that periodically forks the model into multiple branches and automatically searches the varying task weights.
arXiv Detail & Related papers (2023-01-30T02:27:02Z)
- Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learning.
We devise task-aware gating functions to route examples from different tasks to specialized experts.
This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model.
arXiv Detail & Related papers (2022-04-16T00:56:12Z)
- Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the conflicting gradients.
We introduce Conflict-Averse Gradient descent (CAGrad) which minimizes the average loss function.
CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
- Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
First, we propose a new benchmark suite aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
- Task Uncertainty Loss Reduce Negative Transfer in Asymmetric Multi-task Feature Learning [0.0]
Multi-task learning (MTL) can improve overall task performance relative to single-task learning (STL), but can hide negative transfer (NT).
Asymmetric multitask feature learning (AMTFL) is an approach that tries to address this by allowing tasks with higher loss values to have smaller influence on feature representations for learning other tasks.
We present examples of NT in two datasets (image recognition and pharmacogenomics) and tackle this challenge by using aleatoric homoscedastic uncertainty to capture the relative confidence between tasks, and set weights for task loss.
arXiv Detail & Related papers (2020-12-17T13:30:45Z)
- HydaLearn: Highly Dynamic Task Weighting for Multi-task Learning with Auxiliary Tasks [4.095907708855597]
Multi-task learning (MTL) can improve performance on a task by sharing representations with one or more related auxiliary-tasks.
Usually, MTL-networks are trained on a composite loss function formed by a constant weighted combination of the separate task losses.
In practice, constant loss weights lead to poor results for two reasons: (i) for mini-batch based optimisation, the optimal task weights vary significantly from one update to the next depending on mini-batch sample composition.
We introduce HydaLearn, an intelligent weighting algorithm that connects main-task gain to the individual task gradients, in order to inform
arXiv Detail & Related papers (2020-08-26T16:04:02Z)
- Gradient Surgery for Multi-Task Learning [119.675492088251]
Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks.
The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood.
We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient.
arXiv Detail & Related papers (2020-01-19T06:33:47Z)
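The projection described in the Gradient Surgery entry above can be sketched as follows. This is a minimal illustration, assuming that two gradients "conflict" when their dot product is negative, and that projecting onto the normal plane means subtracting the component of one gradient along the other. The function name `project_conflicting` is hypothetical and chosen for this example.

```python
import numpy as np

def project_conflicting(g_i, g_j, eps=1e-12):
    """Return g_i with its component along g_j removed, but only when the two
    gradients conflict (negative dot product, an assumption of this sketch).
    Otherwise return g_i unchanged."""
    g_i = np.asarray(g_i, dtype=float)
    g_j = np.asarray(g_j, dtype=float)
    dot = float(g_i @ g_j)
    if dot >= 0.0:
        return g_i.copy()  # no conflict, nothing to project
    # Subtract the projection of g_i onto g_j; the result is orthogonal to g_j.
    return g_i - dot / max(float(g_j @ g_j), eps) * g_j

g_task_a = np.array([1.0, 0.0])
g_task_b = np.array([-1.0, 1.0])
print(project_conflicting(g_task_a, g_task_b))  # [0.5 0.5]
```

After the projection, the adjusted gradient has zero dot product with the other task's gradient, so a step along it no longer directly opposes that task.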