Related papers: Sample-Level Weighting for Multi-Task Learning with Auxiliary Tasks

Sample-Level Weighting for Multi-Task Learning with Auxiliary Tasks

URL: http://arxiv.org/abs/2306.04519v1
Date: Wed, 7 Jun 2023 15:29:46 GMT
Title: Sample-Level Weighting for Multi-Task Learning with Auxiliary Tasks
Authors: Emilie Gr\'egoire, Hafeez Chaudhary and Sam Verboven
Abstract summary: Multi-task learning (MTL) can improve the generalization performance of neural networks by sharing representations with related tasks. MTL can also degrade performance through harmful interference between tasks. We propose SLGrad, a sample-level weighting algorithm for multi-task learning with auxiliary tasks.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-task learning (MTL) can improve the generalization performance of neural networks by sharing representations with related tasks. Nonetheless, MTL can also degrade performance through harmful interference between tasks. Recent work has pursued task-specific loss weighting as a solution for this interference. However, existing algorithms treat tasks as atomic, lacking the ability to explicitly separate harmful and helpful signals beyond the task level. To this end, we propose SLGrad, a sample-level weighting algorithm for multi-task learning with auxiliary tasks. Through sample-specific task weights, SLGrad reshapes the task distributions during training to eliminate harmful auxiliary signals and augment useful task signals. Substantial generalization performance gains are observed on (semi-) synthetic datasets and common supervised multi-task problems.

Related papers

StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets [14.867396697566257]
We extend the partial learning setup to a zero-shot setting, training a multi-task model on multiple datasets, each labeled for only a subset of tasks.<n>Our method, StableMTL, repurposes image generators for latent regression.<n>Instead of per-task losses requiring careful balancing, a unified latent loss is adopted, enabling seamless scaling to more tasks.
arXiv Detail & Related papers (2025-06-09T17:59:59Z)
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge [12.367471198090655]
Task Arithmetic (TA), which combines task vectors derived from fine-tuning, enables multi-task learning and task forgetting but struggles to isolate task-specific knowledge from general instruction-following behavior. We propose Layer-Aware Task Arithmetic (LATA), a novel approach that assigns layer-specific weights to task vectors based on their alignment with instruction-following or task-specific components.
arXiv Detail & Related papers (2025-02-27T15:22:14Z)
Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training. In the task level, we aim to find the optimal task order to minimize the total cross-task interference risk. In the instance level, we measure the difficulty of all instances per task, then divide them into the easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
Deep Task-specific Bottom Representation Network for Multi-Task Recommendation [36.128708266100645]
We propose the Deep Task-specific Bottom Representation Network (DTRN) to alleviate the negative transfer problem. The two proposed modules can achieve the purpose of getting task-specific bottom representation to relieve tasks' mutual interference.
arXiv Detail & Related papers (2023-08-11T08:04:43Z)
Mitigating Task Interference in Multi-Task Learning via Explicit Task Routing with Non-Learnable Primitives [19.90788777476128]
Multi-task learning (MTL) seeks to learn a single model to accomplish multiple tasks by leveraging shared information among the tasks. Existing MTL models have been known to suffer from negative interference among tasks. We propose ETR-NLP to mitigate task interference through a synergistic combination of non-learnable primitives and explicit task routing.
arXiv Detail & Related papers (2023-08-03T22:34:16Z)
ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks. Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, known as negative transfer. ForkMerge is a novel approach that periodically forks the model into multiple branches, automatically searches the varying task weights.
arXiv Detail & Related papers (2023-01-30T02:27:02Z)
Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learning. We devise task-aware gating functions to route examples from different tasks to specialized experts. This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model.
arXiv Detail & Related papers (2022-04-16T00:56:12Z)
Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the conflicting gradients. We introduce Conflict-Averse Gradient descent (CAGrad) which minimizes the average loss function. CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling. We propose a new suite of benchmark aimed at compositional tasks, MultiRavens, which allows defining custom task combinations. Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
Task Uncertainty Loss Reduce Negative Transfer in Asymmetric Multi-task Feature Learning [0.0]
Multi-task learning (MTL) can improve task performance overall relative to single-task learning (STL), but can hide negative transfer (NT) Asymmetric multitask feature learning (AMTFL) is an approach that tries to address this by allowing tasks with higher loss values to have smaller influence on feature representations for learning other tasks. We present examples of NT in two datasets (image recognition and pharmacogenomics) and tackle this challenge by using aleatoric homoscedastic uncertainty to capture the relative confidence between tasks, and set weights for task loss.
arXiv Detail & Related papers (2020-12-17T13:30:45Z)
HydaLearn: Highly Dynamic Task Weighting for Multi-task Learning with Auxiliary Tasks [4.095907708855597]
Multi-task learning (MTL) can improve performance on a task by sharing representations with one or more related auxiliary-tasks. Usually, MTL-networks are trained on a composite loss function formed by a constant weighted combination of the separate task losses. In practice, constant loss weights lead to poor results for two reasons: (i) for mini-batch based optimisation, the optimal task weights vary significantly from one update to the next depending on mini-batch sample composition. We introduce HydaLearn, an intelligent weighting algorithm that connects main-task gain to the individual task gradients, in order to inform
arXiv Detail & Related papers (2020-08-26T16:04:02Z)
Gradient Surgery for Multi-Task Learning [119.675492088251]
Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks. The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood. We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient.
arXiv Detail & Related papers (2020-01-19T06:33:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.