Related papers: Mitigating Negative Transfer in Multi-Task Learning with Exponential Moving Average Loss Weighting Strategies

URL: http://arxiv.org/abs/2211.12999v1
Date: Tue, 22 Nov 2022 09:22:48 GMT
Title: Mitigating Negative Transfer in Multi-Task Learning with Exponential Moving Average Loss Weighting Strategies
Authors: Anish Lakkapragada, Essam Sleiman, Saimourya Surabhi, Dennis P. Wall
Abstract summary: Multi-Task Learning (MTL) is a growing subject of interest in deep learning. MTL can be impractical as certain tasks can dominate training and hurt performance in others. We propose techniques for loss balancing based on scaling by the exponential moving average.
Score: 0.981328290471248
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multi-Task Learning (MTL) is a growing subject of interest in deep learning, due to its ability to train models more efficiently on multiple tasks compared to using a group of conventional single-task models. However, MTL can be impractical as certain tasks can dominate training and hurt performance in others, thus making some tasks perform better in a single-task model compared to a multi-task one. Such problems are broadly classified as negative transfer, and many prior approaches in the literature have been proposed to mitigate these issues. One such current approach to alleviate negative transfer is to weight each of the losses so that they are on the same scale. Whereas current loss balancing approaches rely on either optimization or complex numerical analysis, none directly scale the losses based on their observed magnitudes. We propose multiple techniques for loss balancing based on scaling by the exponential moving average and benchmark them against current best-performing methods on three established datasets. On these datasets, they achieve comparable, if not higher, performance compared to current best-performing methods.
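
The idea in the abstract of putting losses on the same scale by dividing by an exponential moving average can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the class name `EMALossScaler`, the smoothing factor `beta`, the stabilizer `eps`, and the choice to initialize the average with the first observed loss are all assumptions made for this example, and the paper's specific variants are not reproduced here.

```python
class EMALossScaler:
    """Hypothetical helper: divide each task loss by an EMA of its own magnitude."""

    def __init__(self, num_tasks, beta=0.9, eps=1e-8):
        self.beta = beta              # assumed smoothing factor for the moving average
        self.eps = eps                # assumed small constant to avoid division by zero
        self.ema = [None] * num_tasks # one running average per task

    def update(self, losses):
        """Fold the latest loss magnitudes into the per-task moving averages."""
        for i, loss in enumerate(losses):
            value = abs(float(loss))
            if self.ema[i] is None:
                # Assumption: start the average at the first observed value.
                self.ema[i] = value
            else:
                self.ema[i] = self.beta * self.ema[i] + (1.0 - self.beta) * value
        return list(self.ema)

    def scale(self, losses):
        """Return each loss divided by its updated moving average."""
        ema = self.update(losses)
        # In an autodiff framework the average would be treated as a constant,
        # so that only the raw loss carries gradient.
        return [loss / (m + self.eps) for loss, m in zip(losses, ema)]

    def total(self, losses):
        """Sum of the scaled losses, usable as a single training objective."""
        return sum(self.scale(losses))


# Two tasks whose raw losses differ by four orders of magnitude.
scaler = EMALossScaler(num_tasks=2)
first = scaler.scale([100.0, 0.01])   # both scaled losses start near 1
second = scaler.scale([50.0, 0.02])   # each is now relative to its own history
```

With this kind of scaling, a task whose raw loss is numerically large no longer dominates the summed objective purely because of its units or magnitude; each term instead reflects how the loss compares to its own recent history.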

Related papers

Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners [60.75160178669076]
We show that the use of high-capacity value models trained via cross-entropy and conditioned on learnable task embeddings addresses the problem of task interference in online reinforcement learning. We test our approach on 7 multi-task benchmarks with over 280 unique tasks, spanning high degree-of-freedom humanoid control and discrete vision-based RL.
arXiv Detail & Related papers (2025-05-29T06:41:45Z)
MTL-UE: Learning to Learn Nothing for Multi-Task Learning [98.42358524454731]
This paper presents MTL-UE, the first unified framework for generating unlearnable examples for multi-task data and MTL models. Instead of optimizing robustness for each sample, we design a generator-based structure that introduces label priors and class-wise feature embeddings. In addition, MTL-UE incorporates intra-task and inter-task embedding regularization to increase inter-class separation and suppress intra-class variance.
arXiv Detail & Related papers (2025-05-08T14:26:00Z)
Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment [0.0]
"Harmonized Transfer Learning and Modality Alignment (HarMA)" is a method that simultaneously satisfies task constraints, modality alignment, and single-modality uniform alignment. HarMA achieves state-of-the-art performance in two popular multimodal retrieval tasks in the field of remote sensing.
arXiv Detail & Related papers (2024-04-28T17:20:08Z)
Multitask Learning Can Improve Worst-Group Outcomes [76.92646345152788]
Multitask learning (MTL) is one such widely used technique. We propose to modify standard MTL by regularizing the joint multitask representation space. We find that our regularized MTL approach consistently outperforms JTT on both average and worst-group outcomes.
arXiv Detail & Related papers (2023-12-05T21:38:24Z)
Scalarization for Multi-Task and Multi-Domain Learning at Scale [15.545810422759295]
Training a single model on multiple input domains and/or output tasks allows for compressing information from multiple sources into a unified backbone. However, optimizing such networks is a challenge due to discrepancies between the different tasks or domains.
arXiv Detail & Related papers (2023-10-13T07:31:04Z)
AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging). It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
Equitable Multi-task Learning [18.65048321820911]
Multi-task learning (MTL) has achieved great success in various research domains, such as CV, NLP and IR. We propose a novel multi-task optimization method, named EMTL, to achieve equitable MTL. Our method stably outperforms state-of-the-art methods on the public benchmark datasets of two different research domains.
arXiv Detail & Related papers (2023-06-15T03:37:23Z)
Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks. Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients. We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z)
On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification. We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned. Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
SLAW: Scaled Loss Approximate Weighting for Efficient Multi-Task Learning [0.0]
Multi-task learning (MTL) is a subfield of machine learning with important applications. The best MTL optimization methods require individually computing the gradient of each task's loss function. We propose Scaled Loss Approximate Weighting (SLAW), a method for multi-task optimization that matches the performance of the best existing methods while being much more efficient.
arXiv Detail & Related papers (2021-09-16T20:58:40Z)
Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED). TRED disentangles the relevant knowledge with respect to the target task from the original source model and uses it as a regularizer during fine-tuning of the target model. Experiments on various real-world datasets show that our method stably improves on standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data [5.689320790746046]
Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MTL must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer. We propose a novel Transformer architecture consisting of a new conditional attention mechanism and a set of task-conditioned modules.
arXiv Detail & Related papers (2020-09-19T02:04:34Z)
Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL). Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks. As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.