Related papers: Conflict-Averse Gradient Descent for Multi-task Learning

Conflict-Averse Gradient Descent for Multi-task Learning

URL: http://arxiv.org/abs/2110.14048v2
Date: Wed, 21 Feb 2024 04:18:38 GMT
Title: Conflict-Averse Gradient Descent for Multi-task Learning
Authors: Bo Liu and Xingchao Liu and Xiaojie Jin and Peter Stone and Qiang Liu
Abstract summary: A major challenge in optimizing a multi-task model is the conflicting gradients. We introduce Conflict-Averse Gradient descent (CAGrad) which minimizes the average loss function. CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
Score: 56.379937772617
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The goal of multi-task learning is to enable more efficient learning than single task learning by sharing model structures for a diverse set of tasks. A standard multi-task learning objective is to minimize the average loss across all tasks. While straightforward, using this objective often results in much worse final performance for each task than learning them independently. A major challenge in optimizing a multi-task model is the conflicting gradients, where gradients of different task objectives are not well aligned so that following the average gradient direction can be detrimental to specific tasks' performance. Previous work has proposed several heuristics to manipulate the task gradients for mitigating this problem. But most of them lack convergence guarantee and/or could converge to any Pareto-stationary point. In this paper, we introduce Conflict-Averse Gradient descent (CAGrad) which minimizes the average loss function, while leveraging the worst local improvement of individual tasks to regularize the algorithm trajectory. CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss. It includes the regular gradient descent (GD) and the multiple gradient descent algorithm (MGDA) in the multi-objective optimization (MOO) literature as special cases. On a series of challenging multi-task supervised learning and reinforcement learning tasks, CAGrad achieves improved performance over prior state-of-the-art multi-objective gradient manipulation methods.

Related papers

Gradient Similarity Surgery in Multi-Task Deep Learning [1.2299544525529198]
This work introduces a novel gradient surgery method based on a gradient magnitude similarity measure to guide the optimisation process.<n>The Similarity-Aware Momentum Gradient Surgery (SAM-GS) adopts gradient equalisation and modulation of the first-order momentum.
arXiv Detail & Related papers (2025-06-06T14:40:50Z)
Task Weighting through Gradient Projection for Multitask Learning [5.5967570276373655]
In multitask learning, conflicts between task gradients are a frequent issue degrading a model's training performance. In this work, we present a method to adapt the Gradient Projection algorithm PCGrad to simultaneously perform task prioritization. Our approach differs from traditional task weighting performed by scaling task losses in that our weighting scheme applies only in cases where tasks are in conflict, but lets the training proceed unhindered otherwise.
arXiv Detail & Related papers (2024-09-03T11:17:44Z)
FAMO: Fast Adaptive Multitask Optimization [48.59232177073481]
We introduce Fast Adaptive Multitask Optimization FAMO, a dynamic weighting method that decreases task losses in a balanced way. Our results indicate that FAMO achieves comparable or superior performance to state-of-the-art gradient manipulation techniques.
arXiv Detail & Related papers (2023-06-06T15:39:54Z)
ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks. Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, known as negative transfer. ForkMerge is a novel approach that periodically forks the model into multiple branches, automatically searches the varying task weights.
arXiv Detail & Related papers (2023-01-30T02:27:02Z)
Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Adaptive Task Adapting Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers. Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters. We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
On Steering Multi-Annotations per Sample for Multi-Task Learning [79.98259057711044]
The study of multi-task learning has drawn great attention from the community. Despite the remarkable progress, the challenge of optimally learning different tasks simultaneously remains to be explored. Previous works attempt to modify the gradients from different tasks. Yet these methods give a subjective assumption of the relationship between tasks, and the modified gradient may be less accurate. In this paper, we introduce Task Allocation(STA), a mechanism that addresses this issue by a task allocation approach, in which each sample is randomly allocated a subset of tasks. For further progress, we propose Interleaved Task Allocation(ISTA) to iteratively allocate all
arXiv Detail & Related papers (2022-03-06T11:57:18Z)
SLAW: Scaled Loss Approximate Weighting for Efficient Multi-Task Learning [0.0]
Multi-task learning (MTL) is a subfield of machine learning with important applications. The best MTL optimization methods require individually computing the gradient of each task's loss function. We propose Scaled Loss Approximate Weighting (SLAW), a method for multi-task optimization that matches the performance of the best existing methods while being much more efficient.
arXiv Detail & Related papers (2021-09-16T20:58:40Z)
Gradient Surgery for Multi-Task Learning [119.675492088251]
Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks. The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood. We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient.
arXiv Detail & Related papers (2020-01-19T06:33:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.