Related papers: Gradient Similarity Surgery in Multi-Task Deep Learning

Gradient Similarity Surgery in Multi-Task Deep Learning

URL: http://arxiv.org/abs/2506.06130v1
Date: Fri, 06 Jun 2025 14:40:50 GMT
Title: Gradient Similarity Surgery in Multi-Task Deep Learning
Authors: Thomas Borsani, Andrea Rosani, Giuseppe Nicosia, Giuseppe Di Fatta,
Abstract summary: This work introduces a novel gradient surgery method based on a gradient magnitude similarity measure to guide the optimisation process.<n>The Similarity-Aware Momentum Gradient Surgery (SAM-GS) adopts gradient equalisation and modulation of the first-order momentum.
Score: 1.2299544525529198
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The multi-task learning ($MTL$) paradigm aims to simultaneously learn multiple tasks within a single model capturing higher-level, more general hidden patterns that are shared by the tasks. In deep learning, a significant challenge in the backpropagation training process is the design of advanced optimisers to improve the convergence speed and stability of the gradient descent learning rule. In particular, in multi-task deep learning ($MTDL$) the multitude of tasks may generate potentially conflicting gradients that would hinder the concurrent convergence of the diverse loss functions. This challenge arises when the gradients of the task objectives have either different magnitudes or opposite directions, causing one or a few to dominate or to interfere with each other, thus degrading the training process. Gradient surgery methods address the problem explicitly dealing with conflicting gradients by adjusting the overall gradient trajectory. This work introduces a novel gradient surgery method, the Similarity-Aware Momentum Gradient Surgery (SAM-GS), which provides an effective and scalable approach based on a gradient magnitude similarity measure to guide the optimisation process. The SAM-GS surgery adopts gradient equalisation and modulation of the first-order momentum. A series of experimental tests have shown the effectiveness of SAM-GS on synthetic problems and $MTL$ benchmarks. Gradient magnitude similarity plays a crucial role in regularising gradient aggregation in $MTDL$ for the optimisation of the learning process.

Related papers

Fantastic Multi-Task Gradient Updates and How to Find Them In a Cone [14.702480423653984]
We propose ConicGrad, a principled, scalable, and robust MTL approach formulated as a constrained optimization problem.<n>Our method introduces an angular constraint to dynamically regulate gradient update directions, confining them within a cone centered on the reference gradient of the overall objective.<n>We conduct extensive experiments on standard supervised learning and reinforcement learning MTL benchmarks, and demonstrate that ConicGrad achieves state-of-the-art performance across diverse tasks.
arXiv Detail & Related papers (2025-01-31T23:11:12Z)
Proactive Gradient Conflict Mitigation in Multi-Task Learning: A Sparse Training Perspective [33.477681689943516]
A common issue in multi-task learning is the occurrence of gradient conflict.<n>We propose a strategy to reduce such conflicts through sparse training (ST)<n>Our experiments demonstrate that ST effectively mitigates conflicting gradients and leads to superior performance.
arXiv Detail & Related papers (2024-11-27T18:58:22Z)
Classifier-guided Gradient Modulation for Enhanced Multimodal Learning [50.7008456698935]
Gradient-Guided Modulation (CGGM) is a novel method to balance multimodal learning with gradients. We conduct extensive experiments on four multimodal datasets: UPMC-Food 101, CMU-MOSI, IEMOCAP and BraTS. CGGM outperforms all the baselines and other state-of-the-art methods consistently.
arXiv Detail & Related papers (2024-11-03T02:38:43Z)
Robust Analysis of Multi-Task Learning Efficiency: New Benchmarks on Light-Weighed Backbones and Effective Measurement of Multi-Task Learning Challenges by Feature Disentanglement [69.51496713076253]
In this paper, we focus on the aforementioned efficiency aspects of existing MTL methods. We first carry out large-scale experiments of the methods with smaller backbones and on a the MetaGraspNet dataset as a new test ground. We also propose Feature Disentanglement measure as a novel and efficient identifier of the challenges in MTL.
arXiv Detail & Related papers (2024-02-05T22:15:55Z)
Dual-Balancing for Multi-Task Learning [42.613360970194734]
We propose a Dual-Balancing Multi-Task Learning (DB-MTL) method to alleviate the task balancing problem from both loss and gradient perspectives. DB-MTL ensures loss-scale balancing by performing a logarithm transformation on each task loss, and guarantees gradient-magnitude balancing via normalizing all task gradients to the same magnitude as the maximum gradient norm.
arXiv Detail & Related papers (2023-08-23T09:41:28Z)
Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Stochastic Approach [38.76462300149459]
We develop a Multi-objective Correction (MoCo) method for multi-objective gradient optimization. The unique feature of our method is that it can guarantee convergence without increasing the non fairness gradient.
arXiv Detail & Related papers (2022-10-23T05:54:26Z)
One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient Reinforcement Learning [61.662504399411695]
We introduce a novel method mixing multiple inner steps that enjoys a more accurate and robust meta-gradient signal. When applied to the Snake game, the mixing meta-gradient algorithm can cut the variance by a factor of 3 while achieving similar or higher performance.
arXiv Detail & Related papers (2021-10-30T08:36:52Z)
Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the conflicting gradients. We introduce Conflict-Averse Gradient descent (CAGrad) which minimizes the average loss function. CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
SLAW: Scaled Loss Approximate Weighting for Efficient Multi-Task Learning [0.0]
Multi-task learning (MTL) is a subfield of machine learning with important applications. The best MTL optimization methods require individually computing the gradient of each task's loss function. We propose Scaled Loss Approximate Weighting (SLAW), a method for multi-task optimization that matches the performance of the best existing methods while being much more efficient.
arXiv Detail & Related papers (2021-09-16T20:58:40Z)
Layerwise Optimization by Gradient Decomposition for Continual Learning [78.58714373218118]
Deep neural networks achieve state-of-the-art and sometimes super-human performance across various domains. When learning tasks sequentially, the networks easily forget the knowledge of previous tasks, known as "catastrophic forgetting"
arXiv Detail & Related papers (2021-05-17T01:15:57Z)
Gradient Surgery for Multi-Task Learning [119.675492088251]
Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks. The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood. We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient.
arXiv Detail & Related papers (2020-01-19T06:33:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.