Proactive Gradient Conflict Mitigation in Multi-Task Learning: A Sparse Training Perspective
- URL: http://arxiv.org/abs/2411.18615v1
- Date: Wed, 27 Nov 2024 18:58:22 GMT
- Title: Proactive Gradient Conflict Mitigation in Multi-Task Learning: A Sparse Training Perspective
- Authors: Zhi Zhang, Jiayi Shen, Congfeng Cao, Gaole Dai, Shiji Zhou, Qizhe Zhang, Shanghang Zhang, Ekaterina Shutova
- Abstract summary: A common issue in multi-task learning is the occurrence of gradient conflict.
We propose a strategy to reduce such conflicts through sparse training (ST).
Our experiments demonstrate that ST effectively mitigates conflicting gradients and leads to superior performance.
- Score: 33.477681689943516
- License:
- Abstract: Advancing towards generalist agents necessitates the concurrent processing of multiple tasks using a unified model, thereby underscoring the growing significance of simultaneous model training on multiple downstream tasks. A common issue in multi-task learning is the occurrence of gradient conflict, which leads to potential competition among different tasks during joint training. This competition often results in improvements in one task at the expense of deterioration in another. Although several optimization methods have been developed to address this issue by manipulating task gradients for better task balancing, they cannot decrease the incidence of gradient conflict. In this paper, we systematically investigate the occurrence of gradient conflict across different methods and propose a strategy to reduce such conflicts through sparse training (ST), wherein only a portion of the model's parameters are updated during training while keeping the rest unchanged. Our extensive experiments demonstrate that ST effectively mitigates conflicting gradients and leads to superior performance. Furthermore, ST can be easily integrated with gradient manipulation techniques, thus enhancing their effectiveness.
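The sketch below illustrates the sparse-training (ST) idea from the abstract in a PyTorch-style training loop: a fixed binary mask decides which parameter entries may be updated, while all other entries keep their initial values. The random mask with a `keep_ratio` of 10% and the summed task losses are illustrative assumptions for this sketch, not the paper's actual parameter-selection rule.

```python
# Minimal sketch of sparse training (ST) for multi-task learning.
# Assumption: the mask here is chosen at random; the paper's criterion
# for selecting which parameters to train may differ.
import torch


def build_masks(model, keep_ratio=0.1):
    """Fix one binary mask per parameter tensor; 1 = trainable, 0 = frozen."""
    return {
        name: (torch.rand_like(p) < keep_ratio).float()
        for name, p in model.named_parameters()
    }


def sparse_training_step(model, optimizer, task_losses, masks):
    """One joint update in which only the masked parameter entries move."""
    optimizer.zero_grad()
    sum(task_losses).backward()               # joint multi-task objective
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is not None:
                p.grad.mul_(masks[name])      # zero the gradients of frozen entries
    optimizer.step()
```

Masking gradients keeps frozen entries untouched under plain SGD; with an adaptive optimizer that applies weight decay, the mask would also need to be applied to the update itself to keep those entries exactly fixed.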
Related papers
- Preventing Conflicting Gradients in Neural Marked Temporal Point Processes [2.3020018305241337]
Neural Marked Temporal Point Processes (MTPP) are flexible models to capture complex temporal inter-dependencies between labeled events.
We show that learning an MTPP model can be framed as a two-task learning problem, where both tasks share a common set of trainable parameters that are jointly optimized.
We introduce novel parametrizations for neural MTPP models that allow for separate modeling and training of each task, effectively avoiding the problem of conflicting gradients.
arXiv Detail & Related papers (2024-12-11T18:10:04Z)
- Task Weighting through Gradient Projection for Multitask Learning [5.5967570276373655]
In multitask learning, conflicts between task gradients are a frequent issue degrading a model's training performance.
In this work, we present a method to adapt the Gradient Projection algorithm PCGrad to simultaneously perform task prioritization.
Our approach differs from traditional task weighting, which scales task losses, in that our weighting is applied only when tasks are in conflict and lets training proceed unhindered otherwise.
arXiv Detail & Related papers (2024-09-03T11:17:44Z)
- Alternate Training of Shared and Task-Specific Parameters for Multi-Task Neural Networks [49.1574468325115]
This paper introduces novel alternate training procedures for hard-parameter-sharing Multi-Task Neural Networks (MTNNs).
The proposed alternate training method updates shared and task-specific weights alternately, exploiting the multi-head architecture of the model.
Empirical experiments demonstrate delayed overfitting, improved prediction, and reduced computational demands.
arXiv Detail & Related papers (2023-12-26T21:33:03Z)
- On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning [71.44986275228747]
In-context learning (ICL) has become an efficient approach propelled by the recent advancements in large language models (LLMs).
However, both paradigms are prone to the critical problem of overconfidence (i.e., miscalibration).
arXiv Detail & Related papers (2023-12-21T11:55:10Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- Multi-Task Learning as a Bargaining Game [63.49888996291245]
In multi-task learning (MTL), a joint model is trained to simultaneously make predictions for several tasks.
Since the gradients of these different tasks may conflict, training a joint model for MTL often yields lower performance than its corresponding single-task counterparts.
We propose viewing the gradients combination step as a bargaining game, where tasks negotiate to reach an agreement on a joint direction of parameter update.
arXiv Detail & Related papers (2022-02-02T13:21:53Z)
- Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is conflicting gradients.
We introduce Conflict-Averse Gradient descent (CAGrad), which minimizes the average loss function while leveraging the worst local improvement of individual tasks to regularize the optimization trajectory.
CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
- Multitask Learning with Single Gradient Step Update for Task Balancing [4.330814031477772]
We propose an algorithm to balance between tasks at the gradient level by applying gradient-based meta-learning to multitask learning.
We apply the proposed method to various multitask computer vision problems and achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-05-20T08:34:20Z)
- Gradient Surgery for Multi-Task Learning [119.675492088251]
Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks.
The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood.
We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient.
arXiv Detail & Related papers (2020-01-19T06:33:47Z)
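Several of the gradient-manipulation methods listed above (Gradient Surgery/PCGrad, the task-weighting adaptation of PCGrad, CAGrad) act after the per-task gradients are computed, and gradient conflict is typically diagnosed as a negative inner product (negative cosine similarity) between two task gradients. The sketch below illustrates the projection step described in the Gradient Surgery entry, applied to flattened per-task gradient vectors; the function name and the final averaging are assumptions of this sketch rather than any library's API.

```python
# Illustrative PCGrad-style gradient surgery on flattened per-task gradients.
# Assumptions: the function name and the mean-combination rule are choices
# made for this sketch, not taken from a specific implementation.
import random
import torch


def project_conflicting(task_grads):
    """Project each task gradient off the gradients it conflicts with."""
    projected = []
    for i, g_i in enumerate(task_grads):
        g = g_i.clone()
        others = [g_j for j, g_j in enumerate(task_grads) if j != i]
        random.shuffle(others)                 # visit the other tasks in random order
        for g_j in others:
            dot = torch.dot(g, g_j)
            if dot < 0:                        # conflict: gradients point >90 degrees apart
                # remove the component of g that lies along the conflicting gradient
                g -= dot / (g_j.norm() ** 2 + 1e-12) * g_j
        projected.append(g)
    # Combine the de-conflicted gradients into a single update direction.
    return torch.stack(projected).mean(dim=0)
```

In a training loop, the returned vector would replace the plain summed gradient before the optimizer step; per the abstract above, sparse training can be combined with such gradient-manipulation techniques rather than replacing them.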