Dual-Balancing for Multi-Task Learning
- URL: http://arxiv.org/abs/2308.12029v2
- Date: Fri, 29 Sep 2023 12:39:15 GMT
- Title: Dual-Balancing for Multi-Task Learning
- Authors: Baijiong Lin, Weisen Jiang, Feiyang Ye, Yu Zhang, Pengguang Chen,
Ying-Cong Chen, Shu Liu, James T. Kwok
- Abstract summary: We propose a Dual-Balancing Multi-Task Learning (DB-MTL) method to alleviate the task balancing problem from both loss and gradient perspectives.
DB-MTL ensures loss-scale balancing by performing a logarithm transformation on each task loss, and guarantees gradient-magnitude balancing via normalizing all task gradients to the same magnitude as the maximum gradient norm.
- Score: 42.613360970194734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-task learning (MTL), a learning paradigm to learn multiple related
tasks simultaneously, has achieved great success in various fields. However,
the task balancing problem remains a significant challenge in MTL, with
disparities in loss/gradient scales often leading to performance compromises. In
this paper, we propose a Dual-Balancing Multi-Task Learning (DB-MTL) method to
alleviate the task balancing problem from both loss and gradient perspectives.
Specifically, DB-MTL ensures loss-scale balancing by performing a logarithm
transformation on each task loss, and guarantees gradient-magnitude balancing
via normalizing all task gradients to the same magnitude as the maximum
gradient norm. Extensive experiments conducted on several benchmark datasets
consistently demonstrate the state-of-the-art performance of DB-MTL.
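The two balancing steps in the abstract can be illustrated with a minimal NumPy sketch. This is an illustrative assumption of how the mechanics fit together, not the paper's implementation: the function name, the epsilon guard, and the final averaging of task gradients are all choices made here for clarity. The log transform is applied implicitly, since the gradient of log L is (1/L) times the gradient of L.

```python
import numpy as np

def db_mtl_step(losses, grads, eps=1e-8):
    """Sketch of one DB-MTL-style balancing step.

    losses: list of per-task scalar loss values
    grads:  list of per-task gradient vectors w.r.t. shared parameters
    Returns a combined update direction.
    """
    # Loss-scale balancing: the logarithm transformation implies
    # d(log L)/dw = (1/L) * dL/dw, so each task gradient is rescaled
    # by its current loss value.
    log_grads = [g / (l + eps) for l, g in zip(losses, grads)]

    # Gradient-magnitude balancing: normalize every task gradient to
    # the same magnitude as the maximum gradient norm.
    norms = [np.linalg.norm(g) for g in log_grads]
    max_norm = max(norms)
    balanced = [g * (max_norm / (n + eps)) for g, n in zip(log_grads, norms)]

    # Combine the balanced gradients into a single update direction
    # (simple averaging is assumed here).
    return np.mean(balanced, axis=0)
```

After both steps, every task contributes a gradient of equal norm, so no single task dominates the shared-parameter update regardless of its raw loss scale.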
Related papers
- Robust-Multi-Task Gradient Boosting [6.718184400443239]
Multi-task learning (MTL) has shown effectiveness in exploiting shared information across tasks to improve generalization.
We propose Robust-Multi-Task Gradient Boosting (R-MTGB), a novel boosting framework that explicitly models and adapts to task heterogeneity during training.
R-MTGB structures the learning process into three blocks: (1) learning shared patterns, (2) partitioning sequential tasks into outliers and non-outliers with regularized parameters, and (3) fine-tuning task-specific predictors.
arXiv Detail & Related papers (2025-07-15T15:31:12Z)
- Gradient Similarity Surgery in Multi-Task Deep Learning [1.2299544525529198]
This work introduces a novel gradient surgery method based on a gradient magnitude similarity measure to guide the optimisation process.
The Similarity-Aware Momentum Gradient Surgery (SAM-GS) adopts gradient equalisation and modulation of the first-order momentum.
arXiv Detail & Related papers (2025-06-06T14:40:50Z)
- Controlled Data Rebalancing in Multi-Task Learning for Real-World Image Super-Resolution [51.79973519845773]
Real-world image super-resolution (Real-SR) is a challenging problem due to the complex degradation patterns in low-resolution images.
We propose an improved paradigm that frames Real-SR as a data-heterogeneous multi-task learning problem.
arXiv Detail & Related papers (2025-06-05T21:40:21Z)
- Improvable Gap Balancing for Multi-Task Learning [15.582333026781004]
We propose two novel improvable gap balancing (IGB) algorithms for multi-task learning (MTL).
One uses a simple heuristic, while the other deploys deep reinforcement learning for MTL for the first time.
Our IGB algorithms lead to the best results in MTL via loss balancing and achieve further improvements when combined with gradient balancing.
arXiv Detail & Related papers (2023-07-28T09:26:03Z)
- Equitable Multi-task Learning [18.65048321820911]
Multi-task learning (MTL) has achieved great success in various research domains, such as CV, NLP and IR.
We propose a novel multi-task optimization method, named EMTL, to achieve equitable MTL.
Our method stably outperforms state-of-the-art methods on the public benchmark datasets of two different research domains.
arXiv Detail & Related papers (2023-06-15T03:37:23Z)
- FAMO: Fast Adaptive Multitask Optimization [48.59232177073481]
We introduce Fast Adaptive Multitask Optimization (FAMO), a dynamic weighting method that decreases task losses in a balanced way.
Our results indicate that FAMO achieves comparable or superior performance to state-of-the-art gradient manipulation techniques.
arXiv Detail & Related papers (2023-06-06T15:39:54Z)
- MaxGNR: A Dynamic Weight Strategy via Maximizing Gradient-to-Noise Ratio for Multi-Task Learning [19.38778317110205]
In computer vision, Multi-Task Learning (MTL) can outperform Single-Task Learning (STL).
In the MTL scenario, Inter-Task Gradient Noise (ITGN) is an additional source of gradient noise for each task.
We design a MaxGNR algorithm to alleviate ITGN interference of each task.
arXiv Detail & Related papers (2023-02-18T14:50:45Z)
- M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design [95.41238363769892]
Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often lets those tasks learn better jointly.
Current MTL regimes have to activate nearly the entire model even to just execute a single task.
We present a model-accelerator co-design framework to enable efficient on-device MTL.
arXiv Detail & Related papers (2022-10-26T15:40:24Z)
- Multi-Task Learning as a Bargaining Game [63.49888996291245]
In Multi-task learning (MTL), a joint model is trained to simultaneously make predictions for several tasks.
Since the gradients of these different tasks may conflict, training a joint model for MTL often yields lower performance than its corresponding single-task counterparts.
We propose viewing the gradients combination step as a bargaining game, where tasks negotiate to reach an agreement on a joint direction of parameter update.
arXiv Detail & Related papers (2022-02-02T13:21:53Z)
- Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the conflicting gradients.
We introduce Conflict-Averse Gradient Descent (CAGrad), which minimizes the average loss function.
CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
- SLAW: Scaled Loss Approximate Weighting for Efficient Multi-Task Learning [0.0]
Multi-task learning (MTL) is a subfield of machine learning with important applications.
The best MTL optimization methods require individually computing the gradient of each task's loss function.
We propose Scaled Loss Approximate Weighting (SLAW), a method for multi-task optimization that matches the performance of the best existing methods while being much more efficient.
arXiv Detail & Related papers (2021-09-16T20:58:40Z)
- HydaLearn: Highly Dynamic Task Weighting for Multi-task Learning with Auxiliary Tasks [4.095907708855597]
Multi-task learning (MTL) can improve performance on a task by sharing representations with one or more related auxiliary-tasks.
Usually, MTL-networks are trained on a composite loss function formed by a constant weighted combination of the separate task losses.
In practice, constant loss weights lead to poor results for two reasons: (i) for mini-batch based optimisation, the optimal task weights vary significantly from one update to the next depending on mini-batch sample composition.
We introduce HydaLearn, an intelligent weighting algorithm that connects main-task gain to the individual task gradients.
arXiv Detail & Related papers (2020-08-26T16:04:02Z)
- Gradient Surgery for Multi-Task Learning [119.675492088251]
Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks.
The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood.
We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient.
arXiv Detail & Related papers (2020-01-19T06:33:47Z)
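The projection described in the Gradient Surgery entry above has a compact form: when two task gradients conflict (negative cosine similarity), one gradient is projected onto the normal plane of the other, removing the conflicting component. The following is a minimal sketch of that single projection step; the function name is chosen here for illustration.

```python
import numpy as np

def project_conflicting(g_i, g_j):
    """If g_i conflicts with g_j (negative inner product), remove from
    g_i its component along g_j, i.e. project g_i onto the plane normal
    to g_j. Otherwise g_i is returned unchanged."""
    dot = np.dot(g_i, g_j)
    if dot < 0:  # gradients conflict
        g_i = g_i - (dot / np.dot(g_j, g_j)) * g_j
    return g_i
```

After projection, the returned gradient is orthogonal to the conflicting task's gradient, so following it no longer directly increases that task's loss.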
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.