Improvable Gap Balancing for Multi-Task Learning
- URL: http://arxiv.org/abs/2307.15429v1
- Date: Fri, 28 Jul 2023 09:26:03 GMT
- Title: Improvable Gap Balancing for Multi-Task Learning
- Authors: Yanqi Dai, Nanyi Fei, Zhiwu Lu
- Abstract summary: We propose two novel improvable gap balancing (IGB) algorithms for multi-task learning (MTL).
One takes a simple heuristic, and the other (for the first time) deploys deep reinforcement learning for MTL.
Our IGB algorithms lead to the best results in MTL via loss balancing and achieve further improvements when combined with gradient balancing.
- Score: 15.582333026781004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multi-task learning (MTL), gradient balancing has recently attracted more
research interest than loss balancing since it often leads to better
performance. However, loss balancing is much more efficient than gradient
balancing, and thus it is still worth further exploration in MTL. Note that
prior studies typically ignore that there exist varying improvable gaps across
multiple tasks, where the improvable gap per task is defined as the distance
between the current training progress and desired final training progress.
Therefore, after loss balancing, the performance imbalance still arises in many
cases. In this paper, following the loss balancing framework, we propose two
novel improvable gap balancing (IGB) algorithms for MTL: one takes a simple
heuristic, and the other (for the first time) deploys deep reinforcement
learning for MTL. Particularly, instead of directly balancing the losses in
MTL, both algorithms choose to dynamically assign task weights for improvable
gap balancing. Moreover, we combine IGB and gradient balancing to show the
complementarity between the two types of algorithms. Extensive experiments on
two benchmark datasets demonstrate that our IGB algorithms lead to the best
results in MTL via loss balancing and achieve further improvements when
combined with gradient balancing. Code is available at
https://github.com/YanqiDai/IGB4MTL.
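As a rough illustration of the improvable-gap idea, the following minimal sketch covers only the heuristic variant and assumes a per-task estimate of the desired final loss; it is not the authors' exact formulation, which is available in the repository linked above.
```python
import torch

def igb_heuristic_weights(current_losses, desired_final_losses):
    # Improvable gap per task: distance between the current training progress
    # and the desired final training progress (approximated here by losses).
    gaps = torch.stack([
        torch.clamp(cur.detach() - tgt, min=0.0)
        for cur, tgt in zip(current_losses, desired_final_losses)
    ])
    # Dynamic task weights proportional to the gaps (normalized to sum to K).
    weights = len(current_losses) * gaps / (gaps.sum() + 1e-12)
    # Weighted total loss: tasks with larger improvable gaps get larger weights.
    total_loss = sum(w * loss for w, loss in zip(weights, current_losses))
    return total_loss, weights
```
Calling backward() on the returned total loss would then favor whichever tasks currently lag furthest behind their targets, which is the balancing behavior the abstract describes.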
Related papers
- Scalable Bilevel Loss Balancing for Multi-Task Learning [30.689399230097667]
Multi-task learning (MTL) has been widely adopted for its ability to simultaneously learn multiple tasks.
We propose BiLB4MTL, a simple and scalable loss balancing approach for MTL.
BiLB4MTL achieves state-of-the-art performance in both accuracy and efficiency.
arXiv Detail & Related papers (2025-02-12T17:18:14Z)
- Fair Resource Allocation in Multi-Task Learning [12.776767874217663]
Multi-task learning (MTL) can leverage the shared knowledge across tasks, resulting in improved data efficiency and generalization performance.
A major challenge in MTL lies in the presence of conflicting gradients, which can hinder the fair optimization of some tasks.
Inspired by fair resource allocation in communication networks, we propose FairGrad, a novel MTL optimization method.
arXiv Detail & Related papers (2024-02-23T22:46:14Z)
- Dual-Balancing for Multi-Task Learning [42.613360970194734]
We propose a Dual-Balancing Multi-Task Learning (DB-MTL) method to alleviate the task balancing problem from both loss and gradient perspectives.
DB-MTL ensures loss-scale balancing by performing a logarithm transformation on each task loss, and guarantees gradient-magnitude balancing via normalizing all task gradients to the same magnitude as the maximum gradient norm.
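Based only on that one-sentence description (not the DB-MTL code), a rough sketch of the dual-balancing recipe could look like:
```python
import torch

def db_mtl_direction(task_losses, shared_params):
    # Loss-scale balancing: log-transform each task loss.
    log_losses = [torch.log(loss + 1e-8) for loss in task_losses]

    # Per-task gradients with respect to the shared parameters.
    grads = []
    for ll in log_losses:
        g = torch.autograd.grad(ll, shared_params, retain_graph=True)
        grads.append(torch.cat([gi.reshape(-1) for gi in g]))

    # Gradient-magnitude balancing: rescale every task gradient
    # to the same magnitude as the largest gradient norm.
    norms = torch.stack([g.norm() for g in grads])
    max_norm = norms.max()
    return sum((max_norm / (n + 1e-12)) * g for g, n in zip(grads, norms))
```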
arXiv Detail & Related papers (2023-08-23T09:41:28Z)
- FAMO: Fast Adaptive Multitask Optimization [48.59232177073481]
We introduce Fast Adaptive Multitask Optimization (FAMO), a dynamic weighting method that decreases task losses in a balanced way.
Our results indicate that FAMO achieves comparable or superior performance to state-of-the-art gradient manipulation techniques.
arXiv Detail & Related papers (2023-06-06T15:39:54Z)
- Efficient Generalization Improvement Guided by Random Weight Perturbation [24.027159739234524]
Sharpness-aware minimization (SAM) establishes a generic scheme for generalization improvements.
We resort to filter-wise random weight perturbations (RWP) to decouple the nested gradients in SAM.
We achieve very competitive performance on CIFAR and remarkably better performance on ImageNet.
arXiv Detail & Related papers (2022-11-21T14:24:34Z)
- Neural Collapse Inspired Attraction-Repulsion-Balanced Loss for Imbalanced Learning [97.81549071978789]
We propose Attraction-Repulsion-Balanced Loss (ARB-Loss) to balance the different components of the gradients.
We perform experiments on the large-scale classification and segmentation datasets and our ARB-Loss can achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-04-19T08:23:23Z)
- Multi-Task Learning as a Bargaining Game [63.49888996291245]
In Multi-task learning (MTL), a joint model is trained to simultaneously make predictions for several tasks.
Since the gradients of these different tasks may conflict, training a joint model for MTL often yields lower performance than its corresponding single-task counterparts.
We propose viewing the gradients combination step as a bargaining game, where tasks negotiate to reach an agreement on a joint direction of parameter update.
arXiv Detail & Related papers (2022-02-02T13:21:53Z)
- SLAW: Scaled Loss Approximate Weighting for Efficient Multi-Task Learning [0.0]
Multi-task learning (MTL) is a subfield of machine learning with important applications.
The best MTL optimization methods require individually computing the gradient of each task's loss function.
We propose Scaled Loss Approximate Weighting (SLAW), a method for multi-task optimization that matches the performance of the best existing methods while being much more efficient.
arXiv Detail & Related papers (2021-09-16T20:58:40Z)
- Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)