Related papers: Revisit the Imbalance Optimization in Multi-task Learning: An Experimental Analysis

URL: http://arxiv.org/abs/2509.23915v1
Date: Sun, 28 Sep 2025 14:40:06 GMT
Title: Revisit the Imbalance Optimization in Multi-task Learning: An Experimental Analysis
Authors: Yihang Guo, Tianyuan Yu, Liang Bai, Yanming Guo, Yirun Ruan, William Li, Weishi Zheng
Abstract summary: Multi-task learning (MTL) aims to build general-purpose vision systems by training a single network to perform multiple tasks jointly. While promising, its potential is often hindered by "unbalanced optimization". This paper presents a systematic experimental analysis to dissect the factors contributing to this persistent problem.
Score: 44.410446932443
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-task learning (MTL) aims to build general-purpose vision systems by training a single network to perform multiple tasks jointly. While promising, its potential is often hindered by "unbalanced optimization", where task interference leads to subpar performance compared to single-task models. To facilitate research in MTL, this paper presents a systematic experimental analysis to dissect the factors contributing to this persistent problem. Our investigation confirms that the performance of existing optimization methods varies inconsistently across datasets, and advanced architectures still rely on costly grid-searched loss weights. Furthermore, we show that while powerful Vision Foundation Models (VFMs) provide strong initialization, they do not inherently resolve the optimization imbalance, and merely increasing data quantity offers limited benefits. A crucial finding emerges from our analysis: a strong correlation exists between the optimization imbalance and the norm of task-specific gradients. We demonstrate that this insight is directly applicable, showing that a straightforward strategy of scaling task losses according to their gradient norms can achieve performance comparable to that of an extensive and computationally expensive grid search. Our comprehensive analysis suggests that understanding and controlling gradient dynamics is a more direct path to stable MTL than developing increasingly complex methods.
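
The abstract does not spell out the exact rule used to scale task losses by gradient norm. As a minimal sketch only, assuming inverse-norm weighting (each task's weight is proportional to one over the L2 norm of its gradient on the shared parameters, rescaled so the weights sum to the number of tasks), the idea might look like the following. The function name `gradient_norm_weights`, the `eps` parameter, and the toy gradients are illustrative assumptions, not taken from the paper.

```python
import math

def gradient_norm_weights(task_grads, eps=1e-12):
    """Hypothetical helper (not from the paper): one loss weight per task,
    inversely proportional to the L2 norm of that task's gradient."""
    # L2 norm of each task's gradient vector.
    norms = [math.sqrt(sum(g * g for g in grad)) for grad in task_grads]
    # Inverse norms; eps guards against division by zero.
    inv = [1.0 / (n + eps) for n in norms]
    # Rescale so the weights sum to the number of tasks.
    scale = len(task_grads) / sum(inv)
    return [w * scale for w in inv]

# Toy per-task gradients on shared parameters (made-up numbers).
grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
weights = gradient_norm_weights(grads)

# Under this assumed rule, the weighted gradient norms come out equal,
# so the task with the larger gradient no longer dominates the update.
weighted_norms = [w * n for w, n in zip(weights, [5.0, 0.5])]
```

In a real training loop the per-task gradients would come from an autograd framework and the weights would typically be recomputed every step or every few steps; that choice is left open here.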

Related papers

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization [103.74675519953898]
Long-chain reflective reasoning is a prerequisite for solving complex real-world problems. We build a benchmark consisting of 1,260 samples of 42 challenging synthetic tasks. We generate post-training data and explore learning paradigms for exploiting such data.
arXiv Detail & Related papers (2025-10-09T17:53:58Z)
Injecting Imbalance Sensitivity for Multi-Task Learning [36.60453299563175]
Multi-task learning (MTL) has emerged as a promising approach for deploying deep learning models in real-life applications. Recent studies have proposed optimization-based learning paradigms to establish task-shared representations in MTL. Our paper empirically argues that these studies primarily emphasize the conflict issue while neglecting the potentially more significant impact of imbalance/dominance in MTL.
arXiv Detail & Related papers (2025-03-11T03:11:54Z)
Continual Optimization with Symmetry Teleportation for Multi-Task Learning [73.28772872740744]
Multi-task learning (MTL) enables the simultaneous learning of multiple tasks using a single model. We propose a novel approach based on Continual Optimization with Symmetry Teleportation (COST). COST seeks an alternative loss-equivalent point on the loss landscape to reduce conflict gradients.
arXiv Detail & Related papers (2025-03-06T02:58:09Z)
Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives. We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
Analytical Uncertainty-Based Loss Weighting in Multi-Task Learning [8.493889694402478]
A key challenge in multi-task learning (MTL) is balancing individual task losses during neural network training to improve performance and efficiency. We propose a novel task-weighting method by building on the most prevalent approach of Uncertainty Weighting. Our approach yields comparable results to the analytically prohibitive, brute-force approach of Scalarization.
arXiv Detail & Related papers (2024-08-15T07:10:17Z)
Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks. However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs. We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z)
Fair Resource Allocation in Multi-Task Learning [12.776767874217663]
Multi-task learning (MTL) can leverage the shared knowledge across tasks, resulting in improved data efficiency and generalization performance. A major challenge in MTL lies in the presence of conflicting gradients, which can hinder the fair optimization of some tasks. Inspired by fair resource allocation in communication networks, we propose FairGrad, a novel MTL optimization method.
arXiv Detail & Related papers (2024-02-23T22:46:14Z)
Robust Analysis of Multi-Task Learning Efficiency: New Benchmarks on Light-Weighed Backbones and Effective Measurement of Multi-Task Learning Challenges by Feature Disentanglement [69.51496713076253]
In this paper, we focus on the aforementioned efficiency aspects of existing MTL methods. We first carry out large-scale experiments of the methods with smaller backbones and on the MetaGraspNet dataset as a new test ground. We also propose the Feature Disentanglement measure as a novel and efficient identifier of the challenges in MTL.
arXiv Detail & Related papers (2024-02-05T22:15:55Z)
Large Language Models are Miscalibrated In-Context Learners [22.30783674111999]
In this work, we deliver an in-depth analysis of the behavior across different choices of learning methods. We observe that the miscalibration problem exists across all learning methods in low-resource setups. We find that self-ensembling with max probability produces robust and calibrated predictions.
arXiv Detail & Related papers (2023-12-21T11:55:10Z)
Beyond Losses Reweighting: Empowering Multi-Task Learning via the Generalization Perspective [61.10883077161432]
Multi-task learning (MTL) trains deep neural networks to optimize several objectives simultaneously using a shared backbone. We introduce a novel MTL framework that leverages weight perturbation to regulate gradient norms, thus improving generalization. Our method significantly outperforms existing gradient-based MTL techniques in terms of task performance and overall model robustness.
arXiv Detail & Related papers (2022-11-24T17:19:30Z)
SLAW: Scaled Loss Approximate Weighting for Efficient Multi-Task Learning [0.0]
Multi-task learning (MTL) is a subfield of machine learning with important applications. The best MTL optimization methods require individually computing the gradient of each task's loss function. We propose Scaled Loss Approximate Weighting (SLAW), a method for multi-task optimization that matches the performance of the best existing methods while being much more efficient.
arXiv Detail & Related papers (2021-09-16T20:58:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.