Balancing Average and Worst-case Accuracy in Multitask Learning
- URL: http://arxiv.org/abs/2110.05838v1
- Date: Tue, 12 Oct 2021 09:00:46 GMT
- Title: Balancing Average and Worst-case Accuracy in Multitask Learning
- Authors: Paul Michel and Sebastian Ruder and Dani Yogatama
- Abstract summary: We show how to use techniques from the distributionally robust optimization (DRO) literature to improve worst-case performance in multitask learning.
We present an improved method, Lookahead-DRO (L-DRO), which mitigates these issues.
Our empirical results show that L-DRO achieves a better trade-off between average and worst-case accuracy with little computational overhead.
- Score: 39.59582055620513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When training and evaluating machine learning models on a large number of
tasks, it is important to not only look at average task accuracy -- which may
be biased by easy or redundant tasks -- but also worst-case accuracy (i.e. the
performance on the task with the lowest accuracy). In this work, we show how to
use techniques from the distributionally robust optimization (DRO) literature
to improve worst-case performance in multitask learning. We highlight several
failure cases of DRO when applied off-the-shelf and present an improved method,
Lookahead-DRO (L-DRO), which mitigates these issues. The core idea of L-DRO is
to anticipate the interaction between tasks during training in order to choose
a dynamic re-weighting of the various task losses, which will (i) lead to
minimal worst-case loss and (ii) train on as many tasks as possible. After
demonstrating the efficacy of L-DRO on a small controlled synthetic setting, we
evaluate it on two realistic benchmarks: a multitask version of the CIFAR-100
image classification dataset and a large-scale multilingual language modeling
experiment. Our empirical results show that L-DRO achieves a better trade-off
between average and worst-case accuracy with little computational overhead
compared to several strong baselines.
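For intuition, here is a minimal NumPy sketch of the core idea described in the abstract: simulate the update that each candidate task weighting would produce, then keep the weighting that minimizes the post-update worst-case loss while still spreading weight across tasks. The toy quadratic losses, the discrete candidate set, and the entropy bonus are illustrative assumptions, not the authors' implementation.

```python
# Illustrative lookahead re-weighting sketch (assumed, simplified setup;
# not the paper's actual L-DRO algorithm).
import numpy as np

rng = np.random.default_rng(0)
n_tasks, dim, lr, tau = 3, 5, 0.1, 0.01
targets = rng.normal(size=(n_tasks, dim))  # toy per-task optima
theta = np.zeros(dim)                      # shared model parameters

def task_losses(params):
    # Toy quadratic loss for task t: ||params - targets[t]||^2
    return np.sum((params - targets) ** 2, axis=1)

def task_grads(params):
    return 2.0 * (params - targets)        # one gradient row per task

def entropy(w):
    return -np.sum(w * np.log(w + 1e-12))  # rewards spreading weight

for step in range(200):
    grads = task_grads(theta)
    # Candidate weightings: uniform plus one-hot per task (a crude
    # stand-in for optimizing over the full simplex of task weights).
    candidates = [np.full(n_tasks, 1.0 / n_tasks)] + list(np.eye(n_tasks))

    # Lookahead: simulate the update each weighting would produce and
    # score it by the resulting worst-case loss, minus an entropy bonus
    # so that weightings covering more tasks are preferred.
    def score(w):
        lookahead = theta - lr * (w @ grads)
        return task_losses(lookahead).max() - tau * entropy(w)

    best = min(candidates, key=score)
    theta = theta - lr * (best @ grads)

print("per-task losses:", np.round(task_losses(theta), 4))
```

The actual method chooses a continuous re-weighting rather than picking from a handful of discrete candidates; the candidate set above is purely for illustration.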
Related papers
- Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks? [74.88417042125985]
We investigate various data-driven strategies that offer supervision data at different quality levels for tasks of varying complexity.
We find that even when the outcome error rate for hard task supervision is high, training on such data can outperform perfectly correct supervision on easier subtasks.
Our results also reveal that supplementing hard task supervision with the corresponding subtask supervision can yield notable performance improvements.
arXiv Detail & Related papers (2024-10-27T17:55:27Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, Data-CUBE, that arranges the order of all multi-task data for training.
At the task level, we aim to find the optimal task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of every instance within each task, then divide them into easy-to-difficult mini-batches for training, as in the sketch below.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
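A minimal sketch of the instance-level step, assuming difficulty is just a per-instance score (the paper's actual difficulty measure and batching may differ):

```python
# Instance-level curriculum sketch (illustrative assumptions only).
import numpy as np

def easy_to_difficult_batches(instances, difficulty, batch_size):
    """Sort a task's instances by a difficulty score, then chunk them
    into mini-batches ordered from easy to difficult."""
    order = np.argsort(difficulty)               # easiest first
    ordered = [instances[i] for i in order]
    return [ordered[i:i + batch_size]
            for i in range(0, len(ordered), batch_size)]

# Toy usage: difficulty could be, e.g., a per-instance loss from a
# preliminary model (random scores are assumed here).
instances = [f"example_{i}" for i in range(10)]
difficulty = np.random.default_rng(0).random(10)
for batch in easy_to_difficult_batches(instances, difficulty, 4):
    print(batch)
```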
- Multitask Learning Can Improve Worst-Group Outcomes [76.92646345152788]
Multitask learning (MTL) is one widely used technique for improving worst-group outcomes.
We propose to modify standard MTL by regularizing the joint multitask representation space.
We find that our regularized MTL approach consistently outperforms JTT (Just Train Twice) on both average and worst-group outcomes.
arXiv Detail & Related papers (2023-12-05T21:38:24Z)
- Task Selection and Assignment for Multi-modal Multi-task Dialogue Act Classification with Non-stationary Multi-armed Bandits [11.682678945754837]
Multi-task learning (MTL) aims to improve the performance of a primary task by jointly learning with related auxiliary tasks.
Previous studies suggest that randomly selecting auxiliary tasks may not help, and can even harm performance.
This paper proposes a method for selecting and assigning tasks based on non-stationary multi-armed bandits; a toy bandit sketch follows this entry.
arXiv Detail & Related papers (2023-09-18T14:51:51Z)
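An illustrative toy of bandit-based task selection, using a generic discounted-UCB rule under a drifting reward; the paper's actual policy and reward signal may differ:

```python
# Non-stationary bandit task selection sketch (generic discounted UCB;
# illustrative only, not necessarily the paper's policy).
import numpy as np

rng = np.random.default_rng(0)
n_tasks, gamma, c = 3, 0.95, 1.0    # discount factor, exploration weight
counts = np.zeros(n_tasks)          # discounted pull counts
rewards = np.zeros(n_tasks)         # discounted reward sums

def observed_reward(task, t):
    # Toy non-stationary environment: task utilities drift over time.
    base = np.array([0.3, 0.5, 0.7]) + 0.3 * np.sin(0.01 * t + task)
    return base[task] + 0.1 * rng.normal()

for t in range(1, 501):
    counts *= gamma                 # older observations fade away
    rewards *= gamma
    means = rewards / np.maximum(counts, 1e-8)
    bonus = c * np.sqrt(np.log(counts.sum() + 1) / np.maximum(counts, 1e-8))
    task = int(np.argmax(means + bonus))  # auxiliary task to train on next
    r = observed_reward(task, t)          # e.g., validation improvement
    counts[task] += 1
    rewards[task] += r

print("discounted pull counts:", np.round(counts, 1))
```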
- Identification of Negative Transfers in Multitask Learning Using Surrogate Models [29.882265735630046]
Multitask learning is widely used to train a low-resource target task by augmenting it with multiple related source tasks.
A critical problem in multitask learning is identifying subsets of source tasks that would benefit the target task.
We introduce an efficient procedure to address this problem via surrogate modeling; a toy illustration follows this entry.
arXiv Detail & Related papers (2023-03-25T23:16:11Z)
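A toy illustration of the surrogate idea, assuming a linear surrogate from subset-indicator features to observed target-task performance (the paper's surrogate may be more sophisticated):

```python
# Surrogate-modeling sketch for spotting negative transfer (assumed
# linear surrogate and synthetic data; illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n_sources = 5
# Each row indicates which source tasks were included in a training run;
# y is the target task's observed performance for that run (toy data).
X = rng.integers(0, 2, size=(12, n_sources)).astype(float)
true_effect = np.array([0.4, -0.3, 0.1, 0.0, 0.25])  # hidden transfer effects
y = X @ true_effect + 0.02 * rng.normal(size=12)

# Fit the linear surrogate by least squares.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Source tasks with negative coefficients are flagged as negative transfer.
print("estimated per-source effects:", np.round(coef, 2))
print("likely negative transfers:", [i for i, c in enumerate(coef) if c < 0])
```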
- Cross-Task Consistency Learning Framework for Multi-Task Learning [9.991706230252708]
We propose a new learning framework for the two-task MTL problem.
We define two new loss terms inspired by cycle-consistency loss and contrastive learning.
We theoretically prove that both losses help the model learn more efficiently and that the cross-task consistency loss is better in terms of alignment with the straightforward predictions.
arXiv Detail & Related papers (2021-11-28T11:55:19Z)
- Diverse Distributions of Self-Supervised Tasks for Meta-Learning in NLP [39.457091182683406]
We aim to provide task distributions for meta-learning by considering self-supervised tasks automatically proposed from unlabeled text.
Our analysis shows that these design factors meaningfully alter the task distribution, some inducing significant improvements in the downstream few-shot accuracy of the meta-learned models.
arXiv Detail & Related papers (2021-11-02T01:50:09Z)
- Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the presence of conflicting gradients between tasks.
We introduce Conflict-Averse Gradient descent (CAGrad), which minimizes the average loss function while leveraging the worst local improvement among individual tasks.
CAGrad balances the objectives automatically and still provably converges to a minimum of the average loss; a crude sketch of the conflict-averse idea follows this entry.
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
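A crude numerical sketch of the conflict-averse idea: search directions near the average gradient and keep the one with the best worst-case per-task alignment. CAGrad itself solves a small constrained optimization; the random search here is only for illustration:

```python
# Conflict-averse direction search (illustrative approximation; the real
# CAGrad solves a constrained optimization, not a random search).
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=(3, 4))        # toy per-task gradients (3 tasks, 4 dims)
g0 = g.mean(axis=0)                # average gradient
c = 0.5                            # ball radius as a fraction of ||g0||

# Sample candidate directions within a ball of radius c*||g0|| around the
# average gradient; keep the one whose worst-case per-task improvement
# min_i <g_i, d> is largest.
best_d, best_val = g0, float(np.min(g @ g0))
for _ in range(2000):
    u = rng.normal(size=4)
    u *= (c * np.linalg.norm(g0) * rng.random() ** 0.25) / np.linalg.norm(u)
    d = g0 + u
    val = float(np.min(g @ d))     # worst local improvement across tasks
    if val > best_val:
        best_d, best_val = d, val

print("worst-case alignment of avg grad:     ", round(float(np.min(g @ g0)), 3))
print("worst-case alignment of chosen dir:   ", round(best_val, 3))
```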
- An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of neural networks; a generic online-DRO sketch follows this entry.
arXiv Detail & Related papers (2020-06-17T20:19:25Z)
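A generic online-DRO sketch, alternating gradient descent on the model with a multiplicative (exponentiated-gradient) ascent step on the task weights; this is a textbook baseline, not necessarily the method of the paper above:

```python
# Generic online DRO sketch (illustrative): maintain a distribution over
# tasks and update it multiplicatively so that high-loss tasks receive
# more weight, while the model descends the weighted loss.
import numpy as np

rng = np.random.default_rng(1)
n_tasks, dim = 4, 3
targets = rng.normal(size=(n_tasks, dim))
theta = np.zeros(dim)
weights = np.full(n_tasks, 1.0 / n_tasks)    # distribution over tasks
eta_w, eta_theta = 0.5, 0.05                 # step sizes (assumed values)

for step in range(200):
    losses = np.sum((theta - targets) ** 2, axis=1)
    # Ascent on the weights: upweight the currently hardest tasks.
    weights *= np.exp(eta_w * losses)
    weights /= weights.sum()
    # Descent on the model under the current task weighting.
    grad = 2.0 * weights @ (theta - targets)
    theta -= eta_theta * grad

print("worst-case loss:", np.sum((theta - targets) ** 2, axis=1).max())
```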
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.