Multitask Learning Can Improve Worst-Group Outcomes
- URL: http://arxiv.org/abs/2312.03151v2
- Date: Wed, 28 Feb 2024 22:27:31 GMT
- Title: Multitask Learning Can Improve Worst-Group Outcomes
- Authors: Atharva Kulkarni, Lucio Dery, Amrith Setlur, Aditi Raghunathan, Ameet
Talwalkar and Graham Neubig
- Abstract summary: Multitask learning (MTL) is a widely used technique that is typically applied to improve average performance without regard for worst-group error.
We propose to modify standard MTL by regularizing the joint multitask representation space.
We find that our regularized MTL approach consistently outperforms JTT on both average and worst-group outcomes.
- Score: 76.92646345152788
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In order to create machine learning systems that serve a variety of users
well, it is vital to not only achieve high average performance but also ensure
equitable outcomes across diverse groups. However, most machine learning
methods are designed to improve a model's average performance on a chosen end
task without consideration for their impact on worst group error. Multitask
learning (MTL) is one such widely used technique. In this paper, we seek not
only to understand the impact of MTL on worst-group accuracy but also to
explore its potential as a tool to address the challenge of group-wise
fairness. We primarily consider the standard setting of fine-tuning a
pre-trained model, where, following recent work \citep{gururangan2020don,
dery2023aang}, we multitask the end task with the pre-training objective
constructed from the end task data itself. In settings with few or no group
annotations, we find that multitasking often, but not consistently, achieves
better worst-group accuracy than Just-Train-Twice (JTT;
\citet{pmlr-v139-liu21f}) -- a representative distributionally robust
optimization (DRO) method. Leveraging insights from synthetic data experiments,
we propose to modify standard MTL by regularizing the joint multitask
representation space. We run a large number of fine-tuning experiments across
computer vision and natural language processing datasets and find that our
regularized MTL approach \emph{consistently} outperforms JTT on both average
and worst-group outcomes. Our official code can be found here:
\href{https://github.com/atharvajk98/MTL-group-robustness.git}{\url{https://github.com/atharvajk98/MTL-group-robustness}}.
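The following is a minimal sketch, in PyTorch, of how the fine-tuning objective described above could be assembled: the end-task loss is multitasked with a self-supervised auxiliary objective built from the same end-task data, and the shared representation is regularized. The masked-language-modeling auxiliary head, the pooling choice, the L2 form of the regularizer, and the weights `aux_weight` and `reg_weight` are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def regularized_mtl_loss(encoder, task_head, aux_head, batch,
                         aux_weight=1.0, reg_weight=0.1):
    """Sketch of one multitask fine-tuning step: end-task loss + auxiliary
    pre-training objective on the same data + representation regularization."""
    # Shared token-level representations from the pre-trained encoder, (B, T, H).
    reps = encoder(batch["input_ids"], attention_mask=batch["attention_mask"])
    pooled = reps[:, 0]  # e.g. [CLS]-style pooling for the end task

    # End-task loss (e.g. sentence classification).
    task_loss = F.cross_entropy(task_head(pooled), batch["labels"])

    # Auxiliary objective constructed from the end-task inputs themselves,
    # e.g. masked language modeling over the same sentences (assumed here).
    aux_logits = aux_head(reps)  # (B, T, vocab)
    aux_loss = F.cross_entropy(
        aux_logits.view(-1, aux_logits.size(-1)),
        batch["mlm_labels"].view(-1),
        ignore_index=-100,  # ignore unmasked positions
    )

    # Regularize the joint multitask representation space; an L2 penalty on the
    # pooled representation is shown as one possible instantiation.
    rep_penalty = pooled.pow(2).sum(dim=-1).mean()

    return task_loss + aux_weight * aux_loss + reg_weight * rep_penalty
```

In this sketch the encoder is assumed to return a plain hidden-state tensor; with a Hugging Face model one would take `outputs.last_hidden_state` instead.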
Related papers
- MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic [6.46176287368784]
We propose Model Exclusive Task Arithmetic for merging GPT-scale models.
Our proposed MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs.
arXiv Detail & Related papers (2024-06-17T10:12:45Z) - Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - Multi-task learning via robust regularized clustering with non-convex group penalties [0.0]
Multi-task learning (MTL) aims to improve estimation performance by sharing common information among related tasks.
Existing MTL methods based on this assumption often ignore outlier tasks.
We propose a novel MTL method called MultiTask Learning via Robust Regularized Clustering (MTLRRC).
arXiv Detail & Related papers (2024-04-04T07:09:43Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - Task Grouping for Automated Multi-Task Machine Learning via Task
Affinity Prediction [7.975047833725489]
Multi-task learning (MTL) models can attain significantly higher accuracy than single-task learning (STL) models.
In this paper, we propose a novel automated approach for task grouping.
We identify inherent task features and STL characteristics that can help us to predict whether a group of tasks should be learned together using MTL or if they should be learned independently using STL.
arXiv Detail & Related papers (2023-10-24T23:29:46Z) - STG-MTL: Scalable Task Grouping for Multi-Task Learning Using Data Map [4.263847576433289]
Multi-Task Learning (MTL) is a powerful technique that has gained popularity due to its performance improvement over traditional Single-Task Learning (STL).
However, MTL is often challenging because there is an exponential number of possible task groupings.
We propose a new data-driven method that addresses these challenges and provides a scalable and modular solution for classification task grouping.
arXiv Detail & Related papers (2023-07-07T03:54:26Z) - When to Use Multi-Task Learning vs Intermediate Fine-Tuning for
Pre-Trained Encoder Transfer Learning [15.39115079099451]
Transfer learning (TL) in natural language processing has seen a surge of interest in recent years.
Three main strategies have emerged for making use of multiple supervised datasets during fine-tuning.
We compare all three TL methods in a comprehensive analysis on the GLUE dataset suite.
arXiv Detail & Related papers (2022-05-17T06:48:45Z) - Just Train Twice: Improving Group Robustness without Training Group
Information [101.84574184298006]
Standard training via empirical risk minimization can produce models that achieve high accuracy on average but low accuracy on certain groups.
Prior approaches that achieve high worst-group accuracy, like group distributionally robust optimization (group DRO), require expensive group annotations for each training point.
We propose a simple two-stage approach, JTT, that first trains a standard ERM model for several epochs and then trains a second model that upweights the training examples that the first model misclassified (a minimal sketch of this two-stage scheme appears after this list).
arXiv Detail & Related papers (2021-07-19T17:52:32Z) - Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z) - Task-Feature Collaborative Learning with Application to Personalized
Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
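Because JTT is the group-robustness baseline the main abstract compares against, a minimal sketch of its two-stage error-set upweighting is included here (referenced from the Just Train Twice entry above). The batch sizes, the upweighting factor `lambda_up`, and the sampler-based reweighting are illustrative assumptions rather than the reference implementation.

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def jtt_error_set_weights(id_model, dataset, lambda_up=20.0, device="cpu"):
    """Stage 1 of JTT (sketch): run an ERM-trained identification model over the
    training set and give weight lambda_up to every example it misclassifies."""
    id_model.eval()
    weights = []
    loader = DataLoader(dataset, batch_size=128, shuffle=False)
    with torch.no_grad():
        for inputs, labels in loader:
            preds = id_model(inputs.to(device)).argmax(dim=-1).cpu()
            weights.extend(1.0 if ok else lambda_up for ok in (preds == labels).tolist())
    return weights

# Stage 2 (sketch): retrain a fresh model, sampling the error set more often.
# weights = jtt_error_set_weights(id_model, train_set)
# sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
# robust_loader = DataLoader(train_set, batch_size=64, sampler=sampler)
```

Upweighting by sampling is one way to realize the scheme; equivalently, error-set examples can simply be duplicated `lambda_up` times in the second-stage training set.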
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.