Multitask Learning Can Improve Worst-Group Outcomes
- URL: http://arxiv.org/abs/2312.03151v2
- Date: Wed, 28 Feb 2024 22:27:31 GMT
- Title: Multitask Learning Can Improve Worst-Group Outcomes
- Authors: Atharva Kulkarni, Lucio Dery, Amrith Setlur, Aditi Raghunathan, Ameet
Talwalkar and Graham Neubig
- Abstract summary: Multitask learning (MTL) is a widely used technique that is typically applied to improve average performance without regard for worst-group error.
We propose to modify standard MTL by regularizing the joint multitask representation space.
We find that our regularized MTL approach consistently outperforms JTT on both average and worst-group outcomes.
- Score: 76.92646345152788
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In order to create machine learning systems that serve a variety of users
well, it is vital to not only achieve high average performance but also ensure
equitable outcomes across diverse groups. However, most machine learning
methods are designed to improve a model's average performance on a chosen end
task without consideration for their impact on worst group error. Multitask
learning (MTL) is one such widely used technique. In this paper, we seek not
only to understand the impact of MTL on worst-group accuracy but also to
explore its potential as a tool to address the challenge of group-wise
fairness. We primarily consider the standard setting of fine-tuning a
pre-trained model, where, following recent work \citep{gururangan2020don,
dery2023aang}, we multitask the end task with the pre-training objective
constructed from the end task data itself. In settings with few or no group
annotations, we find that multitasking often, but not consistently, achieves
better worst-group accuracy than Just-Train-Twice (JTT;
\citet{pmlr-v139-liu21f}) -- a representative distributionally robust
optimization (DRO) method. Leveraging insights from synthetic data experiments,
we propose to modify standard MTL by regularizing the joint multitask
representation space. We run a large number of fine-tuning experiments across
computer vision and natural language processing datasets and find that our
regularized MTL approach \emph{consistently} outperforms JTT on both average
and worst-group outcomes. Our official code can be found here:
\href{https://github.com/atharvajk98/MTL-group-robustness.git}{\url{https://github.com/atharvajk98/MTL-group-robustness}}.
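The following is a minimal sketch, in PyTorch, of how the fine-tuning objective described above could be assembled: the end-task loss is multitasked with a self-supervised auxiliary objective built from the same end-task data, and the shared representation is regularized. The masked-language-modeling auxiliary head, the pooling choice, the L2 form of the regularizer, and the weights `aux_weight` and `reg_weight` are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def regularized_mtl_loss(encoder, task_head, aux_head, batch,
                         aux_weight=1.0, reg_weight=0.1):
    """Sketch of one multitask fine-tuning step: end-task loss + auxiliary
    pre-training objective on the same data + representation regularization."""
    # Shared token-level representations from the pre-trained encoder, (B, T, H).
    reps = encoder(batch["input_ids"], attention_mask=batch["attention_mask"])
    pooled = reps[:, 0]  # e.g. [CLS]-style pooling for the end task

    # End-task loss (e.g. sentence classification).
    task_loss = F.cross_entropy(task_head(pooled), batch["labels"])

    # Auxiliary objective constructed from the end-task inputs themselves,
    # e.g. masked language modeling over the same sentences (assumed here).
    aux_logits = aux_head(reps)  # (B, T, vocab)
    aux_loss = F.cross_entropy(
        aux_logits.view(-1, aux_logits.size(-1)),
        batch["mlm_labels"].view(-1),
        ignore_index=-100,  # ignore unmasked positions
    )

    # Regularize the joint multitask representation space; an L2 penalty on the
    # pooled representation is shown as one possible instantiation.
    rep_penalty = pooled.pow(2).sum(dim=-1).mean()

    return task_loss + aux_weight * aux_loss + reg_weight * rep_penalty
```

In this sketch the encoder is assumed to return a plain hidden-state tensor; with a Hugging Face model one would take `outputs.last_hidden_state` instead.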
Related papers
- MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic [6.46176287368784]
We propose Model Exclusive Task Arithmetic for merging GPT-scale models.
Our proposed MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs.
arXiv Detail & Related papers (2024-06-17T10:12:45Z) - Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - Multi-task learning via robust regularized clustering with non-convex group penalties [0.0]
Multi-task learning (MTL) aims to improve estimation performance by sharing common information among related tasks.
Existing MTL methods based on this assumption often ignore outlier tasks.
We propose a novel MTL method called MultiTask Learning via Robust Regularized Clustering (MTLRRC).
arXiv Detail & Related papers (2024-04-04T07:09:43Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - Task Grouping for Automated Multi-Task Machine Learning via Task
Affinity Prediction [7.975047833725489]
Multi-task learning (MTL) models can attain significantly higher accuracy than single-task learning (STL) models.
In this paper, we propose a novel automated approach for task grouping.
We identify inherent task features and STL characteristics that can help us to predict whether a group of tasks should be learned together using MTL or if they should be learned independently using STL.
arXiv Detail & Related papers (2023-10-24T23:29:46Z) - STG-MTL: Scalable Task Grouping for Multi-Task Learning Using Data Map [4.263847576433289]
Multi-Task Learning (MTL) is a powerful technique that has gained popularity due to its performance improvement over traditional Single-Task Learning (STL).
However, MTL is often challenging because there is an exponential number of possible task groupings.
We propose a new data-driven method that addresses these challenges and provides a scalable and modular solution for classification task grouping.
arXiv Detail & Related papers (2023-07-07T03:54:26Z) - When to Use Multi-Task Learning vs Intermediate Fine-Tuning for
Pre-Trained Encoder Transfer Learning [15.39115079099451]
Transfer learning (TL) in natural language processing has seen a surge of interest in recent years.
Three main strategies have emerged for making use of multiple supervised datasets during fine-tuning.
We compare all three TL methods in a comprehensive analysis on the GLUE dataset suite.
arXiv Detail & Related papers (2022-05-17T06:48:45Z) - Just Train Twice: Improving Group Robustness without Training Group
Information [101.84574184298006]
Standard training via empirical risk minimization can produce models that achieve high accuracy on average but low accuracy on certain groups.
Prior approaches that achieve high worst-group accuracy, like group distributionally robust optimization (group DRO), require expensive group annotations for each training point.
We propose a simple two-stage approach, JTT, that first trains a standard ERM model for several epochs and then trains a second model that upweights the training examples that the first model misclassified (a minimal sketch of this two-stage scheme appears after this list).
arXiv Detail & Related papers (2021-07-19T17:52:32Z) - Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z) - Task-Feature Collaborative Learning with Application to Personalized
Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
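Because JTT is the group-robustness baseline the main abstract compares against, a minimal sketch of its two-stage error-set upweighting is included here (referenced from the Just Train Twice entry above). The batch sizes, the upweighting factor `lambda_up`, and the sampler-based reweighting are illustrative assumptions rather than the reference implementation.

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def jtt_error_set_weights(id_model, dataset, lambda_up=20.0, device="cpu"):
    """Stage 1 of JTT (sketch): run an ERM-trained identification model over the
    training set and give weight lambda_up to every example it misclassifies."""
    id_model.eval()
    weights = []
    loader = DataLoader(dataset, batch_size=128, shuffle=False)
    with torch.no_grad():
        for inputs, labels in loader:
            preds = id_model(inputs.to(device)).argmax(dim=-1).cpu()
            weights.extend(1.0 if ok else lambda_up for ok in (preds == labels).tolist())
    return weights

# Stage 2 (sketch): retrain a fresh model, sampling the error set more often.
# weights = jtt_error_set_weights(id_model, train_set)
# sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
# robust_loader = DataLoader(train_set, batch_size=64, sampler=sampler)
```

Upweighting by sampling is one way to realize the scheme; equivalently, error-set examples can simply be duplicated `lambda_up` times in the second-stage training set.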
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.