Improved Group Robustness via Classifier Retraining on Independent
Splits
- URL: http://arxiv.org/abs/2204.09583v3
- Date: Fri, 28 Jul 2023 18:59:31 GMT
- Title: Improved Group Robustness via Classifier Retraining on Independent
Splits
- Authors: Thien Hang Nguyen, Hongyang R. Zhang, Huy Le Nguyen
- Abstract summary: Group distributionally robust optimization is a widely used baseline for learning models with strong worst-group performance.
This paper designs a simple method based on the idea of classifier retraining on independent splits of the training data.
We find that using a novel sample-splitting procedure achieves robust worst-group performance in the fine-tuning step.
- Score: 6.930560177764658
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks trained by minimizing the average risk can achieve
strong average performance. Still, their performance for a subgroup may degrade
if the subgroup is underrepresented in the overall data population. Group
distributionally robust optimization (Sagawa et al., 2020a), or group DRO in
short, is a widely used baseline for learning models with strong worst-group
performance. We note that this method requires group labels for every example
at training time and can overfit to small groups, requiring strong
regularization. Given a limited amount of group labels at training time, Just
Train Twice (Liu et al., 2021), or JTT in short, is a two-stage method that
infers a pseudo group label for every unlabeled example first, then applies
group DRO based on the inferred group labels. The inference process is also
sensitive to overfitting, sometimes involving additional hyperparameters. This
paper designs a simple method based on the idea of classifier retraining on
independent splits of the training data. We find that using a novel
sample-splitting procedure achieves robust worst-group performance in the
fine-tuning step. When evaluated on benchmark image and text classification
tasks, our approach consistently compares favorably with group DRO, JTT, and
other strong baselines when either group labels are available during training
or are only given in validation sets. Importantly, our method only relies on a
single hyperparameter, which adjusts the fraction of labels used for training
feature extractors vs. training classification layers. We justify the rationale
of our splitting scheme with a generalization-bound analysis of the worst-group
loss.
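
To make the recipe concrete, below is a minimal PyTorch-style sketch of classifier retraining on independent splits, written from the abstract's description rather than the authors' released code; the backbone interface, epoch counts, and optimizer are illustrative assumptions. The single hyperparameter `alpha` plays the role described above: it sets the fraction of examples used to train the feature extractor versus the classification layer.

```python
# Minimal sketch (not the authors' code): two-stage classifier
# retraining on independent splits of the training data.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split

def fit(model, split, epochs=5, lr=1e-3, device="cpu"):
    """Plain ERM training loop over one data split."""
    model.to(device).train()
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for x, y in DataLoader(split, batch_size=64, shuffle=True):
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            nn.functional.cross_entropy(model(x), y).backward()
            opt.step()

def classifier_retrain(dataset, backbone, feat_dim, num_classes, alpha=0.8):
    """alpha: fraction of the data given to the feature-extractor split."""
    n_feat = int(alpha * len(dataset))
    feat_split, head_split = random_split(
        dataset, [n_feat, len(dataset) - n_feat])

    # Stage 1: train feature extractor + head by standard ERM on split 1.
    fit(nn.Sequential(backbone, nn.Linear(feat_dim, num_classes)), feat_split)

    # Stage 2: freeze the extractor and retrain a fresh classification
    # layer on the independent split 2, which the features never saw.
    for p in backbone.parameters():
        p.requires_grad = False
    head = nn.Linear(feat_dim, num_classes)
    fit(nn.Sequential(backbone, head), head_split)
    return nn.Sequential(backbone, head)
```

The point of the second stage is that the retrained head is fit on data the feature extractor never saw, which limits how much the extractor's overfitting to majority groups can leak into the final classifier.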
Related papers
- Efficient Bias Mitigation Without Privileged Information [14.21628601482357]
Deep neural networks trained via empirical risk minimisation often exhibit significant performance disparities across groups.
Existing bias mitigation methods that aim to address this issue often rely on group labels for training or validation.
We propose Targeted Augmentations for Bias Mitigation (TAB), a framework that leverages the entire training history of a helper model to identify spurious samples.
We show that TAB improves worst-group performance without any group information or model selection, outperforming existing methods while maintaining overall accuracy.
arXiv Detail & Related papers (2024-09-26T09:56:13Z)
- A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups.
We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
arXiv Detail & Related papers (2024-01-26T14:21:45Z)
- Bias Amplification Enhances Minority Group Performance [10.380812738348899]
We propose BAM, a novel two-stage training algorithm.
In the first stage, the model is trained with a bias-amplification scheme that introduces a learnable auxiliary variable for each training sample.
In the second stage, we upweight the samples that the bias-amplified model misclassifies, and then continue training the same model on the reweighted dataset.
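
The first stage, read literally from the summary above, can be sketched as a per-example learnable logit offset; this is an illustrative guess at the mechanics, with `strength` and the exact coupling to the loss being assumptions rather than the paper's specification.

```python
# Illustrative sketch of a bias-amplification stage: each training
# example i gets a learnable auxiliary logit vector aux[i] added to
# the model output, so bias-aligned examples can be fit by aux alone.
import torch
import torch.nn as nn

class BiasAmplifiedLoss(nn.Module):
    def __init__(self, num_samples, num_classes, strength=0.5):
        super().__init__()
        self.aux = nn.Parameter(torch.zeros(num_samples, num_classes))
        self.strength = strength  # assumed coupling coefficient

    def forward(self, logits, targets, sample_idx):
        return nn.functional.cross_entropy(
            logits + self.strength * self.aux[sample_idx], targets)
```

The intuition, on this reading, is that the auxiliary variable absorbs examples the spurious feature already explains, so the amplified model's remaining errors flag bias-conflicting samples for the second stage.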
arXiv Detail & Related papers (2023-09-13T04:40:08Z)
- Bitrate-Constrained DRO: Beyond Worst Case Robustness To Unknown Group Shifts [122.08782633878788]
Some robust training algorithms (e.g., Group DRO) specialize to group shifts and require group information on all training points.
Other methods (e.g., CVaR DRO) that do not need group annotations can be overly conservative.
We learn a model that maintains high accuracy on simple group functions realized by low-bitrate features.
arXiv Detail & Related papers (2023-02-06T17:07:16Z)
- Outlier-Robust Group Inference via Gradient Space Clustering [50.87474101594732]
Existing methods can improve the worst-group performance, but they require group annotations, which are often expensive and sometimes infeasible to obtain.
We address the problem of learning group annotations in the presence of outliers by clustering the data in the space of gradients of the model parameters.
We show that data in the gradient space has a simpler structure while preserving information about minority groups and outliers, making it suitable for standard clustering methods like DBSCAN.
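
A schematic version of this pipeline follows, assuming last-layer gradients are a sufficient per-sample representation; the loader, `eps`, and `min_samples` values are placeholders, not the paper's settings.

```python
# Sketch: one gradient vector per example (w.r.t. the final layer),
# clustered by DBSCAN. DBSCAN labels outliers as -1, which is what
# makes the group inference outlier-robust.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.cluster import DBSCAN

def gradient_space_groups(model, last_layer, loader, eps=0.5):
    model.eval()
    grads = []
    for x, y in loader:
        for xi, yi in zip(x, y):
            loss = F.cross_entropy(model(xi[None]), yi[None])
            g = torch.autograd.grad(loss, list(last_layer.parameters()))
            grads.append(
                torch.cat([t.flatten() for t in g]).detach().cpu().numpy())
    return DBSCAN(eps=eps, min_samples=5).fit_predict(np.stack(grads))
```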
arXiv Detail & Related papers (2022-10-13T06:04:43Z)
- Take One Gram of Neural Features, Get Enhanced Group Robustness [23.541213868620837]
Predictive performance of machine learning models trained with empirical risk minimization can degrade considerably under distribution shifts.
We propose to partition the training dataset into groups based on Gram matrices of features extracted by an "identification" model.
Our approach not only improves group robustness over ERM but also outperforms all recent baselines.
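
The per-sample statistic can be sketched as a channel Gram matrix of convolutional feature maps; the identification model's output shape and the use of KMeans below are illustrative assumptions, not necessarily the paper's exact pipeline.

```python
# Sketch: flatten each example's channel Gram matrix (a style-like
# statistic of its feature maps) and cluster into pseudo-groups.
import torch
from sklearn.cluster import KMeans

def gram_features(feature_maps):
    # feature_maps: (batch, channels, H, W) from the identification model.
    b, c, h, w = feature_maps.shape
    f = feature_maps.reshape(b, c, h * w)
    gram = torch.bmm(f, f.transpose(1, 2)) / (c * h * w)
    return gram.reshape(b, -1)

def pseudo_groups(all_feature_maps, num_groups=2):
    g = gram_features(all_feature_maps).detach().cpu().numpy()
    return KMeans(n_clusters=num_groups, n_init=10).fit_predict(g)
```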
arXiv Detail & Related papers (2022-08-26T12:34:55Z)
- Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
Spurious correlations between input samples and target labels can wrongly direct neural network predictions.
We propose an algorithm that optimizes for the worst-off group assignments from a constraint set.
We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
arXiv Detail & Related papers (2022-01-10T22:04:48Z)
- BARACK: Partially Supervised Group Robustness With Guarantees [29.427365308680717]
We propose BARACK, a framework to improve worst-group performance on neural networks.
We train a model to predict the missing group labels for the training data, and then use these predicted group labels in a robust optimization objective.
Empirically, our method outperforms the baselines that do not use group information, even when only 1-33% of points have group labels.
arXiv Detail & Related papers (2021-12-31T23:05:21Z)
- Just Train Twice: Improving Group Robustness without Training Group Information [101.84574184298006]
Standard training via empirical risk minimization can produce models that achieve high accuracy on average but low accuracy on certain groups.
Prior approaches that achieve high worst-group accuracy, like group distributionally robust optimization (group DRO), require expensive group annotations for each training point.
We propose a simple two-stage approach, JTT, that first trains a standard ERM model for several epochs, and then trains a second model that upweights the training examples that the first model misclassified.
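
A minimal sketch of the recipe follows; upweighting is implemented here by weighted sampling rather than the paper's explicit upsampling of the error set, and `lam` plays the role of JTT's upweighting hyperparameter.

```python
# Sketch of JTT: identify the first model's error set, then retrain
# from scratch with those examples upweighted by a factor lam.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, WeightedRandomSampler

def train_erm(model, loader, epochs, lr=1e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            nn.functional.cross_entropy(model(x), y).backward()
            opt.step()

def jtt(make_model, dataset, stage1_epochs=2, stage2_epochs=10, lam=20.0):
    # Stage 1: briefly train a standard ERM model.
    model1 = make_model()
    train_erm(model1, DataLoader(dataset, batch_size=64, shuffle=True),
              stage1_epochs)

    # Examples the ERM model misclassifies tend to come from groups
    # where the spurious feature fails; give them weight lam.
    weights = []
    model1.eval()
    with torch.no_grad():
        for x, y in DataLoader(dataset, batch_size=64):
            preds = model1(x).argmax(dim=1)
            weights += [lam if p != t else 1.0 for p, t in zip(preds, y)]

    # Stage 2: retrain a fresh model on the reweighted data.
    sampler = WeightedRandomSampler(weights, num_samples=len(dataset))
    model2 = make_model()
    train_erm(model2, DataLoader(dataset, batch_size=64, sampler=sampler),
              stage2_epochs)
    return model2
```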
arXiv Detail & Related papers (2021-07-19T17:52:32Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when the bias lies only in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.