Related papers: Ranking & Reweighting Improves Group Distributional Robustness

Ranking & Reweighting Improves Group Distributional Robustness

URL: http://arxiv.org/abs/2305.05759v1
Date: Tue, 9 May 2023 20:37:16 GMT
Title: Ranking & Reweighting Improves Group Distributional Robustness
Authors: Yachuan Liu, Bohan Zhang, Qiaozhu Mei and Paramveer Dhillon
Abstract summary: We propose a ranking-based training method called Discounted Rank Upweighting (DRU) to learn models that exhibit strong OOD performance on the test data. Results on several synthetic and real-world datasets highlight the superior ability of our group-ranking-based (akin to soft-minimax) approach in selecting and learning models that are robust to group distributional shifts.
Score: 14.021069321266516
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent work has shown that standard training via empirical risk minimization (ERM) can produce models that achieve high accuracy on average but low accuracy on underrepresented groups due to the prevalence of spurious features. A predominant approach to tackle this group robustness problem minimizes the worst group error (akin to a minimax strategy) on the training data, hoping it will generalize well on the testing data. However, this is often suboptimal, especially when the out-of-distribution (OOD) test data contains previously unseen groups. Inspired by ideas from the information retrieval and learning-to-rank literature, this paper first proposes to use Discounted Cumulative Gain (DCG) as a metric of model quality for facilitating better hyperparameter tuning and model selection. Being a ranking-based metric, DCG weights multiple poorly-performing groups (instead of considering just the group with the worst performance). As a natural next step, we build on our results to propose a ranking-based training method called Discounted Rank Upweighting (DRU), which differentially reweights a ranked list of poorly-performing groups in the training data to learn models that exhibit strong OOD performance on the test data. Results on several synthetic and real-world datasets highlight the superior generalization ability of our group-ranking-based (akin to soft-minimax) approach in selecting and learning models that are robust to group distributional shifts.

Related papers

Trained Models Tell Us How to Make Them Robust to Spurious Correlation without Group Annotation [3.894771553698554]
Empirical Risk Minimization (ERM) models tend to rely on attributes that have high spurious correlation with the target. This can degrade the performance on underrepresented (or'minority') groups that lack these attributes. We propose Environment-based Validation and Loss-based Sampling (EVaLS) to enhance robustness to spurious correlation.
arXiv Detail & Related papers (2024-10-07T08:17:44Z)
Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection [80.85902083005237]
We introduce Data Debiasing with Datamodels (D3M), a debiasing approach which isolates and removes specific training examples that drive the model's failures on minority groups.
arXiv Detail & Related papers (2024-06-24T17:51:01Z)
Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios. We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples. Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
Bias Amplification Enhances Minority Group Performance [10.380812738348899]
We propose BAM, a novel two-stage training algorithm. In the first stage, the model is trained using a bias amplification scheme via introducing a learnable auxiliary variable for each training sample. In the second stage, we upweight the samples that the bias-amplified model misclassifies, and then continue training the same model on the reweighted dataset.
arXiv Detail & Related papers (2023-09-13T04:40:08Z)
Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization [61.39201891894024]
Group distributionally robust optimization (group DRO) can minimize the worst-case loss over pre-defined groups. We reformulate the group DRO framework by proposing Q-Diversity. Characterized by an interactive training mode, Q-Diversity relaxes the group identification from annotation into direct parameterization.
arXiv Detail & Related papers (2023-05-20T07:02:27Z)
Take One Gram of Neural Features, Get Enhanced Group Robustness [23.541213868620837]
Predictive performance of machine learning models trained with empirical risk minimization can degrade considerably under distribution shifts. We propose to partition the training dataset into groups based on Gram matrices of features extracted by an identification'' model. Our approach not only improves group robustness over ERM but also outperforms all recent baselines.
arXiv Detail & Related papers (2022-08-26T12:34:55Z)
Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios. We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
Just Train Twice: Improving Group Robustness without Training Group Information [101.84574184298006]
Standard training via empirical risk minimization can produce models that achieve high accuracy on average but low accuracy on certain groups. Prior approaches that achieve high worst-group accuracy, like group distributionally robust optimization (group DRO) require expensive group annotations for each training point. We propose a simple two-stage approach, JTT, that first trains a standard ERM model for several epochs, and then trains a second model that upweights the training examples that the first model misclassified.
arXiv Detail & Related papers (2021-07-19T17:52:32Z)
Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics. We prove that even when there is only bias of the input distribution, models can still pick up spurious features from their training data. Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.