You Only Need a Good Embeddings Extractor to Fix Spurious Correlations
- URL: http://arxiv.org/abs/2212.06254v1
- Date: Mon, 12 Dec 2022 21:42:33 GMT
- Title: You Only Need a Good Embeddings Extractor to Fix Spurious Correlations
- Authors: Raghav Mehta, Vítor Albiero, Li Chen, Ivan Evtimov, Tamar Glaser, Zhiheng Li, Tal Hassner
- Abstract summary: GroupDRO requires training a model in an end-to-end manner with subgroup labels.
We show that we can achieve up to 90% worst-group accuracy without using any subgroup information in the training set.
- Score: 26.23962870932271
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spurious correlations in training data often lead to robustness issues since
models learn to use them as shortcuts. For example, when predicting whether an
object is a cow, a model might learn to rely on its green background, so it
would do poorly on a cow on a sandy background. A standard benchmark for
methods that mitigate this problem is Waterbirds. The best current method
(Group Distributionally Robust Optimization, GroupDRO) achieves 89% worst-group
accuracy, while standard training from scratch on raw images reaches only 72%.
GroupDRO requires training a model end-to-end with subgroup labels. In this
paper, we show that we can achieve up to 90% worst-group accuracy without using
any subgroup information in the training set, simply by extracting embeddings
from a large pre-trained vision model and training a linear classifier on top
of them. With experiments on a wide range of pre-trained models and
pre-training datasets, we show that both the capacity of the pre-trained model
and the size of the pre-training dataset matter: high-capacity vision
transformers outperform high-capacity convolutional neural networks, and larger
pre-training datasets lead to better worst-group accuracy on the
spurious-correlation dataset.
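The recipe above is simple enough to sketch. The following is a minimal illustration (not the authors' released code), assuming a torchvision ViT-B/16 backbone, a scikit-learn logistic-regression probe, and Waterbirds-style data loaders (`train_loader`, `test_loader`) that yield preprocessed images, class labels, and group labels; the backbone choice and group encoding are placeholders, since the paper sweeps a much wider range of pre-trained models and pre-training datasets.

```python
# Sketch of the paper's recipe: freeze a large pre-trained vision backbone,
# extract embeddings once, and fit a linear classifier on top of them.
# The backbone, data loaders, and group encoding below are illustrative
# assumptions, not the authors' exact setup.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from torchvision.models import vit_b_16, ViT_B_16_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen feature extractor; any high-capacity pre-trained model can be used here.
weights = ViT_B_16_Weights.IMAGENET1K_V1
backbone = vit_b_16(weights=weights).to(device).eval()
backbone.heads = torch.nn.Identity()   # keep the CLS embedding, drop the head
preprocess = weights.transforms()      # apply this inside the Dataset when loading images

@torch.no_grad()
def extract_embeddings(loader):
    """Run the frozen backbone over a loader of (image, label, group) batches."""
    feats, labels, groups = [], [], []
    for images, y, g in loader:        # assumed Waterbirds-style loader
        z = backbone(images.to(device))
        feats.append(z.cpu().numpy())
        labels.append(y.numpy())
        groups.append(g.numpy())
    return np.concatenate(feats), np.concatenate(labels), np.concatenate(groups)

X_tr, y_tr, _ = extract_embeddings(train_loader)   # group labels unused for training
X_te, y_te, g_te = extract_embeddings(test_loader)

# Linear probe trained without any subgroup information.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Worst-group accuracy: minimum accuracy over (class, background) groups.
pred = clf.predict(X_te)
worst = min((pred[g_te == g] == y_te[g_te == g]).mean() for g in np.unique(g_te))
print(f"worst-group accuracy: {worst:.3f}")
```

Group labels are only needed at evaluation time to report worst-group accuracy; the classifier itself never sees them.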
Related papers
- Efficient Bias Mitigation Without Privileged Information [14.21628601482357]
Deep neural networks trained via empirical risk minimisation often exhibit significant performance disparities across groups.
Existing bias mitigation methods that aim to address this issue often rely on group labels for training or validation.
We propose Targeted Augmentations for Bias Mitigation (TAB), a framework that leverages the entire training history of a helper model to identify spurious samples.
We show that TAB improves worst-group performance without any group information or model selection, outperforming existing methods while maintaining overall accuracy.
arXiv Detail & Related papers (2024-09-26T09:56:13Z)
- Data Diet: Can Trimming PET/CT Datasets Enhance Lesion Segmentation? [70.38903555729081]
We describe our approach to compete in the autoPET3 datacentric track.
We find that in the autoPETIII dataset, a model that is trained on the entire dataset exhibits undesirable characteristics.
We counteract this by removing the easiest samples from the training dataset as measured by the model loss before retraining from scratch.
arXiv Detail & Related papers (2024-09-20T14:47:58Z)
- Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection [80.85902083005237]
We introduce Data Debiasing with Datamodels (D3M), a debiasing approach which isolates and removes specific training examples that drive the model's failures on minority groups.
arXiv Detail & Related papers (2024-06-24T17:51:01Z)
- On minimizing the training set fill distance in machine learning regression [0.552480439325792]
We study a data selection approach that aims to minimize the fill distance of the selected set.
We show that selecting training sets with farthest point sampling (FPS) can also increase model stability for the specific case of Gaussian kernel regression.
arXiv Detail & Related papers (2023-07-20T16:18:33Z)
- Ranking & Reweighting Improves Group Distributional Robustness [14.021069321266516]
We propose a ranking-based training method called Discounted Rank Upweighting (DRU) to learn models that exhibit strong OOD performance on the test data.
Results on several synthetic and real-world datasets highlight the superior ability of our group-ranking-based (akin to soft-minimax) approach in selecting and learning models that are robust to group distributional shifts.
arXiv Detail & Related papers (2023-05-09T20:37:16Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
- Self-Supervised Pretraining Improves Self-Supervised Pretraining [83.1423204498361]
Self-supervised pretraining requires expensive and lengthy computation and large amounts of data, and is sensitive to data augmentation.
This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model.
We show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data.
arXiv Detail & Related papers (2021-03-23T17:37:51Z)
- Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is a performant source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
arXiv Detail & Related papers (2020-10-14T07:59:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.