Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations
- URL: http://arxiv.org/abs/2204.02937v2
- Date: Fri, 30 Jun 2023 22:51:42 GMT
- Authors: Polina Kirichenko, Pavel Izmailov, Andrew Gordon Wilson
- Abstract summary: We show that last layer retraining can match or outperform state-of-the-art approaches on spurious correlation benchmarks.
We also show that last layer retraining on large ImageNet-trained models can significantly reduce reliance on background and texture information.
- Score: 51.552870594221865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network classifiers can largely rely on simple spurious features, such
as backgrounds, to make predictions. However, even in these cases, we show that
they still often learn core features associated with the desired attributes of
the data, contrary to recent findings. Inspired by this insight, we demonstrate
that simple last layer retraining can match or outperform state-of-the-art
approaches on spurious correlation benchmarks, but with profoundly lower
complexity and computational expenses. Moreover, we show that last layer
retraining on large ImageNet-trained models can also significantly reduce
reliance on background and texture information, improving robustness to
covariate shift, after only minutes of training on a single GPU.
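The idea in the abstract can be illustrated with a small sketch: keep the learned features fixed and refit only the final linear classifier on data where the spurious feature no longer predicts the label. This is not the authors' implementation. The two-column synthetic "embeddings", the function names (`make_features`, `fit_last_layer`, `accuracy`), the agreement rates, and the plain gradient-descent logistic regression are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_features(n, agreement):
    """Synthetic stand-in for frozen embeddings (hypothetical data).

    Column 0 is a noisy core feature that always follows the label.
    Column 1 is a clean spurious feature that matches the label only
    with probability `agreement`.
    """
    y = rng.integers(0, 2, size=n)
    sign = 2 * y - 1
    core = sign + rng.normal(0.0, 1.0, size=n)
    agrees = rng.random(n) < agreement
    spurious = 3.0 * np.where(agrees, sign, -sign) + rng.normal(0.0, 0.3, size=n)
    return np.stack([core, spurious], axis=1), y

def fit_last_layer(X, y, lr=0.1, steps=2000, l2=1e-3):
    """Fit a linear head (logistic regression) by gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g / len(y) + l2 * w)
        b -= lr * g.mean()
    return w, b

def accuracy(X, y, w, b):
    return float(((X @ w + b > 0).astype(int) == y).mean())

# Original training data: the spurious feature agrees with the label 95% of the time.
X_train, y_train = make_features(5000, agreement=0.95)
# Small held-out set where the spurious feature is uninformative.
X_bal, y_bal = make_features(1000, agreement=0.5)
# Hard test set: the spurious feature always contradicts the label.
X_test, y_test = make_features(2000, agreement=0.0)

w_orig, b_orig = fit_last_layer(X_train, y_train)
w_retrained, b_retrained = fit_last_layer(X_bal, y_bal)

acc_before = accuracy(X_test, y_test, w_orig, b_orig)
acc_after = accuracy(X_test, y_test, w_retrained, b_retrained)
print(f"hard-set accuracy, original head:  {acc_before:.3f}")
print(f"hard-set accuracy, retrained head: {acc_after:.3f}")
```

In this toy setup the features never change between the two fits; only the weights of the linear head do. With a real network, the same pattern would mean extracting penultimate-layer activations once and refitting the classifier on them, which is why the cost can be small compared with retraining the whole model.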
Related papers
- Initialization Matters: On the Benign Overfitting of Two-Layer ReLU CNN with Fully Trainable Layers [20.25049261035324]
We extend the analysis to two-layer ReLU convolutional neural networks (CNNs) with fully trainable layers.
Our results show that the scaling of the output layer is crucial to the training dynamics.
In both settings, we provide nearly matching upper and lower bounds on the test errors.
arXiv Detail & Related papers (2024-10-24T20:15:45Z)
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases the model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then promotes the model learning by paying closer attention to those training samples with a high difference in explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Relearning Forgotten Knowledge: on Forgetting, Overfit and Training-Free Ensembles of DNNs [9.010643838773477]
We introduce a novel score for quantifying overfit, which monitors the forgetting rate of deep models on validation data.
We show that overfit can occur with and without a decrease in validation accuracy, and may be more common than previously appreciated.
We use our observations to construct a new ensemble method, based solely on the training history of a single network, which provides significant improvement without any additional cost in training time.
arXiv Detail & Related papers (2023-10-17T09:22:22Z)
- Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction [52.63663547523033]
Late interaction, the simplest form of multi-vector, is also helpful to neural rerankers that only use the [] vector to compute the similarity score.
We show that the finding is consistent across different model sizes and first-stage retrievers of diverse natures.
arXiv Detail & Related papers (2023-02-13T18:42:17Z)
- Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs).
It proves analytically that both sampling important nodes and pruning the lowest-magnitude neurons can reduce the sample complexity and improve convergence without compromising the test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z)
- With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
- Balanced Softmax Cross-Entropy for Incremental Learning [6.5423218639215275]
Deep neural networks are prone to catastrophic forgetting when incrementally trained on new classes or new tasks.
Recent methods have proven effective at mitigating catastrophic forgetting.
We propose the use of the Balanced Softmax Cross-Entropy loss and show that it can be combined with existing methods for incremental learning to improve their performance.
arXiv Detail & Related papers (2021-03-23T13:30:26Z)
- The Little W-Net That Could: State-of-the-Art Retinal Vessel Segmentation with Minimalistic Models [19.089445797922316]
We show that a minimalistic version of a standard U-Net with several orders of magnitude fewer parameters closely approximates the performance of current best techniques.
We also propose a simple extension, dubbed W-Net, which reaches outstanding performance on several popular datasets.
We also test our approach on the Artery/Vein segmentation problem, where we again achieve results well-aligned with the state-of-the-art.
arXiv Detail & Related papers (2020-09-03T19:59:51Z)
- On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both the training set and model sizes significantly improves the distributional shift robustness.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)
- Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification [53.735029033681435]
Transfer learning is a powerful methodology for adapting pre-trained deep neural networks on image recognition tasks to new domains.
In this work, we demonstrate that adversarially-trained models transfer better than non-adversarially-trained models.
arXiv Detail & Related papers (2020-07-11T22:48:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.