A Modern Look at the Relationship between Sharpness and Generalization
- URL: http://arxiv.org/abs/2302.07011v2
- Date: Wed, 7 Jun 2023 09:03:01 GMT
- Title: A Modern Look at the Relationship between Sharpness and Generalization
- Authors: Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion
- Abstract summary: Sharpness of minima is a promising quantity that can correlate with generalization in deep networks.
Sharpness is not invariant under reparametrizations of neural networks.
We show that sharpness does not correlate well with generalization.
- Score: 64.03012884804458
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sharpness of minima is a promising quantity that can correlate with
generalization in deep networks and, when optimized during training, can
improve generalization. However, standard sharpness is not invariant under
reparametrizations of neural networks, and, to fix this,
reparametrization-invariant sharpness definitions have been proposed, most
prominently adaptive sharpness (Kwon et al., 2021). But does it really capture
generalization in modern practical settings? We comprehensively explore this
question in a detailed study of various definitions of adaptive sharpness in
settings ranging from training from scratch on ImageNet and CIFAR-10 to
fine-tuning CLIP on ImageNet and BERT on MNLI. We focus mostly on transformers
for which little is known in terms of sharpness despite their widespread usage.
Overall, we observe that sharpness does not correlate well with generalization
but rather with some training parameters like the learning rate that can be
positively or negatively correlated with generalization depending on the setup.
Interestingly, in multiple cases, we observe a consistent negative correlation
of sharpness with out-of-distribution error implying that sharper minima can
generalize better. Finally, we illustrate on a simple model that the right
sharpness measure is highly data-dependent, and that we do not understand well
this aspect for realistic data distributions. The code of our experiments is
available at https://github.com/tml-epfl/sharpness-vs-generalization.
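Concretely, elementwise adaptive sharpness measures the worst-case loss increase over perturbations whose coordinate-wise magnitude is bounded by rho·|w_i|, which makes it invariant to the weight rescalings that break standard sharpness. The following is a minimal numpy sketch using random sign-search on a toy quadratic loss, an illustrative estimate rather than the exact maximization used in the paper:

```python
import numpy as np

def adaptive_sharpness(loss, w, rho=0.05, n_samples=256, rng=None):
    """Monte-Carlo estimate of elementwise (L-inf) adaptive sharpness:
    max over perturbations delta with |delta_i| <= rho * |w_i| of
    loss(w + delta) - loss(w).  Random sign-search over corners of the
    scaled ball; a sketch, not the paper's exact maximization."""
    rng = np.random.default_rng(rng)
    base = loss(w)
    worst = 0.0
    for _ in range(n_samples):
        # corners of the elementwise-scaled L-inf ball are near-worst-case
        delta = rho * np.abs(w) * rng.choice([-1.0, 1.0], size=w.shape)
        worst = max(worst, loss(w + delta) - base)
    return worst

# toy quadratic loss: sharpness grows with the perturbation radius rho
quad = lambda v: 0.5 * float(v @ v)
w = np.array([1.0, -2.0, 3.0])
print(adaptive_sharpness(quad, w, rho=0.05, rng=0))
```

On real networks the loss would be a mini-batch loss and `w` the flattened parameters; the paper additionally studies worst-case versus average-case variants and different norms.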
Related papers
- Exact, Tractable Gauss-Newton Optimization in Deep Reversible Architectures Reveal Poor Generalization [52.16435732772263]
Second-order optimization has been shown to accelerate the training of deep neural networks in many applications.
However, generalization properties of second-order methods are still being debated.
We show for the first time that exact Gauss-Newton (GN) updates take on a tractable form in a class of deep architectures.
arXiv Detail & Related papers (2024-11-12T17:58:40Z)
- Relearning Forgotten Knowledge: on Forgetting, Overfit and Training-Free Ensembles of DNNs [9.010643838773477]
We introduce a novel score for quantifying overfit, which monitors the forgetting rate of deep models on validation data.
We show that overfit can occur with and without a decrease in validation accuracy, and may be more common than previously appreciated.
We use our observations to construct a new ensemble method, based solely on the training history of a single network, which provides significant improvement without any additional cost in training time.
arXiv Detail & Related papers (2023-10-17T09:22:22Z)
- FAM: Relative Flatness Aware Minimization [5.132856559837775]
Optimizing for flatness was proposed as early as 1994 by Hochreiter and Schmidhuber.
Recent theoretical work suggests that a particular relative flatness measure can be connected to generalization.
We derive a regularizer based on this relative flatness that is easy to compute, fast, efficient, and works with arbitrary loss functions.
arXiv Detail & Related papers (2023-07-05T14:48:24Z)
- Sharpness-Aware Training for Free [163.1248341911413]
Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error.
Sharpness-Aware Training for Free (SAF) mitigates the sharp landscape at almost zero extra computational cost over the base optimizer.
SAF ensures convergence to a flat minimum with improved generalization.
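The SAM procedure that SAF accelerates is a two-step update: first ascend to an approximate worst-case point within an L2 ball of radius rho, then descend using the gradient taken at that perturbed point. A minimal numpy sketch on a toy quadratic (illustrative, not the authors' implementation):

```python
import numpy as np

def sam_step(grad, w, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization step (sketch):
    1) first-order ascent to the approximate worst point in an L2 ball,
    2) descent using the gradient evaluated at that perturbed point."""
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction, radius rho
    g_sharp = grad(w + eps)                      # gradient at perturbed weights
    return w - lr * g_sharp

# toy quadratic: grad(w) = w; SAM still drives w toward the minimum at 0
grad = lambda v: v
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(grad, w)
print(np.linalg.norm(w))
```

Note the two gradient evaluations per step; this doubling of cost is exactly what SAF aims to remove.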
arXiv Detail & Related papers (2022-05-27T16:32:43Z)
- ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks [2.8292841621378844]
We introduce the concept of adaptive sharpness which is scale-invariant and propose the corresponding generalization bound.
We suggest a novel learning method, adaptive sharpness-aware minimization (ASAM), utilizing the proposed generalization bound.
Experimental results in various benchmark datasets show that ASAM contributes to significant improvement of model generalization performance.
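ASAM modifies the SAM ascent step by scaling the perturbation ball elementwise with |w| (the normalization operator T_w), which yields the scale invariance the summary describes. A minimal numpy sketch under the common diag(|w|) choice of T_w, hedged as an illustration rather than the paper's code:

```python
import numpy as np

def asam_step(grad, w, lr=0.1, rho=0.5):
    """One ASAM step (sketch): like SAM, but the ascent ball is scaled
    elementwise by |w| via T_w = diag(|w|), so the sharpness being
    minimized is invariant to elementwise weight rescaling."""
    g = grad(w)
    tg = np.abs(w) * g                                          # T_w g
    eps = rho * np.abs(w) * tg / (np.linalg.norm(tg) + 1e-12)   # T_w^2 g / ||T_w g||
    return w - lr * grad(w + eps)

# toy quadratic: grad(w) = w; the scaled perturbation vanishes with w,
# so unlike plain SAM the iterates can approach the minimum arbitrarily closely
grad = lambda v: v
w = np.array([1.0, -2.0])
for _ in range(200):
    w = asam_step(grad, w)
print(np.linalg.norm(w))
```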
arXiv Detail & Related papers (2021-02-23T10:26:54Z)
- BN-invariant sharpness regularizes the training model to better generalization [72.97766238317081]
We propose a measure of sharpness, BN-Sharpness, which gives consistent value for equivalent networks under BN.
We use the BN-sharpness to regularize the training and design an algorithm to minimize the new regularized objective.
arXiv Detail & Related papers (2021-01-08T10:23:24Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We study prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
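The core idea is to recompute batch-norm statistics on the incoming test batch instead of reusing the running statistics accumulated during training. A minimal numpy sketch of the normalization step only (toy data; a real implementation would also apply the learned affine parameters):

```python
import numpy as np

def bn_running(x, running_mean, running_var, eps=1e-5):
    """Standard inference-mode BN: normalize with statistics
    accumulated on the training distribution."""
    return (x - running_mean) / np.sqrt(running_var + eps)

def bn_prediction_time(x, eps=1e-5):
    """Prediction-time BN (sketch): recompute mean/var on the test
    batch itself, adapting the normalization to a shifted input."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

# simulate covariate shift: test inputs shifted and rescaled vs. training
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(1024, 4))
test = rng.normal(3.0, 2.0, size=(256, 4))   # shifted distribution
out_running = bn_running(test, train.mean(axis=0), train.var(axis=0))
out_pred = bn_prediction_time(test)
# prediction-time stats re-center the shifted batch; running stats do not
print(abs(out_running.mean()), abs(out_pred.mean()))
```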
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
- Calibrating Deep Neural Networks using Focal Loss [77.92765139898906]
Miscalibration is a mismatch between a model's confidence and its correctness.
We show that focal loss allows us to learn models that are already very well calibrated.
We show that our approach achieves state-of-the-art calibration without compromising on accuracy in almost all cases.
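Focal loss reshapes cross-entropy by down-weighting examples the model already classifies confidently, which is what curbs overconfidence and improves calibration. A minimal numpy sketch of the scalar form FL(p) = -(1-p)^gamma log p (illustrative, not the paper's code):

```python
import numpy as np

def focal_loss(p, gamma=2.0):
    """Focal loss for the probability p assigned to the true class:
    FL(p) = -(1 - p)**gamma * log(p).  gamma=0 recovers cross-entropy;
    larger gamma down-weights already-confident examples."""
    p = np.clip(p, 1e-12, 1.0)
    return -((1.0 - p) ** gamma) * np.log(p)

# a confident prediction is penalized far less than under cross-entropy,
# while an uncertain one keeps most of its loss
print(focal_loss(0.9), focal_loss(0.9, gamma=0.0))
print(focal_loss(0.3), focal_loss(0.3, gamma=0.0))
```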
arXiv Detail & Related papers (2020-02-21T17:35:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.