Rényi Sharpness: A Novel Sharpness that Strongly Correlates with Generalization
- URL: http://arxiv.org/abs/2510.07758v2
- Date: Thu, 16 Oct 2025 14:21:40 GMT
- Title: Rényi Sharpness: A Novel Sharpness that Strongly Correlates with Generalization
- Authors: Qiaozhe Zhang, Jun Sun, Ruijie Zhang, Yingzhuang Liu
- Abstract summary: We propose a novel sharpness measure, Rényi sharpness, defined as the negative Rényi entropy (a generalization of the classical Shannon entropy) of the loss Hessian. To rigorously establish the relationship between generalization and (Rényi) sharpness, we provide several generalization bounds in terms of Rényi sharpness. Experiments verify the strong correlation (specifically, Kendall rank correlation) between Rényi sharpness and generalization.
- Score: 7.429398847018864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sharpness (of the loss minima) is a common measure to investigate the generalization of neural networks. Intuitively speaking, the flatter the landscape near the minima is, the better the generalization might be. Unfortunately, the correlation between many existing sharpness measures and generalization is usually not strong, sometimes even weak. To close the gap between intuition and reality, we propose a novel sharpness measure, Rényi sharpness, which is defined as the negative Rényi entropy (a generalization of the classical Shannon entropy) of the loss Hessian. The main ideas are as follows: 1) we realize that uniform (identical) eigenvalues of the loss Hessian are most desirable (while keeping the sum constant) to achieve good generalization; 2) we employ the Rényi entropy to concisely characterize the extent of the spread of the eigenvalues of the loss Hessian. Normally, the larger the spread, the smaller the (Rényi) entropy. To rigorously establish the relationship between generalization and (Rényi) sharpness, we provide several generalization bounds in terms of Rényi sharpness, by taking advantage of the reparametrization invariance property of Rényi sharpness, as well as the trick of translating the data discrepancy to the weight perturbation. Furthermore, extensive experiments are conducted to verify the strong correlation (specifically, Kendall rank correlation) between Rényi sharpness and generalization. Moreover, we propose to use a variant of Rényi sharpness as a regularizer during training, i.e., Rényi Sharpness Aware Minimization (RSAM), which turns out to outperform all existing sharpness-aware minimization methods. It is worth noting that the test accuracy gain of our proposed RSAM method can be as high as nearly 2.5%, compared with the classical SAM method.
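The core quantity can be sketched numerically: treat the (non-negative) eigenvalues of the loss Hessian as a probability distribution, compute the Rényi entropy of that distribution, and negate it. The function name, the default order `alpha`, and the normalization below are illustrative assumptions, not necessarily the paper's exact definition.

```python
import numpy as np

def renyi_sharpness(hessian_eigs, alpha=2.0, eps=1e-12):
    """Sketch: negative Renyi entropy of the normalized Hessian eigenvalues.

    Assumes the eigenvalues are non-negative (e.g. at a minimum). `alpha` and
    the sum-normalization are illustrative choices for this sketch.
    """
    lam = np.clip(np.asarray(hessian_eigs, dtype=float), 0.0, None)
    p = lam / (lam.sum() + eps)           # eigenvalues as a distribution
    if np.isclose(alpha, 1.0):            # alpha -> 1 recovers Shannon entropy
        entropy = -np.sum(p * np.log(p + eps))
    else:
        entropy = np.log(np.sum(p ** alpha) + eps) / (1.0 - alpha)
    return -entropy                       # sharpness = negative entropy

# Uniform eigenvalues (maximal entropy) give the smallest sharpness,
# matching idea 1) in the abstract; a spread-out spectrum is sharper.
uniform = renyi_sharpness([1.0, 1.0, 1.0, 1.0])
spread = renyi_sharpness([3.7, 0.1, 0.1, 0.1])
```

Under this sketch, `spread > uniform`: concentrating the spectrum in one eigenvalue (while keeping the sum fixed) lowers the entropy and hence raises the sharpness.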
Related papers
- Flatness After All? [6.977444416330261]
We argue that generalization could be assessed by measuring flatness using a soft rank measure of the Hessian.
For non-calibrated models, we connect a soft rank based flatness measure to the well-known Takeuchi Information Criterion.
arXiv Detail & Related papers (2025-06-21T20:33:36Z)
- Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization [12.58055746943097]
We argue that a comprehensive understanding of the generalization performance of gradient descent requires analyzing the interaction between these various forms of implicit regularization.
We prove for diagonal linear networks trained on a simple regression task that neither implicit bias alone minimizes the generalization error.
arXiv Detail & Related papers (2025-05-27T16:51:06Z)
- A Universal Class of Sharpness-Aware Minimization Algorithms [57.29207151446387]
We introduce a new class of sharpness measures, leading to new sharpness-aware objective functions.
We prove that these measures are expressive, allowing any function of the training loss Hessian matrix to be represented by appropriate hyperparameters.
arXiv Detail & Related papers (2024-06-06T01:52:09Z)
- The Inductive Bias of Flatness Regularization for Deep Matrix Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depths greater than one, under the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z)
- A Modern Look at the Relationship between Sharpness and Generalization [64.03012884804458]
Sharpness of minima is a promising quantity that can correlate with generalization in deep networks.
Sharpness is not invariant under reparametrizations of neural networks.
We show that sharpness does not correlate well with generalization.
arXiv Detail & Related papers (2023-02-14T12:38:12Z)
- Surrogate Gap Minimization Improves Sharpness-Aware Training [52.58252223573646]
Surrogate Gap Guided Sharpness-Aware Minimization (GSAM) is a novel improvement over Sharpness-Aware Minimization (SAM) with negligible computation overhead.
GSAM seeks a region with both small loss (by step 1) and low sharpness (by step 2), giving rise to a model with high generalization capabilities.
arXiv Detail & Related papers (2022-03-15T16:57:59Z)
- Interpolation can hurt robust generalization even when there is no noise [76.3492338989419]
We show that avoiding interpolation through ridge regularization can significantly improve generalization even in the absence of noise.
We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.
arXiv Detail & Related papers (2021-08-05T23:04:15Z)
- BN-invariant sharpness regularizes the training model to better generalization [72.97766238317081]
We propose a measure of sharpness, BN-Sharpness, which gives consistent value for equivalent networks under BN.
We use the BN-sharpness to regularize the training and design an algorithm to minimize the new regularized objective.
arXiv Detail & Related papers (2021-01-08T10:23:24Z)
- Flatness is a False Friend [0.7614628596146599]
Hessian-based measures of flatness have been argued, used, and shown to relate to generalisation.
We show that for feed-forward neural networks under the cross-entropy loss, we would expect low-loss solutions with large weights to have small Hessian-based measures of flatness.
arXiv Detail & Related papers (2020-06-16T11:55:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.