BN-invariant sharpness regularizes the training model to better
generalization
- URL: http://arxiv.org/abs/2101.02944v1
- Date: Fri, 8 Jan 2021 10:23:24 GMT
- Title: BN-invariant sharpness regularizes the training model to better
generalization
- Authors: Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
- Abstract summary: We propose a measure of sharpness, BN-Sharpness, which gives consistent values for networks that are equivalent under BN.
We use BN-Sharpness to regularize training and design an algorithm to minimize the new regularized objective.
- Score: 72.97766238317081
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is widely believed that flatter minima generalize better. However,
it has been pointed out that the usual definitions of sharpness, which consider
either the maximum or the integral of the loss over a $\delta$-ball of parameters
around a minimum, cannot give consistent measurements for scale-invariant neural
networks, e.g., networks with batch normalization (BN) layers. In this paper, we
first propose a measure of sharpness, BN-Sharpness, which gives consistent
values for networks that are equivalent under BN. It achieves scale invariance
by tying the integration diameter to the scale of the parameters. We then
present a computationally efficient way to approximate BN-Sharpness, namely a
one-dimensional integral along the "sharpest" direction. Furthermore, we use
BN-Sharpness to regularize training and design an algorithm to minimize the new
regularized objective. Our algorithm achieves considerably better performance
than vanilla SGD across various experimental settings.
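To make the regularization idea concrete, here is a rough, simplified sketch (not the authors' exact algorithm): it penalizes the loss increase along an approximate sharpest direction whose step size scales with each parameter's norm, echoing the scale-adaptive integration diameter of BN-Sharpness. The model, loss function, radius `rho`, penalty weight `lam`, and the use of the normalized gradient as the sharpest direction are all illustrative assumptions.

```python
import torch


def bn_style_sharpness_step(model, optimizer, loss_fn, x, y,
                            rho=0.05, lam=0.5, eps=1e-12):
    """One step on L(w) + lam * (L(w + delta(w)) - L(w)).

    delta(w) moves each parameter tensor by rho * ||p|| along its normalized
    gradient, so the penalty radius scales with the parameters (a crude
    stand-in for the scale-adaptive diameter of BN-Sharpness).  The combined
    gradient is approximated as (1 - lam) * g(w) + lam * g(w + delta),
    treating delta as constant.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient at the current weights.
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    clean_grads = [p.grad.detach().clone() for p in params]

    # Perturb towards the (approximately) sharpest nearby point.
    steps = []
    with torch.no_grad():
        for p, g in zip(params, clean_grads):
            step = rho * p.norm() * g / (g.norm() + eps)
            p.add_(step)
            steps.append(step)

    # Gradient at the perturbed weights.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    # Restore the weights and blend the two gradients before stepping.
    with torch.no_grad():
        for p, g_clean, step in zip(params, clean_grads, steps):
            p.sub_(step)
            p.grad.mul_(lam).add_(g_clean, alpha=1.0 - lam)
    optimizer.step()

    return loss.item()
```

In the paper the penalty is a one-dimensional integral along the sharpest direction; the single finite difference above is only a simplified stand-in for that computation.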
Related papers
- A Modern Look at the Relationship between Sharpness and Generalization [64.03012884804458]
Sharpness of minima is a promising quantity that can correlate with generalization in deep networks.
Sharpness is not invariant under reparametrizations of neural networks.
We show that sharpness does not correlate well with generalization.
arXiv Detail & Related papers (2023-02-14T12:38:12Z) - Efficient Generalization Improvement Guided by Random Weight
Perturbation [24.027159739234524]
Sharpness-aware minimization (SAM) establishes a generic scheme for generalization improvement.
We resort to filter-wise random weight perturbations (RWP) to decouple the nested gradients in SAM.
We achieve very competitive performance on CIFAR and remarkably better performance on ImageNet.
arXiv Detail & Related papers (2022-11-21T14:24:34Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - Sharpness-Aware Training for Free [163.1248341911413]
Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error.
Sharpness-Aware Training for Free (SAF) mitigates the sharp landscape at almost zero additional computational cost over the base optimizer.
SAF ensures convergence to a flat minimum with improved generalization capabilities (a minimal SAM-style sketch is given after this list for context).
arXiv Detail & Related papers (2022-05-27T16:32:43Z) - Revisiting Batch Normalization [0.0]
Batch normalization (BN) is essential for training deep neural networks.
We revisit the BN formulation and present a new method and update approach for BN to address the aforementioned issues.
Experimental results using the proposed alterations to BN show statistically significant performance gains in a variety of scenarios.
We also present a new online BN-based input data normalization technique to alleviate the need for other offline or fixed methods.
arXiv Detail & Related papers (2021-10-26T19:48:19Z) - MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch
Normalization [60.36100335878855]
We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency in network training.
We leverage neural tangent kernel (NTK) theory to prove that our weight-mean operation whitens activations and transitions the network into the chaotic regime, like a BN layer.
MimicNorm achieves similar accuracy for various network structures, including ResNets and lightweight networks like ShuffleNet, with a reduction of about 20% memory consumption.
arXiv Detail & Related papers (2020-10-19T07:42:41Z) - Holistic Filter Pruning for Efficient Deep Neural Networks [25.328005340524825]
"Holistic Filter Pruning" (HFP) is a novel approach for common DNN training that is easy to implement and enables to specify accurate pruning rates.
In various experiments, we give insights into the training and achieve state-of-the-art performance on CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2020-09-17T09:23:36Z) - Entropic gradient descent algorithms and wide flat minima [6.485776570966397]
We show analytically that there exist Bayes optimal pointwise estimators which correspond to minimizers belonging to wide flat regions.
We extend the analysis to the deep learning scenario by extensive numerical validations.
An easy to compute flatness measure shows a clear correlation with test accuracy.
arXiv Detail & Related papers (2020-06-14T13:22:19Z) - Towards Stabilizing Batch Statistics in Backward Propagation of Batch
Normalization [126.6252371899064]
Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
arXiv Detail & Related papers (2020-01-19T14:41:22Z)
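Several entries above revolve around sharpness-aware minimization (SAM), so a minimal sketch of one basic SAM update is given below for context. The model, loss function, and radius `rho` are placeholders, and this is not any of the listed papers' exact code; SAF above aims to avoid the extra forward/backward pass of the inner maximization, while RWP replaces that inner step with random filter-wise weight perturbations.

```python
import torch


def sam_step(model, base_optimizer, loss_fn, x, y, rho=0.05, eps=1e-12):
    """One sharpness-aware minimization (SAM) update.

    Assumes every trainable parameter receives a gradient from the loss.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # First pass: gradient at the current weights w.
    base_optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    # Ascend to the approximate worst case w + e(w) inside a ball of radius rho.
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    steps = []
    with torch.no_grad():
        for p in params:
            step = rho * p.grad / (grad_norm + eps)
            p.add_(step)
            steps.append(step)

    # Second pass: gradient at the perturbed weights.
    base_optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    # Return to w and update it with the perturbed-point gradient.
    with torch.no_grad():
        for p, step in zip(params, steps):
            p.sub_(step)
    base_optimizer.step()
```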