Neighborhood Region Smoothing Regularization for Finding Flat Minima In
Deep Neural Networks
- URL: http://arxiv.org/abs/2201.06064v1
- Date: Sun, 16 Jan 2022 15:11:00 GMT
- Title: Neighborhood Region Smoothing Regularization for Finding Flat Minima In
Deep Neural Networks
- Authors: Yang Zhao and Hao Zhang
- Abstract summary: We propose an effective regularization technique, called Neighborhood Region Smoothing (NRS)
NRS tries to regularize the neighborhood region in weight space to yield approximate outputs.
We empirically show that the minima found by NRS would have relatively smaller Hessian eigenvalues compared to the conventional method.
- Score: 16.4654807047138
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to diverse architectures in deep neural networks (DNNs) with severe
overparameterization, regularization techniques are critical for finding
optimal solutions in the huge hypothesis space. In this paper, we propose an
effective regularization technique, called Neighborhood Region Smoothing (NRS).
NRS leverages the finding that models would benefit from converging to flat
minima, and tries to regularize the neighborhood region in weight space to
yield approximate outputs. Specifically, the gap between outputs of models in
the neighborhood region is gauged by a metric based on Kullback-Leibler
divergence. This metric provides insights similar to the minimum description
length principle for interpreting flat minima. By minimizing both this
divergence and empirical loss, NRS could explicitly drive the optimizer towards
converging to flat minima. We confirm the effectiveness of NRS by performing
image classification tasks across a wide range of model architectures on
commonly-used datasets such as CIFAR and ImageNet, where generalization ability
could be universally improved. Also, we empirically show that the minima found
by NRS have smaller Hessian eigenvalues than those found by conventional
training, which is considered evidence of flat minima.
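The NRS objective sketched in the abstract can be illustrated with a minimal numpy toy: the empirical loss (cross-entropy here) plus a KL-divergence gap between the model's outputs and those of a randomly perturbed copy of the weights. This is a hedged sketch of the idea, not the authors' implementation; the softmax classifier, the Gaussian perturbation scale `sigma`, and the penalty weight `lam` are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # mean KL divergence between two batches of categorical distributions
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1).mean()

def nrs_loss(W, x, y, sigma=0.01, lam=0.1, rng=None):
    """Cross-entropy plus a KL penalty between the model's outputs and
    those of a perturbed neighbor in weight space (illustrative sketch)."""
    if rng is None:
        rng = np.random.default_rng(0)
    p = softmax(x @ W)
    ce = -np.log(p[np.arange(len(y)), y] + 1e-12).mean()
    W_neighbor = W + sigma * rng.standard_normal(W.shape)  # nearby weights
    q = softmax(x @ W_neighbor)
    return ce + lam * kl(p, q)
```

With `lam=0` this reduces to plain empirical risk; the KL term penalizes weight configurations whose outputs change sharply under small perturbations, which is the sense in which the optimizer is driven toward flat minima.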
Related papers
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Typical and atypical solutions in non-convex neural networks with
discrete and continuous weights [2.7127628066830414]
We study the binary and continuous negative-margin perceptrons as simple non-convex network models learning random rules and associations.
Both models exhibit subdominant minimizers which are extremely flat and wide.
For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers.
arXiv Detail & Related papers (2023-04-26T23:34:40Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - On generalization bounds for deep networks based on loss surface
implicit regularization [5.68558935178946]
Modern deep neural networks generalize well despite a large number of parameters, which contradicts classical statistical learning theory.
arXiv Detail & Related papers (2022-01-12T16:41:34Z) - Probabilistic partition of unity networks: clustering based deep
approximation [0.0]
Partition of unity networks (POU-Nets) have been shown capable of realizing algebraic convergence rates for regression and solution of PDEs.
We enrich POU-Nets with a Gaussian noise model to obtain a probabilistic generalization amenable to gradient-based minimization of a maximum likelihood loss.
We provide benchmarks quantifying performance in high/low-dimensions, demonstrating that convergence rates depend only on the latent dimension of data within high-dimensional space.
arXiv Detail & Related papers (2021-07-07T08:02:00Z) - Unveiling the structure of wide flat minima in neural networks [0.46664938579243564]
The success of deep learning has revealed the application potential of neural networks across the sciences.
arXiv Detail & Related papers (2021-07-02T16:04:57Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than in other baseline feature map constructions while achieving comparable error bounds both in theory and practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z) - Effective Version Space Reduction for Convolutional Neural Networks [61.84773892603885]
In active learning, sampling bias could pose a serious inconsistency problem and hinder the algorithm from finding the optimal hypothesis.
We examine active learning with convolutional neural networks through the principled lens of version space reduction.
arXiv Detail & Related papers (2020-06-22T17:40:03Z) - Entropic gradient descent algorithms and wide flat minima [6.485776570966397]
We show analytically that there exist Bayes optimal pointwise estimators which correspond to minimizers belonging to wide flat regions.
We extend the analysis to the deep learning scenario by extensive numerical validations.
An easy to compute flatness measure shows a clear correlation with test accuracy.
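An "easy to compute flatness measure" of the kind this entry alludes to can be sketched as the average increase in loss under random Gaussian weight perturbations: smaller values indicate a flatter minimum. This is a generic illustration under assumed parameters (`sigma`, `n_samples`), not the measure defined in that paper.

```python
import numpy as np

def flatness(loss_fn, w, sigma=0.05, n_samples=20, seed=0):
    """Average loss increase under Gaussian perturbations of the weights w.
    Smaller values indicate a flatter region of the loss surface."""
    rng = np.random.default_rng(seed)
    base = loss_fn(w)
    deltas = [loss_fn(w + sigma * rng.standard_normal(w.shape)) - base
              for _ in range(n_samples)]
    return float(np.mean(deltas))
```

For a quadratic loss the measure scales with the curvature, so a sharper minimum (larger Hessian eigenvalues) yields a larger flatness value than a flat one at the same point.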
arXiv Detail & Related papers (2020-06-14T13:22:19Z) - Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable
Neural Distribution Alignment [52.02794488304448]
We propose a new distribution alignment method based on a log-likelihood ratio statistic and normalizing flows.
We experimentally verify that minimizing the resulting objective results in domain alignment that preserves the local structure of input domains.
arXiv Detail & Related papers (2020-03-26T22:10:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.