Neighborhood Region Smoothing Regularization for Finding Flat Minima In Deep Neural Networks
 - URL: http://arxiv.org/abs/2201.06064v1
 - Date: Sun, 16 Jan 2022 15:11:00 GMT
 - Title: Neighborhood Region Smoothing Regularization for Finding Flat Minima In Deep Neural Networks
 - Authors: Yang Zhao and Hao Zhang
 - Abstract summary: We propose an effective regularization technique, called Neighborhood Region Smoothing (NRS).
NRS regularizes a neighborhood region in weight space so that models within it yield approximately the same outputs.
We empirically show that the minima found by NRS have relatively smaller Hessian eigenvalues than those found by conventional training.
 - Score: 16.4654807047138
 - License: http://creativecommons.org/licenses/by/4.0/
 - Abstract:   Because deep neural networks (DNNs) come in diverse architectures and are
severely overparameterized, regularization techniques are critical for finding
good solutions in their huge hypothesis space. In this paper, we propose an
effective regularization technique, called Neighborhood Region Smoothing (NRS).
NRS builds on the finding that models benefit from converging to flat minima,
and regularizes a neighborhood region in weight space so that models within it
yield approximately the same outputs. Specifically, the gap between the outputs
of models in the neighborhood region is gauged by a metric based on the
Kullback-Leibler divergence. This metric offers insights similar to those of
the minimum description length principle for interpreting flat minima. By
minimizing both this divergence and the empirical loss, NRS explicitly drives
the optimizer towards flat minima. We confirm the effectiveness of NRS on image
classification tasks across a wide range of model architectures and
commonly-used datasets such as CIFAR and ImageNet, where generalization is
consistently improved. We also show empirically that the minima found by NRS
have relatively smaller Hessian eigenvalues than those found by conventional
training, which is regarded as evidence of flat minima.
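
To make the mechanism concrete, here is a minimal PyTorch-style sketch of one training step with an NRS-style penalty. It is an illustration under simplifying assumptions rather than the authors' implementation: the neighborhood is approximated by a single random perturbation of radius `radius` applied per parameter tensor, the output gap is the KL divergence between the softmax outputs at the current weights and at the perturbed weights, and the penalty's gradient is evaluated at the perturbed point in the spirit of sharpness-aware methods. The names `nrs_like_step`, `radius`, and `lam` are hypothetical.

```python
import torch
import torch.nn.functional as F

def nrs_like_step(model, x, y, optimizer, radius=0.05, lam=1.0):
    """One training step with a neighborhood-smoothing penalty (sketch only).

    Minimizes cross-entropy plus the KL gap between the output distribution
    at the current weights w and at a random neighbor w + eps, where each
    parameter tensor is perturbed with norm `radius`.
    """
    model.train()
    optimizer.zero_grad()

    # (1) Empirical loss at the current weights w; accumulate its gradient.
    logits = model(x)
    F.cross_entropy(logits, y).backward()
    reference = F.softmax(logits.detach(), dim=-1)  # output distribution at w

    # (2) Move to a random neighbor w + eps (no autograd tracking).
    originals = []
    with torch.no_grad():
        for p in model.parameters():
            eps = torch.randn_like(p)
            eps.mul_(radius / (eps.norm() + 1e-12))
            originals.append(p.detach().clone())
            p.add_(eps)

    # (3) KL gap between the two output distributions; its gradient is
    #     taken at w + eps and accumulated onto the shared parameters.
    neighbor_logits = model(x)
    kl_gap = F.kl_div(F.log_softmax(neighbor_logits, dim=-1),
                      reference, reduction="batchmean")
    (lam * kl_gap).backward()

    # (4) Restore w and apply the combined gradient.
    with torch.no_grad():
        for p, w in zip(model.parameters(), originals):
            p.copy_(w)
    optimizer.step()
    return kl_gap.item()
```

The abstract's flatness evidence, smaller Hessian eigenvalues at the minima found by NRS, can be probed with a standard power iteration on Hessian-vector products. The sketch below is a generic implementation of that probe, not the paper's exact measurement protocol; `loss_fn` stands for any differentiable criterion such as `F.cross_entropy`.

```python
import torch

def top_hessian_eigenvalue(model, loss_fn, x, y, iters=20):
    """Estimate the largest Hessian eigenvalue of the loss at the current
    weights via power iteration; smaller values indicate flatter minima."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Start from a random unit vector in weight space.
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((u * u).sum() for u in v))
    v = [u / norm for u in v]

    eigenvalue = 0.0
    for _ in range(iters):
        # Hessian-vector product via a second backward pass.
        hv = torch.autograd.grad(grads, params, grad_outputs=v,
                                 retain_graph=True)
        # Rayleigh quotient v^T H v with the current unit vector v.
        eigenvalue = sum((h * u).sum() for h, u in zip(hv, v)).item()
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / (norm + 1e-12) for h in hv]
    return eigenvalue
```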
 
       
      
        Related papers
        - Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon [22.29950158991071]
We study the implicit bias of flatness / low (loss) curvature and its effects on generalization in ReLU networks. We show that while flatness does imply generalization, the resulting rates of convergence necessarily deteriorate exponentially as the input dimension grows.
arXiv  Detail & Related papers  (2025-06-25T19:10:03Z) - Geometric Neural Process Fields [58.77241763774756]
Geometric Neural Process Fields (G-NPF) is a probabilistic framework for neural radiance fields that explicitly captures uncertainty.
Building on these bases, we design a hierarchical latent variable model, allowing G-NPF to integrate structural information across multiple spatial levels.
 Experiments on novel-view synthesis for 3D scenes, as well as 2D image and 1D signal regression, demonstrate the effectiveness of our method.
arXiv  Detail & Related papers  (2025-02-04T14:17:18Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv  Detail & Related papers  (2023-05-30T19:37:44Z) - Typical and atypical solutions in non-convex neural networks with discrete and continuous weights [2.7127628066830414]
We study the binary and continuous negative-margin perceptrons as simple non-constrained network models learning random rules and associations.
Both models exhibit subdominant minimizers which are extremely flat and wide.
For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers.
arXiv  Detail & Related papers  (2023-04-26T23:34:40Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
 Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv  Detail & Related papers  (2022-10-25T14:45:15Z) - On generalization bounds for deep networks based on loss surface implicit regularization [5.68558935178946]
Modern deep neural networks generalize well despite having a large number of parameters, which contradicts classical statistical learning theory.
arXiv  Detail & Related papers  (2022-01-12T16:41:34Z) - Probabilistic partition of unity networks: clustering based deep approximation [0.0]
Partition of unity networks (POU-Nets) have been shown capable of realizing algebraic convergence rates for regression and solution of PDEs.
We enrich POU-Nets with a Gaussian noise model to obtain a probabilistic generalization amenable to gradient-based minimization of a maximum likelihood loss.
We provide benchmarks quantifying performance in high/low-dimensions, demonstrating that convergence rates depend only on the latent dimension of data within high-dimensional space.
arXiv  Detail & Related papers  (2021-07-07T08:02:00Z) - Unveiling the structure of wide flat minima in neural networks [0.46664938579243564]
The success of deep learning has revealed the application potential of networks across the sciences.
arXiv  Detail & Related papers  (2021-07-02T16:04:57Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of fully-connected ReLU networks.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions while achieving comparable error bounds, both in theory and in practice.
arXiv  Detail & Related papers  (2021-04-03T09:08:12Z) - Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv  Detail & Related papers  (2020-07-20T12:07:48Z) - Effective Version Space Reduction for Convolutional Neural Networks [61.84773892603885]
In active learning, sampling bias could pose a serious inconsistency problem and hinder the algorithm from finding the optimal hypothesis.
We examine active learning with convolutional neural networks through the principled lens of version space reduction.
arXiv  Detail & Related papers  (2020-06-22T17:40:03Z) - Entropic gradient descent algorithms and wide flat minima [6.485776570966397]
We show analytically that there exist Bayes optimal pointwise estimators which correspond to minimizers belonging to wide flat regions.
We extend the analysis to the deep learning scenario by extensive numerical validations.
An easy to compute flatness measure shows a clear correlation with test accuracy.
arXiv  Detail & Related papers  (2020-06-14T13:22:19Z) - Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment [52.02794488304448]
We propose a new distribution alignment method based on a log-likelihood ratio statistic and normalizing flows.
We experimentally verify that minimizing the resulting objective results in domain alignment that preserves the local structure of input domains.
arXiv  Detail & Related papers  (2020-03-26T22:10:04Z) 
        This list is automatically generated from the titles and abstracts of the papers on this site.
       
     