Entropic gradient descent algorithms and wide flat minima
- URL: http://arxiv.org/abs/2006.07897v4
- Date: Mon, 15 Nov 2021 22:56:17 GMT
- Title: Entropic gradient descent algorithms and wide flat minima
- Authors: Fabrizio Pittorino, Carlo Lucibello, Christoph Feinauer, Gabriele
Perugini, Carlo Baldassi, Elizaveta Demyanenko, Riccardo Zecchina
- Abstract summary: We show analytically that there exist Bayes optimal pointwise estimators which correspond to minimizers belonging to wide flat regions.
We extend the analysis to the deep learning scenario by extensive numerical validations.
An easy-to-compute flatness measure shows a clear correlation with test accuracy.
- Score: 6.485776570966397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The properties of flat minima in the empirical risk landscape of neural
networks have been debated for some time. Increasing evidence suggests they
possess better generalization capabilities with respect to sharp ones. First,
we discuss Gaussian mixture classification models and show analytically that
there exist Bayes optimal pointwise estimators which correspond to minimizers
belonging to wide flat regions. These estimators can be found by applying
maximum flatness algorithms either directly on the classifier (which is norm
independent) or on the differentiable loss function used in learning. Next, we
extend the analysis to the deep learning scenario by extensive numerical
validations. Using two algorithms, Entropy-SGD and Replicated-SGD, that
explicitly include in the optimization objective a non-local flatness measure
known as local entropy, we consistently improve the generalization error for
common architectures (e.g., ResNet, EfficientNet). An easy-to-compute flatness
measure shows a clear correlation with test accuracy.
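The two algorithms mentioned in the abstract optimize a local-entropy objective rather than the raw loss. Below is a minimal, self-contained sketch (not the authors' released code) of an Entropy-SGD-style update on a toy logistic-regression problem: an inner SGLD loop estimates the mean of a Gibbs measure centered at the current weights, and the outer step moves the weights toward that mean. All function names and hyperparameter values (grad_loss, gamma, inner_steps, alpha, ...) are illustrative assumptions, not the paper's API.

```python
# Hedged sketch of an Entropy-SGD-style update; toy problem so it is runnable as-is.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: logistic regression on random inputs with a random linear teacher.
X = rng.normal(size=(200, 10))
y = (X @ rng.normal(size=10) > 0).astype(float)

def loss(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def grad_loss(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(y)

def entropy_sgd_step(w, gamma=1.0, eta_outer=0.1, eta_inner=0.05,
                     inner_steps=20, alpha=0.75, noise=1e-3):
    """One outer step: an SGLD inner loop estimates the local Gibbs mean mu,
    then the outer update pulls w toward mu (the local-entropy gradient is
    proportional to gamma * (w - mu))."""
    w_inner = w.copy()
    mu = w.copy()
    for _ in range(inner_steps):
        # loss gradient plus coupling back to the outer weights w
        g = grad_loss(w_inner) - gamma * (w - w_inner)
        w_inner = (w_inner - eta_inner * g
                   + np.sqrt(eta_inner) * noise * rng.normal(size=w.shape))
        mu = (1 - alpha) * mu + alpha * w_inner  # running average of inner iterates
    return w - eta_outer * gamma * (w - mu)

w = np.zeros(10)
for _ in range(200):
    w = entropy_sgd_step(w)
print("final training loss:", loss(w))
```

In practice gamma is usually annealed toward larger values during training, interpolating from an average over a wide neighborhood toward plain SGD; the sketch keeps gamma fixed for brevity.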
Related papers
- Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at a sample to adaptively define the neighborhood size.
Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
arXiv Detail & Related papers (2024-09-08T13:08:45Z) - Neural Gradient Learning and Optimization for Oriented Point Normal
Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability of local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z) - FAM: Relative Flatness Aware Minimization [5.132856559837775]
Optimizing for flatness was proposed as early as 1994 by Hochreiter and Schmidhuber.
Recent theoretical work suggests that a particular relative flatness measure can be connected to generalization.
We derive a regularizer based on this relative flatness that is easy to compute, fast, efficient, and works with arbitrary loss functions.
arXiv Detail & Related papers (2023-07-05T14:48:24Z) - The Inductive Bias of Flatness Regularization for Deep Matrix
Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depths greater than one, with the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Adaptive Self-supervision Algorithms for Physics-informed Neural
Networks [59.822151945132525]
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function.
We study the impact of the location of the collocation points on the trainability of these models.
We propose a novel adaptive collocation scheme which progressively allocates more collocation points to areas where the model is making higher errors.
arXiv Detail & Related papers (2022-07-08T18:17:06Z) - Neighborhood Region Smoothing Regularization for Finding Flat Minima In
Deep Neural Networks [16.4654807047138]
We propose an effective regularization technique, called Neighborhood Region Smoothing (NRS)
NRS regularizes the neighborhood region in weight space so that models in that region yield approximately the same outputs (a generic perturbation-based sketch of this idea appears after this list).
We empirically show that the minima found by NRS would have relatively smaller Hessian eigenvalues compared to the conventional method.
arXiv Detail & Related papers (2022-01-16T15:11:00Z) - Unveiling the structure of wide flat minima in neural networks [0.46664938579243564]
The success of deep learning has revealed the application potential of networks across the sciences.
arXiv Detail & Related papers (2021-07-02T16:04:57Z) - Wide flat minima and optimal generalization in classifying
high-dimensional Gaussian mixtures [8.556763944288116]
We show that there exist configurations that achieve the Bayes-optimal generalization error, even in the case of unbalanced clusters.
We also consider the algorithmically relevant case of targeting wide flat minima of the mean squared error loss.
arXiv Detail & Related papers (2020-10-27T01:32:03Z) - Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments on GAN training empirically compare adaptive and non-adaptive gradient algorithms.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
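The abstract above mentions an easy-to-compute flatness measure that correlates with test accuracy, and several of the related papers (NRS, FAM) measure or regularize flatness through the behavior of the loss in a weight-space neighborhood. The following is a generic, hedged sketch of such a perturbation-based flatness proxy; the function name flatness_proxy and the choice of Gaussian perturbations are illustrative assumptions, not the specific measure defined in any of the papers listed here.

```python
# Hedged sketch: average loss increase under random Gaussian weight perturbations,
# a common stand-in for "flatness" of the region around a trained solution.
import numpy as np

def flatness_proxy(loss_fn, w, sigma=0.05, n_samples=32, seed=0):
    """Mean loss increase over random perturbations of scale sigma.
    Smaller values indicate a flatter region around w."""
    rng = np.random.default_rng(seed)
    base = loss_fn(w)
    increases = []
    for _ in range(n_samples):
        w_pert = w + sigma * rng.normal(size=w.shape)
        increases.append(loss_fn(w_pert) - base)
    return float(np.mean(increases))

# Example on a quadratic with one sharp and one flat direction:
H = np.diag([100.0, 1.0])
loss_fn = lambda w: 0.5 * w @ H @ w
print(flatness_proxy(loss_fn, np.zeros(2)))  # larger value -> sharper minimum
```

The scale sigma fixes the size of the neighborhood being probed; comparing minima only makes sense at a common sigma and, for norm-dependent losses, after accounting for weight rescaling.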
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.