Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace
- URL: http://arxiv.org/abs/2208.05924v1
- Date: Thu, 11 Aug 2022 16:51:27 GMT
- Title: Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace
- Authors: Yucong Liu and Shixing Yu and Tong Lin
- Abstract summary: We develop a novel regularization method for deep neural networks by penalizing the trace of Hessian.
Experiments demonstrate that our method outperforms existing regularizers and data augmentation methods.
- Score: 1.933681537640272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we develop a novel regularization method for deep neural networks by penalizing the trace of the Hessian. This regularizer is motivated by a recent generalization error bound. The Hutchinson method is a classical unbiased estimator for the trace of a matrix, but it is very time-consuming on deep learning models. Hence, a dropout scheme is proposed to implement the Hutchinson method efficiently. We then discuss a connection to the linear stability of a nonlinear dynamical system and to flat/sharp minima. Experiments demonstrate that our method outperforms existing regularizers and data augmentation methods, such as the Jacobian regularizer, confidence penalty, label smoothing, cutout, and mixup.
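The penalty rests on Hutchinson's classical identity tr(H) = E_z[z^T H z], which holds for any random probe z with E[z z^T] = I, so the Hessian trace can be estimated from Hessian-vector products without forming H. Below is a minimal PyTorch sketch of such an estimator used as an add-on penalty; it is not the authors' dropout-based scheme, and the function name `hessian_trace_penalty`, the Rademacher probes, and the single-probe default are illustrative assumptions.

```python
import torch


def hessian_trace_penalty(loss, params, num_probes=1):
    """Hutchinson estimate of tr(H), the Hessian trace of `loss` w.r.t. `params`.

    Uses tr(H) = E[z^T H z] with Rademacher probes z; the product H z comes from
    a double backward pass, so the Hessian is never materialized.
    """
    grads = torch.autograd.grad(loss, params, create_graph=True)
    estimate = 0.0
    for _ in range(num_probes):
        # Rademacher probe vectors with entries in {-1, +1}
        zs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
        grad_dot_z = sum((g * z).sum() for g, z in zip(grads, zs))
        # Differentiating g^T z w.r.t. the parameters yields H z
        hvps = torch.autograd.grad(grad_dot_z, params, create_graph=True)
        estimate = estimate + sum((z * hv).sum() for z, hv in zip(zs, hvps))
    return estimate / num_probes


# Hypothetical usage (model, criterion, x, y, and the weight `lam` are assumed
# to be defined elsewhere):
#   loss = criterion(model(x), y)
#   total = loss + lam * hessian_trace_penalty(loss, list(model.parameters()))
#   total.backward()
```

The estimator is unbiased, but each probe requires an extra backward pass through the gradient graph, which is exactly the cost the paper's dropout-based implementation aims to reduce.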
Related papers
- Scalable Bayesian Inference in the Era of Deep Learning: From Gaussian Processes to Deep Neural Networks [0.5827521884806072]
Large neural networks trained on large datasets have become the dominant paradigm in machine learning.
This thesis develops scalable methods to equip neural networks with model uncertainty.
arXiv Detail & Related papers (2024-04-29T23:38:58Z) - Low-rank extended Kalman filtering for online learning of neural networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior matrix.
In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
arXiv Detail & Related papers (2023-05-31T03:48:49Z) - Hyper-Reduced Autoencoders for Efficient and Accurate Nonlinear Model Reductions [1.0499611180329804]
Projection-based model order reduction has been recently proposed for problems with slowly decaying Kolmogorov n-width.
A disadvantage of the previously proposed methods is the potential high computational costs of training the networks on high-fidelity solution snapshots.
We propose and analyze a novel method that overcomes this disadvantage by training a neural network only on subsampled versions of the high-fidelity solution snapshots.
arXiv Detail & Related papers (2023-03-16T20:18:33Z) - DeepBayes -- an estimator for parameter estimation in stochastic nonlinear dynamical models [11.917949887615567]
We propose DeepBayes estimators that leverage the power of deep recurrent neural networks in learning an estimator.
The deep recurrent neural network architectures can be trained offline and ensure significant time savings during inference.
We demonstrate the applicability of our proposed method on different example models and perform detailed comparisons with state-of-the-art approaches.
arXiv Detail & Related papers (2022-05-04T18:12:17Z) - On the adaptation of recurrent neural networks for system identification [2.5234156040689237]
This paper presents a transfer learning approach which enables fast and efficient adaptation of Recurrent Neural Network (RNN) models of dynamical systems.
After a nominal model has been identified, the system dynamics are assumed to change, leading to an unacceptable degradation of the nominal model's performance on the perturbed system.
To cope with the mismatch, the model is augmented with an additive correction term trained on fresh data from the new dynamic regime.
arXiv Detail & Related papers (2022-01-21T12:04:17Z) - The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z) - DL-Reg: A Deep Learning Regularization Technique using Linear Regression [4.1359299555083595]
This paper proposes a novel deep learning regularization method named DL-Reg.
It reduces the nonlinearity of deep networks to a certain extent by explicitly encouraging the network to behave as linearly as possible.
The performance of DL-Reg is evaluated by training state-of-the-art deep network models on several benchmark datasets.
arXiv Detail & Related papers (2020-10-31T21:53:24Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.