Research of Damped Newton Stochastic Gradient Descent Method for Neural
Network Training
- URL: http://arxiv.org/abs/2103.16764v1
- Date: Wed, 31 Mar 2021 02:07:18 GMT
- Title: Research of Damped Newton Stochastic Gradient Descent Method for Neural
Network Training
- Authors: Jingcheng Zhou, Wei Wei, Zhiming Zheng
- Abstract summary: First-order methods like stochastic gradient descent (SGD) are currently the most popular way to train deep neural networks (DNNs).
In this paper, we propose the Damped Newton Stochastic Gradient Descent (DN-SGD) and Stochastic Gradient Descent Damped Newton (SGD-DN) methods to train DNNs for regression problems with Mean Square Error (MSE) and classification problems with Cross-Entropy Loss (CEL).
Our methods accurately compute second-order information for only a small part of the parameters, which greatly reduces the computational cost and makes the learning process much faster and more accurate than SGD.
- Score: 6.231508838034926
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: First-order methods like stochastic gradient descent (SGD) are currently the most
popular way to optimize deep neural networks (DNNs), while second-order methods are rarely
used because of the prohibitive computational cost of obtaining higher-order information.
In this paper, we propose the Damped Newton Stochastic Gradient Descent (DN-SGD) and
Stochastic Gradient Descent Damped Newton (SGD-DN) methods to train DNNs for regression
problems with Mean Square Error (MSE) and classification problems with Cross-Entropy Loss
(CEL). The methods are inspired by the proven fact that the Hessian matrix of the last layer
of a DNN is always positive semi-definite. Unlike other second-order methods, which estimate
the Hessian matrix over all parameters, our methods compute exact second-order information
for only a small part of the parameters, which greatly reduces the computational cost and
makes convergence of the learning process faster and more accurate than with SGD. Several
numerical experiments on real datasets are performed to verify the effectiveness of our
methods for regression and classification problems.
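
As a rough illustration of the core idea, the sketch below mixes a plain SGD step on a hidden layer with a damped Newton step restricted to the last layer, whose Hessian under MSE with a linear output is positive semi-definite. It is a minimal NumPy sketch rather than the paper's algorithm: the network size, damping constant lam, and learning rate lr are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_h, n = 8, 16, 256
    X = rng.normal(size=(n, d_in))
    y = X @ rng.normal(size=d_in) + 0.1 * rng.normal(size=n)  # toy regression target

    W1 = 0.3 * rng.normal(size=(d_in, d_h))   # hidden layer, updated by SGD
    w2 = np.zeros(d_h)                        # last layer, updated by damped Newton
    lr, lam = 1e-2, 1e-1                      # illustrative hyper-parameters

    for step in range(200):
        idx = rng.choice(n, size=32, replace=False)        # mini-batch
        Xb, yb = X[idx], y[idx]
        H = np.maximum(Xb @ W1, 0.0)                       # ReLU hidden activations
        err = H @ w2 - yb                                  # residual for 0.5 * MSE

        # SGD step on the hidden-layer weights.
        grad_pre = np.outer(err, w2) * (H > 0)             # backprop through the ReLU
        W1 -= lr * Xb.T @ grad_pre / len(idx)

        # Damped Newton step on the last layer: for MSE with a linear output,
        # the exact Hessian w.r.t. w2 is H^T H / m, which is positive semi-definite.
        g = H.T @ err / len(idx)
        Hess = H.T @ H / len(idx)
        w2 -= np.linalg.solve(Hess + lam * np.eye(d_h), g)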
Related papers
- SOFIM: Stochastic Optimization Using Regularized Fisher Information Matrix [10.532651329230497]
This paper introduces a new stochastic optimization method based on the regularized Fisher information matrix (FIM).
It can efficiently utilize the FIM to approximate the inverse Hessian when computing Newton-type gradient updates in large-scale machine learning models.
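
The kind of update such a method builds on, a Newton-type step preconditioned by a regularized empirical Fisher information matrix, can be sketched as follows for logistic regression. The damping lam and learning rate are illustrative, and this is only the flavor of the update, not the SOFIM algorithm itself (which maintains the FIM inverse more efficiently).

    import numpy as np

    def fisher_newton_step(w, X, y, lr=0.5, lam=1e-2):
        """One update w <- w - lr * (F + lam*I)^{-1} g using the empirical Fisher F."""
        p = 1.0 / (1.0 + np.exp(-X @ w))              # predicted probabilities
        per_sample_grads = (p - y)[:, None] * X       # per-sample log-loss gradients
        g = per_sample_grads.mean(axis=0)
        F = per_sample_grads.T @ per_sample_grads / len(y)   # empirical Fisher information
        return w - lr * np.linalg.solve(F + lam * np.eye(len(w)), g)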
arXiv Detail & Related papers (2024-03-05T10:09:31Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs can suffer from training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs and improve the stability of the training process.
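
A minimal sketch of the implicit (proximal) SGD update underlying such a method: the gradient is taken at the new iterate, theta_next = theta - lr * grad(theta_next), solved here by a few fixed-point iterations on a toy quadratic loss. The loss, learning rate, and iteration counts are illustrative and this is not the PINN training setup of the paper.

    import numpy as np

    def implicit_sgd_step(theta, grad_fn, lr, n_inner=30):
        """Solve theta_next = theta - lr * grad_fn(theta_next) by fixed-point iteration.
        The inner iteration converges when lr times the gradient's Lipschitz constant < 1."""
        theta_next = theta.copy()
        for _ in range(n_inner):
            theta_next = theta - lr * grad_fn(theta_next)
        return theta_next

    # Toy usage: minimize 0.5 * ||A theta - b||^2.
    A = np.array([[2.0, 0.0], [0.0, 10.0]])
    b = np.array([1.0, 1.0])
    grad = lambda t: A.T @ (A @ t - b)
    theta = np.zeros(2)
    for _ in range(200):
        theta = implicit_sgd_step(theta, grad, lr=0.005)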
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - On the efficiency of Stochastic Quasi-Newton Methods for Deep Learning [0.0]
We study the behaviour of stochastic quasi-Newton training algorithms for deep neural networks.
We show that quasi-Newton methods are efficient and, in some instances, able to outperform the well-known first-order Adam optimizer.
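
For context, a minimal PyTorch sketch of quasi-Newton training using the built-in limited-memory BFGS optimizer; the model, data, and hyper-parameters are illustrative and not taken from the paper.

    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
    X, y = torch.randn(256, 10), torch.randn(256, 1)     # toy regression data
    loss_fn = torch.nn.MSELoss()
    opt = torch.optim.LBFGS(model.parameters(), lr=0.5, max_iter=20, history_size=10)

    def closure():
        # L-BFGS may re-evaluate the loss several times per step, hence the closure.
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        return loss

    for _ in range(30):
        opt.step(closure)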
arXiv Detail & Related papers (2022-05-18T20:53:58Z) - Training Feedback Spiking Neural Networks by Implicit Differentiation on
the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
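
The general mechanism, differentiating through an equilibrium state via the implicit function theorem rather than unrolling the forward computation, can be sketched on a toy recurrent update. The tanh dynamics, shapes, and loss below are illustrative assumptions, not the spiking model of the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    d, m = 5, 3
    W = 0.1 * rng.normal(size=(d, d))   # small enough that the update is a contraction
    U = rng.normal(size=(d, m))
    x = rng.normal(size=m)

    # Forward: iterate the feedback update to its equilibrium z* = tanh(W z* + U x).
    z = np.zeros(d)
    for _ in range(100):
        z = np.tanh(W @ z + U @ x)

    # Backward: by the implicit function theorem, the gradient of a loss L(z*)
    # w.r.t. W needs only one linear solve with (I - J)^T, where J is the Jacobian
    # of the update at the equilibrium -- no reverse pass through the forward loop.
    dL_dz = 2 * z                                  # e.g. L = ||z*||^2
    J = np.diag(1 - z ** 2) @ W                    # d tanh(Wz + Ux) / dz at z*
    v = np.linalg.solve((np.eye(d) - J).T, dL_dz)
    dL_dW = np.outer((1 - z ** 2) * v, z)          # chain rule through the pre-activation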
arXiv Detail & Related papers (2021-09-29T07:46:54Z) - A Differentiable Point Process with Its Application to Spiking Neural
Networks [13.160616423673373]
Jimenez Rezende & Gerstner (2014) proposed a variational inference algorithm to train SNNs with hidden neurons.
This paper presents an alternative gradient estimator for SNNs based on the path-wise (reparameterization) gradient estimator.
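
A minimal sketch of the path-wise (reparameterization) gradient estimator in its textbook Gaussian form, the general device the paper adapts to spiking neurons; the distribution and objective are illustrative only.

    import torch

    mu = torch.tensor(0.5, requires_grad=True)
    log_sigma = torch.tensor(-1.0, requires_grad=True)

    eps = torch.randn(10_000)                  # noise drawn independently of the parameters
    x = mu + torch.exp(log_sigma) * eps        # samples written as a differentiable path
    loss = (x ** 2).mean()                     # Monte Carlo estimate of E[x^2]
    loss.backward()                            # gradients flow through the sampling path
    print(mu.grad, log_sigma.grad)             # approx. 2*mu and 2*sigma^2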
arXiv Detail & Related papers (2021-06-02T02:40:17Z) - Rank-R FNN: A Tensor-Based Learning Model for High-Order Data
Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
It handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
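
A minimal PyTorch sketch of a rank-R layer in this spirit: the weight tensor acting on a matrix-shaped input is constrained to a sum of R rank-1 (CP) factors, so the input is never vectorized. The sizes, rank, and activation are illustrative assumptions, not the paper's exact model.

    import torch

    class RankRLayer(torch.nn.Module):
        def __init__(self, I, J, hidden, R):
            super().__init__()
            self.A = torch.nn.Parameter(0.1 * torch.randn(hidden, R, I))  # mode-1 factors
            self.B = torch.nn.Parameter(0.1 * torch.randn(hidden, R, J))  # mode-2 factors

        def forward(self, X):                 # X: (batch, I, J), kept as a matrix
            # For each hidden unit: sum_r a_r^T X b_r (inner product with a rank-R tensor)
            proj = torch.einsum('bij,hrj->bhri', X, self.B)   # contract the second mode
            out = torch.einsum('bhri,hri->bh', proj, self.A)  # contract mode 1, sum over r
            return torch.sigmoid(out)

    layer = RankRLayer(I=8, J=6, hidden=16, R=3)
    y = layer(torch.randn(4, 8, 6))           # -> shape (4, 16)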
arXiv Detail & Related papers (2021-04-11T16:37:32Z) - TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural
Language Generation [79.4205462326301]
TaylorGAN is a novel approach to score function-based natural language generation.
It augments the gradient estimation with off-policy updates and a first-order Taylor expansion.
This enables NLG models to be trained from scratch with a smaller batch size.
arXiv Detail & Related papers (2020-11-27T02:26:15Z) - A Novel Neural Network Training Framework with Data Assimilation [2.948167339160823]
A gradient-free training framework based on data assimilation is proposed to avoid the calculation of gradients.
The results show that the proposed training framework performed better than the gradient descent method.
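
One common gradient-free, data-assimilation-style update is the ensemble Kalman step sketched below, which adjusts an ensemble of parameter vectors using only forward evaluations. It is given purely as an illustration of the general idea, not as the specific framework proposed in the paper; G, gamma, and the shapes are assumptions.

    import numpy as np

    def ensemble_kalman_step(thetas, G, y, gamma=1e-2):
        """thetas: (J, p) ensemble of parameter vectors; G maps a parameter vector to
        model outputs of shape (m,); y: observed targets of shape (m,)."""
        g = np.stack([G(t) for t in thetas])            # forward evaluations, no gradients
        dth = thetas - thetas.mean(axis=0)
        dg = g - g.mean(axis=0)
        C_tg = dth.T @ dg / len(thetas)                 # parameter-output cross-covariance
        C_gg = dg.T @ dg / len(thetas)                  # output covariance
        K = C_tg @ np.linalg.inv(C_gg + gamma * np.eye(len(y)))  # Kalman-style gain
        return thetas + (y - g) @ K.T                   # pull each member toward the data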
arXiv Detail & Related papers (2020-10-06T11:12:23Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds in theory.
Experiments on several benchmark datasets demonstrate the effectiveness of our method and corroborate the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality
Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a higher reduction in computational load at the same accuracy.
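
A minimal PyTorch sketch of the kind of layer such a scheme trains: the weight matrix is kept in factored form U diag(s) V^T, with an orthogonality penalty on U and V and a sparsity penalty on the singular values s. The penalties and their weights are illustrative assumptions, not the paper's exact formulation.

    import torch

    class SVDLinear(torch.nn.Module):
        def __init__(self, d_in, d_out):
            super().__init__()
            r = min(d_in, d_out)
            self.U = torch.nn.Parameter(torch.randn(d_out, r) / r ** 0.5)
            self.s = torch.nn.Parameter(torch.ones(r))
            self.V = torch.nn.Parameter(torch.randn(d_in, r) / r ** 0.5)

        def forward(self, x):                     # computes x V diag(s) U^T
            return ((x @ self.V) * self.s) @ self.U.t()

        def penalty(self, lam_orth=1e-2, lam_s=1e-3):
            # Orthogonality on the factors plus an L1 push toward sparse singular values;
            # add this to the task loss during training.
            I_r = torch.eye(self.s.numel())
            orth = ((self.U.t() @ self.U - I_r) ** 2).sum() \
                 + ((self.V.t() @ self.V - I_r) ** 2).sum()
            return lam_orth * orth + lam_s * self.s.abs().sum()

    # After training, singular values driven near zero can be pruned to obtain a low-rank layer.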
arXiv Detail & Related papers (2020-04-20T02:40:43Z) - Semi-Implicit Back Propagation [1.5533842336139065]
We propose a semi-implicit back propagation method for neural network training.
The differences on the neurons are propagated backward, and the parameters are updated via proximal mapping.
Experiments on both MNIST and CIFAR-10 demonstrate that the proposed algorithm leads to better performance in terms of both loss decrease and training/validation accuracy.
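
A rough sketch of a proximal-mapping parameter update for a single linear layer, the kind of step the summary refers to: given the layer inputs A and backward-propagated pre-activation targets Z, the weights are updated by solving a damped least-squares (proximal) subproblem in closed form. The subproblem and step size eta are illustrative assumptions, not the paper's exact algorithm.

    import numpy as np

    def proximal_weight_update(W_old, A, Z, eta=0.1):
        """Minimize ||W A - Z||_F^2 + (1 / (2*eta)) * ||W - W_old||_F^2 over W.
        A: (d, n) layer inputs, Z: (out, n) propagated targets, W_old: (out, d)."""
        d = A.shape[0]
        lhs = 2 * A @ A.T + np.eye(d) / eta        # symmetric positive definite
        rhs = 2 * Z @ A.T + W_old / eta
        return np.linalg.solve(lhs, rhs.T).T       # closed-form proximal step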
arXiv Detail & Related papers (2020-02-10T03:26:09Z)