Research of Damped Newton Stochastic Gradient Descent Method for Neural
Network Training
- URL: http://arxiv.org/abs/2103.16764v1
- Date: Wed, 31 Mar 2021 02:07:18 GMT
- Title: Research of Damped Newton Stochastic Gradient Descent Method for Neural
Network Training
- Authors: Jingcheng Zhou, Wei Wei, Zhiming Zheng
- Abstract summary: First-order methods like stochastic gradient descent (SGD) are currently the most popular way to train deep neural networks (DNNs).
In this paper, we propose the Damped Newton Stochastic Gradient Descent (DN-SGD) and Stochastic Gradient Descent Damped Newton (SGD-DN) methods to train DNNs for regression problems with Mean Square Error (MSE) and classification problems with Cross-Entropy Loss (CEL).
Our methods accurately compute second-order information for only a small part of the parameters, which greatly reduces the computational cost and makes the learning process much faster and more accurate than SGD.
- Score: 6.231508838034926
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: First-order methods like stochastic gradient descent (SGD) are currently the most
popular way to optimize deep neural networks (DNNs), while second-order methods are rarely
used because of the prohibitive computational cost of obtaining higher-order information.
In this paper, we propose the Damped Newton Stochastic Gradient Descent (DN-SGD) and
Stochastic Gradient Descent Damped Newton (SGD-DN) methods to train DNNs for regression
problems with Mean Square Error (MSE) and classification problems with Cross-Entropy Loss
(CEL). The methods are inspired by the proven fact that the Hessian matrix of the last layer
of a DNN is always positive semi-definite. Unlike other second-order methods, which estimate
the Hessian matrix over all parameters, our methods compute exact second-order information
for only a small part of the parameters, which greatly reduces the computational cost and
makes convergence of the learning process faster and more accurate than with SGD. Several
numerical experiments on real datasets are performed to verify the effectiveness of our
methods for regression and classification problems.
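
As a rough illustration of the core idea, the sketch below mixes a plain SGD step on a hidden layer with a damped Newton step restricted to the last layer, whose Hessian under MSE with a linear output is positive semi-definite. It is a minimal NumPy sketch rather than the paper's algorithm: the network size, damping constant lam, and learning rate lr are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_h, n = 8, 16, 256
    X = rng.normal(size=(n, d_in))
    y = X @ rng.normal(size=d_in) + 0.1 * rng.normal(size=n)  # toy regression target

    W1 = 0.3 * rng.normal(size=(d_in, d_h))   # hidden layer, updated by SGD
    w2 = np.zeros(d_h)                        # last layer, updated by damped Newton
    lr, lam = 1e-2, 1e-1                      # illustrative hyper-parameters

    for step in range(200):
        idx = rng.choice(n, size=32, replace=False)        # mini-batch
        Xb, yb = X[idx], y[idx]
        H = np.maximum(Xb @ W1, 0.0)                       # ReLU hidden activations
        err = H @ w2 - yb                                  # residual for 0.5 * MSE

        # SGD step on the hidden-layer weights.
        grad_pre = np.outer(err, w2) * (H > 0)             # backprop through the ReLU
        W1 -= lr * Xb.T @ grad_pre / len(idx)

        # Damped Newton step on the last layer: for MSE with a linear output,
        # the exact Hessian w.r.t. w2 is H^T H / m, which is positive semi-definite.
        g = H.T @ err / len(idx)
        Hess = H.T @ H / len(idx)
        w2 -= np.linalg.solve(Hess + lam * np.eye(d_h), g)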
Related papers
- SOFIM: Stochastic Optimization Using Regularized Fisher Information Matrix [10.532651329230497]
This paper introduces a new stochastic optimization method based on the regularized Fisher information matrix (FIM).
It can efficiently utilize the FIM to approximate the inverse Hessian when computing Newton-type gradient updates in large-scale machine learning models.
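
The kind of update such a method builds on, a Newton-type step preconditioned by a regularized empirical Fisher information matrix, can be sketched as follows for logistic regression. The damping lam and learning rate are illustrative, and this is only the flavor of the update, not the SOFIM algorithm itself (which maintains the FIM inverse more efficiently).

    import numpy as np

    def fisher_newton_step(w, X, y, lr=0.5, lam=1e-2):
        """One update w <- w - lr * (F + lam*I)^{-1} g using the empirical Fisher F."""
        p = 1.0 / (1.0 + np.exp(-X @ w))              # predicted probabilities
        per_sample_grads = (p - y)[:, None] * X       # per-sample log-loss gradients
        g = per_sample_grads.mean(axis=0)
        F = per_sample_grads.T @ per_sample_grads / len(y)   # empirical Fisher information
        return w - lr * np.linalg.solve(F + lam * np.eye(len(w)), g)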
arXiv Detail & Related papers (2024-03-05T10:09:31Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs can suffer from training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs and improve the stability of the training process.
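
A minimal sketch of the implicit (proximal) SGD update underlying such a method: the gradient is taken at the new iterate, theta_next = theta - lr * grad(theta_next), solved here by a few fixed-point iterations on a toy quadratic loss. The loss, learning rate, and iteration counts are illustrative and this is not the PINN training setup of the paper.

    import numpy as np

    def implicit_sgd_step(theta, grad_fn, lr, n_inner=30):
        """Solve theta_next = theta - lr * grad_fn(theta_next) by fixed-point iteration.
        The inner iteration converges when lr times the gradient's Lipschitz constant < 1."""
        theta_next = theta.copy()
        for _ in range(n_inner):
            theta_next = theta - lr * grad_fn(theta_next)
        return theta_next

    # Toy usage: minimize 0.5 * ||A theta - b||^2.
    A = np.array([[2.0, 0.0], [0.0, 10.0]])
    b = np.array([1.0, 1.0])
    grad = lambda t: A.T @ (A @ t - b)
    theta = np.zeros(2)
    for _ in range(200):
        theta = implicit_sgd_step(theta, grad, lr=0.005)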
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - On the efficiency of Stochastic Quasi-Newton Methods for Deep Learning [0.0]
We study the behaviour of stochastic quasi-Newton training algorithms for deep neural networks.
We show that quasi-Newton methods are efficient and, in some instances, able to outperform the well-known first-order Adam optimizer.
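
For context, a minimal PyTorch sketch of quasi-Newton training using the built-in limited-memory BFGS optimizer; the model, data, and hyper-parameters are illustrative and not taken from the paper.

    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
    X, y = torch.randn(256, 10), torch.randn(256, 1)     # toy regression data
    loss_fn = torch.nn.MSELoss()
    opt = torch.optim.LBFGS(model.parameters(), lr=0.5, max_iter=20, history_size=10)

    def closure():
        # L-BFGS may re-evaluate the loss several times per step, hence the closure.
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        return loss

    for _ in range(30):
        opt.step(closure)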
arXiv Detail & Related papers (2022-05-18T20:53:58Z) - Training Feedback Spiking Neural Networks by Implicit Differentiation on
the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
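
The general mechanism, differentiating through an equilibrium state via the implicit function theorem rather than unrolling the forward computation, can be sketched on a toy recurrent update. The tanh dynamics, shapes, and loss below are illustrative assumptions, not the spiking model of the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    d, m = 5, 3
    W = 0.1 * rng.normal(size=(d, d))   # small enough that the update is a contraction
    U = rng.normal(size=(d, m))
    x = rng.normal(size=m)

    # Forward: iterate the feedback update to its equilibrium z* = tanh(W z* + U x).
    z = np.zeros(d)
    for _ in range(100):
        z = np.tanh(W @ z + U @ x)

    # Backward: by the implicit function theorem, the gradient of a loss L(z*)
    # w.r.t. W needs only one linear solve with (I - J)^T, where J is the Jacobian
    # of the update at the equilibrium -- no reverse pass through the forward loop.
    dL_dz = 2 * z                                  # e.g. L = ||z*||^2
    J = np.diag(1 - z ** 2) @ W                    # d tanh(Wz + Ux) / dz at z*
    v = np.linalg.solve((np.eye(d) - J).T, dL_dz)
    dL_dW = np.outer((1 - z ** 2) * v, z)          # chain rule through the pre-activation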
arXiv Detail & Related papers (2021-09-29T07:46:54Z) - A Differentiable Point Process with Its Application to Spiking Neural
Networks [13.160616423673373]
Jimenez Rezende & Gerstner (2014) proposed a variational inference algorithm to train SNNs with hidden neurons.
This paper presents an alternative gradient estimator for SNNs based on the path-wise (reparameterization) gradient estimator.
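
A minimal sketch of the path-wise (reparameterization) gradient estimator in its textbook Gaussian form, the general device the paper adapts to spiking neurons; the distribution and objective are illustrative only.

    import torch

    mu = torch.tensor(0.5, requires_grad=True)
    log_sigma = torch.tensor(-1.0, requires_grad=True)

    eps = torch.randn(10_000)                  # noise drawn independently of the parameters
    x = mu + torch.exp(log_sigma) * eps        # samples written as a differentiable path
    loss = (x ** 2).mean()                     # Monte Carlo estimate of E[x^2]
    loss.backward()                            # gradients flow through the sampling path
    print(mu.grad, log_sigma.grad)             # approx. 2*mu and 2*sigma^2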
arXiv Detail & Related papers (2021-06-02T02:40:17Z) - Rank-R FNN: A Tensor-Based Learning Model for High-Order Data
Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
It handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
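
A minimal PyTorch sketch of a rank-R layer in this spirit: the weight tensor acting on a matrix-shaped input is constrained to a sum of R rank-1 (CP) factors, so the input is never vectorized. The sizes, rank, and activation are illustrative assumptions, not the paper's exact model.

    import torch

    class RankRLayer(torch.nn.Module):
        def __init__(self, I, J, hidden, R):
            super().__init__()
            self.A = torch.nn.Parameter(0.1 * torch.randn(hidden, R, I))  # mode-1 factors
            self.B = torch.nn.Parameter(0.1 * torch.randn(hidden, R, J))  # mode-2 factors

        def forward(self, X):                 # X: (batch, I, J), kept as a matrix
            # For each hidden unit: sum_r a_r^T X b_r (inner product with a rank-R tensor)
            proj = torch.einsum('bij,hrj->bhri', X, self.B)   # contract the second mode
            out = torch.einsum('bhri,hri->bh', proj, self.A)  # contract mode 1, sum over r
            return torch.sigmoid(out)

    layer = RankRLayer(I=8, J=6, hidden=16, R=3)
    y = layer(torch.randn(4, 8, 6))           # -> shape (4, 16)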
arXiv Detail & Related papers (2021-04-11T16:37:32Z) - TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural
Language Generation [79.4205462326301]
TaylorGAN is a novel approach to score function-based natural language generation.
It augments the gradient estimation with off-policy updates and a first-order Taylor expansion.
This enables NLG models to be trained from scratch with a smaller batch size.
arXiv Detail & Related papers (2020-11-27T02:26:15Z) - A Novel Neural Network Training Framework with Data Assimilation [2.948167339160823]
A gradient-free training framework based on data assimilation is proposed to avoid the calculation of gradients.
The results show that the proposed training framework performed better than the gradient descent method.
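
One common gradient-free, data-assimilation-style update is the ensemble Kalman step sketched below, which adjusts an ensemble of parameter vectors using only forward evaluations. It is given purely as an illustration of the general idea, not as the specific framework proposed in the paper; G, gamma, and the shapes are assumptions.

    import numpy as np

    def ensemble_kalman_step(thetas, G, y, gamma=1e-2):
        """thetas: (J, p) ensemble of parameter vectors; G maps a parameter vector to
        model outputs of shape (m,); y: observed targets of shape (m,)."""
        g = np.stack([G(t) for t in thetas])            # forward evaluations, no gradients
        dth = thetas - thetas.mean(axis=0)
        dg = g - g.mean(axis=0)
        C_tg = dth.T @ dg / len(thetas)                 # parameter-output cross-covariance
        C_gg = dg.T @ dg / len(thetas)                  # output covariance
        K = C_tg @ np.linalg.inv(C_gg + gamma * np.eye(len(y)))  # Kalman-style gain
        return thetas + (y - g) @ K.T                   # pull each member toward the data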
arXiv Detail & Related papers (2020-10-06T11:12:23Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds in theory.
Experiments on several benchmark datasets demonstrate the effectiveness of our method and corroborate the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality
Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a higher reduction in computational load at the same accuracy.
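
A minimal PyTorch sketch of the kind of layer such a scheme trains: the weight matrix is kept in factored form U diag(s) V^T, with an orthogonality penalty on U and V and a sparsity penalty on the singular values s. The penalties and their weights are illustrative assumptions, not the paper's exact formulation.

    import torch

    class SVDLinear(torch.nn.Module):
        def __init__(self, d_in, d_out):
            super().__init__()
            r = min(d_in, d_out)
            self.U = torch.nn.Parameter(torch.randn(d_out, r) / r ** 0.5)
            self.s = torch.nn.Parameter(torch.ones(r))
            self.V = torch.nn.Parameter(torch.randn(d_in, r) / r ** 0.5)

        def forward(self, x):                     # computes x V diag(s) U^T
            return ((x @ self.V) * self.s) @ self.U.t()

        def penalty(self, lam_orth=1e-2, lam_s=1e-3):
            # Orthogonality on the factors plus an L1 push toward sparse singular values;
            # add this to the task loss during training.
            I_r = torch.eye(self.s.numel())
            orth = ((self.U.t() @ self.U - I_r) ** 2).sum() \
                 + ((self.V.t() @ self.V - I_r) ** 2).sum()
            return lam_orth * orth + lam_s * self.s.abs().sum()

    # After training, singular values driven near zero can be pruned to obtain a low-rank layer.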
arXiv Detail & Related papers (2020-04-20T02:40:43Z) - Semi-Implicit Back Propagation [1.5533842336139065]
We propose a semi-implicit back propagation method for neural network training.
The differences on the neurons are propagated backward, and the parameters are updated via proximal mapping.
Experiments on both MNIST and CIFAR-10 demonstrate that the proposed algorithm leads to better performance in terms of both loss decrease and training/validation accuracy.
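
A rough sketch of a proximal-mapping parameter update for a single linear layer, the kind of step the summary refers to: given the layer inputs A and backward-propagated pre-activation targets Z, the weights are updated by solving a damped least-squares (proximal) subproblem in closed form. The subproblem and step size eta are illustrative assumptions, not the paper's exact algorithm.

    import numpy as np

    def proximal_weight_update(W_old, A, Z, eta=0.1):
        """Minimize ||W A - Z||_F^2 + (1 / (2*eta)) * ||W - W_old||_F^2 over W.
        A: (d, n) layer inputs, Z: (out, n) propagated targets, W_old: (out, d)."""
        d = A.shape[0]
        lhs = 2 * A @ A.T + np.eye(d) / eta        # symmetric positive definite
        rhs = 2 * Z @ A.T + W_old / eta
        return np.linalg.solve(lhs, rhs.T).T       # closed-form proximal step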
arXiv Detail & Related papers (2020-02-10T03:26:09Z)