Parallel Trust-Region Approaches in Neural Network Training: Beyond
Traditional Methods
- URL: http://arxiv.org/abs/2312.13677v1
- Date: Thu, 21 Dec 2023 09:00:24 GMT
- Title: Parallel Trust-Region Approaches in Neural Network Training: Beyond
Traditional Methods
- Authors: Ken Trotti, Samuel A. Cruz Alegría, Alena Kopaničáková, Rolf Krause
- Abstract summary: We propose to train neural networks (NNs) using a novel variant of the "Additively Preconditioned Trust-region Strategy" (APTS).
The proposed method is based on a parallelizable additive domain decomposition approach applied to the neural network's parameters.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose to train neural networks (NNs) using a novel variant of the
"Additively Preconditioned Trust-region Strategy" (APTS). The proposed method
is based on a parallelizable additive domain decomposition approach applied to
the neural network's parameters. Built upon the TR framework, the APTS method
ensures global convergence towards a minimizer. Moreover, it eliminates the
need for computationally expensive hyper-parameter tuning, as the TR algorithm
automatically determines the step size in each iteration. We demonstrate the
capabilities, strengths, and limitations of the proposed APTS training method
by performing a series of numerical experiments. The presented numerical study
includes a comparison with widely used training methods such as SGD, Adam,
LBFGS, and the standard TR method.
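To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of one APTS-style iteration, assuming a NumPy parameter vector and callable loss and gradient oracles: the parameters are split into disjoint blocks, each block receives a Cauchy-point-like local correction that could be computed in parallel, the corrections are summed additively, and the combined step is accepted or rejected with a standard trust-region ratio test. The block partition, local model, and acceptance thresholds are illustrative assumptions.

```python
import numpy as np

def apts_step(params, loss, grad, radius, n_subdomains=4):
    """One illustrative APTS-style iteration: additive block corrections + TR ratio test."""
    g = grad(params)
    blocks = np.array_split(np.arange(params.size), n_subdomains)
    step = np.zeros_like(params)
    for idx in blocks:                                    # blocks are independent -> parallelizable
        g_loc = g[idx]
        norm = np.linalg.norm(g_loc)
        if norm > 0.0:
            step[idx] = -min(radius, norm) * g_loc / norm  # Cauchy-point-like local correction
    predicted = -g @ step                                  # predicted decrease under a linear model
    actual = loss(params) - loss(params + step)            # actual decrease of the training loss
    rho = actual / predicted if predicted > 0 else 0.0
    if rho >= 0.1:                                         # accept the combined additive step
        params = params + step
        if rho >= 0.75:                                    # good agreement: enlarge the radius
            radius = min(2.0 * radius, 10.0)
    else:                                                  # poor agreement: reject, shrink the radius
        radius *= 0.5
    return params, radius
```

In the full APTS method each subdomain contributes a preconditioned trust-region correction obtained from its own subproblem; the sketch only conveys the additive, parallel structure and the TR-controlled step size that removes the need for learning-rate tuning.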
Related papers
- A Neural Network Training Method Based on Distributed PID Control [0.0]
In the previous article, we introduced a neural network framework based on symmetric differential equations.
This study proposes an alternative training approach that utilizes differential equation signal propagation instead of chain rule derivation.
arXiv Detail & Related papers (2024-11-18T19:25:26Z) - Learning by the F-adjoint [0.0]
In this work, we develop and investigate this theoretical framework to improve supervised learning algorithms for feed-forward neural networks.
Our main result is that by introducing a neural dynamical model combined with the gradient descent algorithm, we derive an equilibrium F-adjoint process.
Experimental results on the MNIST and Fashion-MNIST datasets demonstrate that the proposed approach provides significant improvements over the standard back-propagation training procedure.
arXiv Detail & Related papers (2024-07-08T13:49:25Z) - Domain Generalization Guided by Gradient Signal to Noise Ratio of
Parameters [69.24377241408851]
Overfitting to the source domain is a common issue in gradient-based training of deep neural networks.
We propose to base the selection on the gradient-signal-to-noise ratio (GSNR) of the network's parameters.
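For reference, a minimal sketch of the per-parameter GSNR computation (squared mean gradient over gradient variance across samples) is shown below; the per-sample gradient matrix and the epsilon guard are assumptions for illustration, not part of the paper's code.

```python
import numpy as np

def gsnr(per_sample_grads, eps=1e-12):
    """Per-parameter gradient signal-to-noise ratio from a (n_samples, n_params) matrix."""
    mean = per_sample_grads.mean(axis=0)   # signal: mean gradient of each parameter
    var = per_sample_grads.var(axis=0)     # noise: gradient variance of each parameter
    return mean ** 2 / (var + eps)         # eps guards against zero variance

# Example: 32 per-sample gradients for a 10-parameter model (random stand-in data).
print(gsnr(np.random.randn(32, 10)))
```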
arXiv Detail & Related papers (2023-10-11T10:21:34Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
arXiv Detail & Related papers (2023-03-05T17:57:33Z) - A Simple and Efficient Stochastic Rounding Method for Training Neural
Networks in Low Precision [0.0]
Conventional stochastic rounding (CSR) is widely employed in the training of neural networks (NNs).
We introduce an improved stochastic rounding method that is simple and efficient.
The proposed method succeeds in training NNs with 16-bit fixed-point numbers.
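For context, the sketch below illustrates plain unbiased stochastic rounding to a fixed-point grid, not the paper's improved variant: each value is rounded up with probability equal to its fractional distance from the lower grid point, so the rounding error is zero in expectation. The choice of 8 fractional bits and the random generator are illustrative assumptions.

```python
import numpy as np

def stochastic_round_fixed_point(x, frac_bits=8, rng=None):
    """Round to a fixed-point grid with `frac_bits` fractional bits, unbiased in expectation."""
    rng = np.random.default_rng() if rng is None else rng
    scale = 2.0 ** frac_bits                      # grid spacing is 1 / scale
    scaled = np.asarray(x, dtype=np.float64) * scale
    lower = np.floor(scaled)
    prob_up = scaled - lower                      # distance to the lower grid point, in [0, 1)
    return (lower + (rng.random(scaled.shape) < prob_up)) / scale

# Example: repeated rounding of 0.30 averages back to roughly 0.30.
print(stochastic_round_fixed_point(np.full(10000, 0.30)).mean())
```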
arXiv Detail & Related papers (2021-03-24T18:47:03Z) - Local Critic Training for Model-Parallel Learning of Deep Neural
Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that trained networks by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z) - Tunable Subnetwork Splitting for Model-parallelism of Neural Network
Training [12.755664985045582]
We propose a Tunable Subnetwork Splitting Method (TSSM) to tune the decomposition of deep neural networks.
Our proposed TSSM can achieve significant speedup without observable loss of training accuracy.
arXiv Detail & Related papers (2020-09-09T01:05:12Z) - Tune smarter not harder: A principled approach to tuning learning rates
for shallow nets [13.203765985718201]
A principled approach to choosing the learning rate is proposed for shallow feedforward neural networks.
It is shown through simulations that the proposed search method significantly outperforms the existing tuning methods.
arXiv Detail & Related papers (2020-03-22T09:38:35Z) - Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs [71.26657499537366]
We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models.
We compare it with the reverse dynamic method to train neural ODEs on classification, density estimation, and inference approximation tasks.
arXiv Detail & Related papers (2020-03-11T13:15:57Z) - Neural Proximal/Trust Region Policy Optimization Attains Globally
Optimal Policy [119.12515258771302]
We show that a variant of PPO equipped with over-parameterization converges to the globally optimal policy.
The key to our analysis is the global convergence of infinite-dimensional mirror descent under a notion of one-point monotonicity, where the gradient and iterate are instantiated by neural networks.
arXiv Detail & Related papers (2019-06-25T03:20:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.