Parallel Trust-Region Approaches in Neural Network Training: Beyond
Traditional Methods
- URL: http://arxiv.org/abs/2312.13677v1
- Date: Thu, 21 Dec 2023 09:00:24 GMT
- Title: Parallel Trust-Region Approaches in Neural Network Training: Beyond
Traditional Methods
- Authors: Ken Trotti, Samuel A. Cruz Alegría, Alena Kopaničáková, Rolf Krause
- Abstract summary: We propose to train neural networks (NNs) using a novel variant of the "Additively Preconditioned Trust-region Strategy" (APTS).
The proposed method is based on a parallelizable additive domain decomposition approach applied to the neural network's parameters.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose to train neural networks (NNs) using a novel variant of the
"Additively Preconditioned Trust-region Strategy" (APTS). The proposed method
is based on a parallelizable additive domain decomposition approach applied to
the neural network's parameters. Built upon the TR framework, the APTS method
ensures global convergence towards a minimizer. Moreover, it eliminates the
need for computationally expensive hyper-parameter tuning, as the TR algorithm
automatically determines the step size in each iteration. We demonstrate the
capabilities, strengths, and limitations of the proposed APTS training method
by performing a series of numerical experiments. The presented numerical study
includes a comparison with widely used training methods such as SGD, Adam,
LBFGS, and the standard TR method.
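To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of one APTS-style iteration, assuming a NumPy parameter vector and callable loss and gradient oracles: the parameters are split into disjoint blocks, each block receives a Cauchy-point-like local correction that could be computed in parallel, the corrections are summed additively, and the combined step is accepted or rejected with a standard trust-region ratio test. The block partition, local model, and acceptance thresholds are illustrative assumptions.

```python
import numpy as np

def apts_step(params, loss, grad, radius, n_subdomains=4):
    """One illustrative APTS-style iteration: additive block corrections + TR ratio test."""
    g = grad(params)
    blocks = np.array_split(np.arange(params.size), n_subdomains)
    step = np.zeros_like(params)
    for idx in blocks:                                    # blocks are independent -> parallelizable
        g_loc = g[idx]
        norm = np.linalg.norm(g_loc)
        if norm > 0.0:
            step[idx] = -min(radius, norm) * g_loc / norm  # Cauchy-point-like local correction
    predicted = -g @ step                                  # predicted decrease under a linear model
    actual = loss(params) - loss(params + step)            # actual decrease of the training loss
    rho = actual / predicted if predicted > 0 else 0.0
    if rho >= 0.1:                                         # accept the combined additive step
        params = params + step
        if rho >= 0.75:                                    # good agreement: enlarge the radius
            radius = min(2.0 * radius, 10.0)
    else:                                                  # poor agreement: reject, shrink the radius
        radius *= 0.5
    return params, radius
```

In the full APTS method each subdomain contributes a preconditioned trust-region correction obtained from its own subproblem; the sketch only conveys the additive, parallel structure and the TR-controlled step size that removes the need for learning-rate tuning.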
Related papers
- A Neural Network Training Method Based on Distributed PID Control [0.0]
In the previous article, we introduced a neural network framework based on symmetric differential equations.
This study proposes an alternative training approach that utilizes differential equation signal propagation instead of chain rule derivation.
arXiv Detail & Related papers (2024-11-18T19:25:26Z) - Learning by the F-adjoint [0.0]
In this work, we develop and investigate this theoretical framework to improve supervised learning algorithms for feed-forward neural networks.
Our main result is that by introducing a neural dynamical model combined with the gradient descent algorithm, we derive an equilibrium F-adjoint process.
Experimental results on the MNIST and Fashion-MNIST datasets demonstrate that the proposed approach provides significant improvements over the standard back-propagation training procedure.
arXiv Detail & Related papers (2024-07-08T13:49:25Z) - Domain Generalization Guided by Gradient Signal to Noise Ratio of
Parameters [69.24377241408851]
Overfitting to the source domain is a common issue in gradient-based training of deep neural networks.
We propose to base the selection on the gradient-signal-to-noise ratio (GSNR) of the network's parameters.
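For reference, a minimal sketch of the per-parameter GSNR computation (squared mean gradient over gradient variance across samples) is shown below; the per-sample gradient matrix and the epsilon guard are assumptions for illustration, not part of the paper's code.

```python
import numpy as np

def gsnr(per_sample_grads, eps=1e-12):
    """Per-parameter gradient signal-to-noise ratio from a (n_samples, n_params) matrix."""
    mean = per_sample_grads.mean(axis=0)   # signal: mean gradient of each parameter
    var = per_sample_grads.var(axis=0)     # noise: gradient variance of each parameter
    return mean ** 2 / (var + eps)         # eps guards against zero variance

# Example: 32 per-sample gradients for a 10-parameter model (random stand-in data).
print(gsnr(np.random.randn(32, 10)))
```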
arXiv Detail & Related papers (2023-10-11T10:21:34Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
arXiv Detail & Related papers (2023-03-05T17:57:33Z) - A Simple and Efficient Stochastic Rounding Method for Training Neural
Networks in Low Precision [0.0]
Conventional stochastic rounding (CSR) is widely employed in the training of neural networks (NNs).
We introduce an improved stochastic rounding method that is simple and efficient.
The proposed method succeeds in training NNs with 16-bit fixed-point numbers.
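For context, the sketch below illustrates plain unbiased stochastic rounding to a fixed-point grid, not the paper's improved variant: each value is rounded up with probability equal to its fractional distance from the lower grid point, so the rounding error is zero in expectation. The choice of 8 fractional bits and the random generator are illustrative assumptions.

```python
import numpy as np

def stochastic_round_fixed_point(x, frac_bits=8, rng=None):
    """Round to a fixed-point grid with `frac_bits` fractional bits, unbiased in expectation."""
    rng = np.random.default_rng() if rng is None else rng
    scale = 2.0 ** frac_bits                      # grid spacing is 1 / scale
    scaled = np.asarray(x, dtype=np.float64) * scale
    lower = np.floor(scaled)
    prob_up = scaled - lower                      # distance to the lower grid point, in [0, 1)
    return (lower + (rng.random(scaled.shape) < prob_up)) / scale

# Example: repeated rounding of 0.30 averages back to roughly 0.30.
print(stochastic_round_fixed_point(np.full(10000, 0.30)).mean())
```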
arXiv Detail & Related papers (2021-03-24T18:47:03Z) - Local Critic Training for Model-Parallel Learning of Deep Neural
Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that trained networks by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z) - Tunable Subnetwork Splitting for Model-parallelism of Neural Network
Training [12.755664985045582]
We propose a Tunable Subnetwork Splitting Method (TSSM) to tune the decomposition of deep neural networks.
Our proposed TSSM can achieve significant speedup without observable loss of training accuracy.
arXiv Detail & Related papers (2020-09-09T01:05:12Z) - Tune smarter not harder: A principled approach to tuning learning rates
for shallow nets [13.203765985718201]
A principled approach to choosing the learning rate is proposed for shallow feedforward neural networks.
It is shown through simulations that the proposed search method significantly outperforms the existing tuning methods.
arXiv Detail & Related papers (2020-03-22T09:38:35Z) - Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs [71.26657499537366]
We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models.
We compare it with the reverse dynamic method to train neural ODEs on classification, density estimation, and inference approximation tasks.
arXiv Detail & Related papers (2020-03-11T13:15:57Z) - Neural Proximal/Trust Region Policy Optimization Attains Globally
Optimal Policy [119.12515258771302]
We show that a variant of PPO equipped with over-parameterization converges to the globally optimal policy.
The key to our analysis is the global convergence of infinite-dimensional mirror descent under a notion of one-point monotonicity, where the gradient and iterate are instantiated by neural networks.
arXiv Detail & Related papers (2019-06-25T03:20:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.