Existence, uniqueness, and convergence rates for gradient flows in the
training of artificial neural networks with ReLU activation
- URL: http://arxiv.org/abs/2108.08106v1
- Date: Wed, 18 Aug 2021 12:06:19 GMT
- Title: Existence, uniqueness, and convergence rates for gradient flows in the
training of artificial neural networks with ReLU activation
- Authors: Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg S. Weiss
- Abstract summary: The training of artificial neural networks (ANNs) with rectified linear unit (ReLU) activation via gradient descent (GD) type optimization schemes is nowadays a common industrially relevant procedure.
To this day, the scientific literature contains in general no mathematical convergence analysis that explains the numerical success of GD type schemes in the training of ANNs with ReLU activation.
- Score: 2.4087148947930634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The training of artificial neural networks (ANNs) with rectified linear unit
(ReLU) activation via gradient descent (GD) type optimization schemes is
nowadays a common and industrially relevant procedure. To this day, the
scientific literature contains in general no mathematical convergence analysis
which explains the numerical success of GD type optimization schemes in the
training of ANNs with ReLU activation. GD type optimization schemes can be
regarded as temporal discretization methods for the gradient flow (GF)
differential equations associated with the considered optimization problem.
In view of this, it is a natural direction of research to first develop a
mathematical convergence theory for time-continuous GF differential equations
and thereafter to extend such a time-continuous convergence theory to
implementable time-discrete GD type optimization methods. In this article we
establish two basic results for GF differential equations in the training of
fully-connected feedforward ANNs with one hidden layer and ReLU activation.
In the first main result of this article we prove, under the assumption that
the probability distribution of the input data of the considered supervised
learning problem is absolutely continuous with a bounded density function,
that every GF differential equation admits for every initial value a solution
which is unique among a suitable class of solutions. In the second main result
of this article we prove, under the assumption that the target function and
the density function of the input data distribution are piecewise polynomial,
that every non-divergent GF trajectory converges with an appropriate rate of
convergence to a critical point and that the risk of the non-divergent GF
trajectory converges with rate 1 to the risk of the critical point.
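The following minimal sketch (an illustration, not the paper's construction) makes the GF/GD connection concrete: it integrates the gradient flow ODE Theta'(t) = -grad L(Theta(t)) for a one-hidden-layer ReLU network by the explicit Euler method, which is exactly plain gradient descent with step size dt. The network width, the piecewise-polynomial target, and the uniform (bounded-density) input distribution are illustrative assumptions chosen to mirror the setting of the two main results.

```python
# Minimal sketch, assuming an illustrative shallow ReLU regression problem:
# gradient flow  Theta'(t) = -grad L(Theta(t)), discretized by explicit Euler,
# i.e. plain gradient descent on the empirical L2 risk.
import numpy as np

rng = np.random.default_rng(0)
d, H, n = 1, 16, 256                               # input dim, hidden width, sample size
X = rng.uniform(-1.0, 1.0, size=(n, d))            # inputs from a bounded density
y = np.maximum(X, 0.0).sum(axis=1, keepdims=True)  # piecewise-polynomial target

W1 = rng.normal(0.0, 1.0, size=(d, H))
b1 = np.zeros(H)
W2 = rng.normal(0.0, 1.0 / np.sqrt(H), size=(H, 1))
b2 = np.zeros(1)

def risk_and_grads(W1, b1, W2, b2):
    """Empirical L2 risk of the shallow ReLU net and its gradients."""
    Z = X @ W1 + b1                  # hidden pre-activations
    A = np.maximum(Z, 0.0)           # ReLU
    out = A @ W2 + b2
    err = out - y
    risk = np.mean(err ** 2)
    g_out = 2.0 * err / n            # d risk / d out
    gW2 = A.T @ g_out
    gb2 = g_out.sum(axis=0)
    gZ = (g_out @ W2.T) * (Z > 0.0)  # ReLU is differentiable a.e.; use 0 at the kink
    gW1 = X.T @ gZ
    gb1 = gZ.sum(axis=0)
    return risk, (gW1, gb1, gW2, gb2)

dt = 1e-2                            # Euler step size = GD learning rate
for step in range(10001):
    risk, (gW1, gb1, gW2, gb2) = risk_and_grads(W1, b1, W2, b2)
    # explicit Euler step for Theta' = -grad L(Theta)
    W1 -= dt * gW1; b1 -= dt * gb1
    W2 -= dt * gW2; b2 -= dt * gb2
    if step % 2000 == 0:
        print(f"t = {step * dt:6.2f}   risk = {risk:.6f}")
```

In this toy run the risk decreases monotonically along the trajectory, which is the time-discrete analogue of the convergence of the risk along non-divergent GF trajectories addressed in the second main result.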
Related papers
- A model-constrained Discontinuous Galerkin Network (DGNet) for Compressible Euler Equations with Out-of-Distribution Generalization [0.0]
We develop a model-constrained discontinuous Galerkin Network (DGNet) approach to solve compressible Euler equations.
To validate the effectiveness, stability, and generalizability of our novel DGNet approach, we present numerical results for 1D and 2D compressible Euler equation problems.
arXiv Detail & Related papers (2024-09-27T01:13:38Z) - Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$ (a minimal sketch of the underlying AdaGrad update appears after this list).
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective in solving forward and inverse differential equation problems.
However, PINNs suffer from training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ an implicit stochastic gradient descent (ISGD) method to train PINNs in order to improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - On the existence of global minima and convergence analyses for gradient
descent methods in the training of deep neural networks [3.198144010381572]
We study feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers.
We prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs.
We also study solutions of gradient flow differential equations.
arXiv Detail & Related papers (2021-12-17T18:55:40Z) - Convergence proof for stochastic gradient descent in the training of
deep neural networks with ReLU activation for constant target functions [1.7149364927872015]
Stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs).
In this work we study SGD type optimization methods in the training of fully-connected feedforward DNNs with rectified linear unit (ReLU) activation.
arXiv Detail & Related papers (2021-12-13T11:45:36Z) - A proof of convergence for the gradient descent optimization method with
random initializations in the training of neural networks with ReLU
activation for piecewise linear target functions [3.198144010381572]
Gradient descent (GD) type optimization methods are the standard instrument to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation.
arXiv Detail & Related papers (2021-08-10T12:01:37Z) - Convergence analysis for gradient flows in the training of artificial
neural networks with ReLU activation [3.198144010381572]
Gradient descent (GD) type optimization schemes are the standard methods to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation.
Most of the key difficulties in the mathematical convergence analysis of GD type optimization schemes in the training of ANNs with ReLU activation seem to be already present in the dynamics of the corresponding GF differential equations.
arXiv Detail & Related papers (2021-07-09T15:08:30Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization at large scale where the learned model is a deep neural network.
In theory, our method requires a much smaller number of communication rounds.
Our experiments on several datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
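For the adaptive federated learning entry above, the following is a minimal sketch of the classical, centralized diagonal AdaGrad update on which the over-the-air federated variant is built; the toy objective and all hyperparameters are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of the diagonal AdaGrad update (centralized, toy objective).
import numpy as np

def adagrad(grad, w0, lr=0.1, eps=1e-8, T=1000):
    """Run T AdaGrad steps: accumulate squared gradients coordinate-wise and
    scale each step by the inverse square root of the accumulator."""
    w = np.asarray(w0, dtype=float).copy()
    acc = np.zeros_like(w)
    for _ in range(T):
        g = grad(w)
        acc += g * g
        w -= lr * g / (np.sqrt(acc) + eps)
    return w

# Toy quadratic objective f(w) = 0.5 * ||w||^2, so grad f(w) = w.
w_final = adagrad(lambda w: w, w0=np.array([3.0, -2.0]))
print(w_final)  # approaches the stationary point at the origin
```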