A proof of convergence for stochastic gradient descent in the training
of artificial neural networks with ReLU activation for constant target
functions
- URL: http://arxiv.org/abs/2104.00277v1
- Date: Thu, 1 Apr 2021 06:28:30 GMT
- Title: A proof of convergence for stochastic gradient descent in the training
of artificial neural networks with ReLU activation for constant target
functions
- Authors: Arnulf Jentzen, Adrian Riekert
- Abstract summary: We study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation.
The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant.
- Score: 3.198144010381572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this article we study the stochastic gradient descent (SGD) optimization
method in the training of fully-connected feedforward artificial neural
networks with ReLU activation. The main result of this work proves that the
risk of the SGD process converges to zero if the target function under
consideration is constant. In the established convergence result the considered
artificial neural networks consist of one input layer, one hidden layer, and
one output layer (with $d \in \mathbb{N}$ neurons on the input layer, $H \in
\mathbb{N}$ neurons on the hidden layer, and one neuron on the output layer).
The learning rates of the SGD process are assumed to be sufficiently small and
the input data used in the SGD process to train the artificial neural networks
is assumed to be independent and identically distributed.
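The convergence result concerns a concrete, elementary training loop: plain SGD with a sufficiently small constant learning rate applied to a one-hidden-layer ReLU network ($d$ inputs, $H$ hidden neurons, one output) whose training inputs are i.i.d. and whose target values are constant. The sketch below only illustrates that setting; it is not the authors' code, and the constant target value, the input distribution, and all hyperparameters are assumptions chosen for the example.

```python
# Minimal sketch (not the authors' code) of the setting analyzed in the paper:
# a fully-connected network with d inputs, H hidden ReLU neurons, and one output,
# trained by plain SGD with a small constant learning rate on i.i.d. inputs whose
# target values are constant. The constant c, the input distribution, and all
# hyperparameter values below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

d, H = 4, 64          # input dimension and hidden width
c = 1.0               # constant target function f(x) = c
eta = 1e-3            # sufficiently small learning rate
steps = 20_000

# Network parameters: hidden layer (W, b) and output layer (v, b_out).
W = rng.normal(scale=1.0 / np.sqrt(d), size=(H, d))
b = np.zeros(H)
v = rng.normal(scale=1.0 / np.sqrt(H), size=H)
b_out = 0.0

def forward(x):
    """One-hidden-layer ReLU network: R^d -> R."""
    z = W @ x + b          # pre-activations of the hidden layer
    a = np.maximum(z, 0.0) # ReLU
    return a, z, v @ a + b_out

for t in range(steps):
    x = rng.uniform(-1.0, 1.0, size=d)  # i.i.d. training input (assumed distribution)
    a, z, y_hat = forward(x)
    err = y_hat - c                     # residual of the squared loss (y_hat - c)^2 / 2

    # Gradients of the per-sample squared loss.
    grad_v = err * a
    grad_b_out = err
    relu_grad = (z > 0.0).astype(float)
    grad_hidden = err * v * relu_grad   # backpropagation through the ReLU
    grad_W = np.outer(grad_hidden, x)
    grad_b = grad_hidden

    # Plain SGD step.
    v -= eta * grad_v
    b_out -= eta * grad_b_out
    W -= eta * grad_W
    b -= eta * grad_b

# Monte Carlo estimate of the risk E[(network(x) - c)^2].
xs = rng.uniform(-1.0, 1.0, size=(10_000, d))
preds = np.maximum(xs @ W.T + b, 0.0) @ v + b_out
print("estimated risk:", np.mean((preds - c) ** 2))
```

In this setting the estimated risk should shrink toward zero as the number of SGD steps grows, matching the qualitative statement of the main result.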
Related papers
- Fractional-order spike-timing-dependent gradient descent for multi-layer spiking neural networks [18.142378139047977]
This paper proposes a fractional-order spike-timing-dependent gradient descent (FOSTDGD) learning model.
It is tested on the MNIST and DVS128 Gesture datasets, and its accuracy under different network structures and fractional orders is analyzed.
arXiv Detail & Related papers (2024-10-20T05:31:34Z) - Efficient SGD Neural Network Training via Sublinear Activated Neuron
Identification [22.361338848134025]
We present a fully connected two-layer neural network for shifted ReLU activation to enable activated neuron identification in sublinear time via geometric search.
We also prove that our algorithm can converge in $O(M^2/\epsilon^2)$ time with network size quadratic in the coefficient norm upper bound $M$ and error term $\epsilon$ (a minimal illustration of the activated-neuron identification step appears after this list).
arXiv Detail & Related papers (2023-07-13T05:33:44Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
PINNs are prone to training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs in order to improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Learning with Local Gradients at the Edge [14.94491070863641]
We present a novel backpropagation-free optimization algorithm dubbed Target Projection Stochastic Gradient Descent (tpSGD).
tpSGD generalizes direct random target projection to work with arbitrary loss functions.
We evaluate the performance of tpSGD in training deep neural networks and extend the approach to multi-layer RNNs.
arXiv Detail & Related papers (2022-08-17T19:51:06Z) - Convergence proof for stochastic gradient descent in the training of
deep neural networks with ReLU activation for constant target functions [1.7149364927872015]
Stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs).
In this work we study SGD type optimization methods in the training of fully-connected feedforward DNNs with rectified linear unit (ReLU) activation.
arXiv Detail & Related papers (2021-12-13T11:45:36Z) - On the Convergence of Shallow Neural Network Training with Randomly
Masked Neurons [11.119895959906085]
Given a dense shallow neural network, we focus on creating, training, and combining randomly selected functions.
By analyzing $i)$ the subnetworks' neural tangent kernel, $ii)$ the surrogate functions' gradient, and $iii)$ how we sample and combine the surrogate functions, we prove a linear convergence rate of the training error.
For fixed neuron selection probability, the error term decreases as we increase the number of surrogate models, and increases as we increase the number of local training steps.
arXiv Detail & Related papers (2021-12-05T19:51:14Z) - And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z) - Exploiting Heterogeneity in Operational Neural Networks by Synaptic
Plasticity [87.32169414230822]
The recently proposed network model, Operational Neural Networks (ONNs), can generalize conventional Convolutional Neural Networks (CNNs).
In this study, the focus is on searching for the best-possible operator set(s) for the hidden neurons of the network based on the Synaptic Plasticity paradigm, which constitutes the essential learning theory in biological neurons.
Experimental results over highly challenging problems demonstrate that elite ONNs, even with few neurons and layers, can achieve superior learning performance compared to GIS-based ONNs.
arXiv Detail & Related papers (2020-08-21T19:03:23Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
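On the sublinear activated-neuron identification entry above: for a two-layer network with a shifted ReLU $\sigma_b(t) = \max(t - b, 0)$, the set identified for an input $x$ consists of the hidden neurons whose pre-activation $\langle w_r, x \rangle$ exceeds the shift $b$. The sketch below is a plain linear scan that only illustrates this definition; the paper's actual contribution, maintaining that set in sublinear time via geometric search, is not reproduced here, and all parameter values are illustrative assumptions.

```python
# Minimal sketch (not the paper's algorithm) of "activated neuron identification"
# for a two-layer network with shifted ReLU sigma_b(t) = max(t - b, 0): for an
# input x, the activated neurons are those hidden units whose pre-activation
# exceeds the shift b. The paper performs this identification in sublinear time
# via geometric search structures; the linear scan below only illustrates the
# quantity being computed. All parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

d, m = 8, 1024        # input dimension and number of hidden neurons
b = 0.5               # shift of the shifted ReLU
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m)  # output-layer weights

def activated_neurons(x):
    """Indices of hidden neurons with <w_r, x> > b (the shifted-ReLU 'fire set')."""
    return np.flatnonzero(W @ x > b)

def forward(x):
    """Two-layer shifted-ReLU network, summing only over activated neurons."""
    idx = activated_neurons(x)
    return a[idx] @ (W[idx] @ x - b)

x = rng.normal(size=d)
idx = activated_neurons(x)
print(f"{idx.size} of {m} neurons activated; output = {forward(x):.4f}")
```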
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.