Strong overall error analysis for the training of artificial neural
networks via random initializations
- URL: http://arxiv.org/abs/2012.08443v1
- Date: Tue, 15 Dec 2020 17:34:16 GMT
- Title: Strong overall error analysis for the training of artificial neural
networks via random initializations
- Authors: Arnulf Jentzen and Adrian Riekert
- Abstract summary: We show that the depth of the neural network only needs to increase much more slowly in order to obtain the same rate of approximation.
The results hold in the case of an arbitrary stochastic optimization algorithm with i.i.d. random initializations.
- Score: 3.198144010381572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep learning based approximation algorithms have been applied very
successfully to numerous problems, at the moment the reasons for their
performance are not entirely understood from a mathematical point of view.
Recently, estimates for the convergence of the overall error have been obtained
in the situation of deep supervised learning, but with an extremely slow rate
of convergence. In this note we partially improve on these estimates. More
specifically, we show that the depth of the neural network only needs to
increase much more slowly in order to obtain the same rate of approximation. The
results hold in the case of an arbitrary stochastic optimization algorithm with
i.i.d. random initializations.
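The setting has a concrete algorithmic reading: run the same stochastic optimizer from several independent (i.i.d.) random initializations and keep the realization with the smallest empirical risk. Below is a minimal PyTorch sketch of that scheme; the architecture, quadratic loss, and hyperparameters are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: train the same architecture from k i.i.d. random
# initializations and keep the run with the smallest empirical risk.
# Architecture, data, and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn

def train_once(data_x, data_y, depth=3, width=32, steps=500, lr=1e-2):
    """One SGD run from a fresh (i.i.d.) random initialization."""
    layers, d_in = [], data_x.shape[1]
    for _ in range(depth - 1):
        layers += [nn.Linear(d_in, width), nn.ReLU()]
        d_in = width
    layers.append(nn.Linear(d_in, 1))
    net = nn.Sequential(*layers)            # fresh random init on each call
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()                  # quadratic loss, as in the paper
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(net(data_x), data_y)
        loss.backward()
        opt.step()
    return net, loss_fn(net(data_x), data_y).item()

def best_of_k(data_x, data_y, k=10):
    """Keep the run with the smallest empirical risk among k i.i.d. restarts."""
    runs = [train_once(data_x, data_y) for _ in range(k)]
    return min(runs, key=lambda pair: pair[1])

# Toy usage: learn f(x) = sin(pi * x) on [0, 1].
x = torch.rand(256, 1)
y = torch.sin(torch.pi * x)
net, risk = best_of_k(x, y, k=5)
print(f"best empirical risk over 5 restarts: {risk:.4f}")
```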
Related papers
- Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks [19.185059111021854]
We study the implicit bias of the general family of steepest descent algorithms, which includes gradient descent, sign descent and coordinate descent.
We prove that an algorithm-dependent geometric margin starts increasing once the networks reach perfect training accuracy.
arXiv Detail & Related papers (2024-10-29T14:28:49Z)
- Conjugate-Gradient-like Based Adaptive Moment Estimation Optimization Algorithm for Deep Learning [2.695991050833627]
We propose a new optimization algorithm named CG-like-Adam for deep learning.
Specifically, both the first-order and the second-order moment estimates of generic Adam are replaced by conjugate-gradient-like directions (a hedged sketch of this idea appears after the list).
Numerical experiments on the CIFAR-10/100 datasets show the superiority of the proposed algorithm.
arXiv Detail & Related papers (2024-04-02T07:57:17Z)
- A new approach to generalisation error of machine learning algorithms: Estimates and convergence [0.0]
We introduce a new approach to the estimation of the (generalisation) error and to convergence.
Our results include estimates of the error without any structural assumption on the neural networks.
arXiv Detail & Related papers (2023-06-23T20:57:31Z)
- On the generalization of learning algorithms that do not converge [54.122745736433856]
Generalization analyses of deep learning typically assume that the training converges to a fixed point.
Recent results indicate that in practice, the weights of deep neural networks optimized with gradient descent often oscillate indefinitely.
arXiv Detail & Related papers (2022-08-16T21:22:34Z)
- Scalable computation of prediction intervals for neural networks via matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z)
- Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases [3.198144010381572]
In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches reach their limits.
It is still unclear why randomly initialized gradient descent algorithms succeed in training such networks.
arXiv Detail & Related papers (2021-02-23T18:17:47Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires a much smaller number of communication rounds in theory.
Experiments on several datasets demonstrate the effectiveness of our algorithm and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for these observations in nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to escape saddle points (a minimal sketch follows the list).
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
- Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation [2.4874453414078896]
We provide a mathematically rigorous full error analysis of deep learning based empirical risk minimisation with quadratic loss function.
We thereby establish the first full error analysis in the scientific literature for a deep learning based algorithm in the probabilistically strong sense.
arXiv Detail & Related papers (2020-03-03T01:41:17Z)
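As referenced above, here is a hedged sketch of the conjugate-gradient-like idea behind CG-like-Adam: a Fletcher-Reeves-style direction replaces the raw gradient inside Adam's moment estimates. The conjugacy coefficient and every other detail below are illustrative assumptions, not the published algorithm.

```python
# Hedged sketch of the CG-like-Adam idea: feed a conjugate-gradient-like
# direction (Fletcher-Reeves style here) into Adam's moment estimates
# instead of the raw gradient. Details are illustrative guesses only.
import numpy as np

def cg_like_adam(grad_fn, x0, lr=1e-2, beta1=0.9, beta2=0.999,
                 eps=1e-8, steps=1500):
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)          # first moment of the CG-like direction
    v = np.zeros_like(x)          # second moment of the CG-like direction
    d = np.zeros_like(x)          # previous CG-like direction
    g_prev_sq = None
    for t in range(1, steps + 1):
        g = grad_fn(x)
        # Fletcher-Reeves-style conjugacy coefficient (assumed variant)
        fr = 0.0 if g_prev_sq is None else g.dot(g) / g_prev_sq
        d = g + fr * d            # conjugate-gradient-like direction
        g_prev_sq = g.dot(g) + 1e-16
        m = beta1 * m + (1 - beta1) * d
        v = beta2 * v + (1 - beta2) * d * d
        m_hat = m / (1 - beta1 ** t)        # bias correction, as in Adam
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy usage: minimize sum((x - 3)^2); the iterate approaches [3., 3.].
x_min = cg_like_adam(lambda x: 2 * (x - 3.0), x0=np.zeros(2))
print(x_min)
```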
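And a minimal sketch of the role of perturbations in local descent, in the spirit of perturbed gradient descent (Jin et al., 2017): when the gradient is small the iterate may sit near a saddle point, so isotropic noise is injected to help escape. The threshold, radius, and toy objective are assumptions for illustration, not taken from the survey above.

```python
# Hedged sketch of perturbation-aided descent: inject isotropic noise
# whenever the gradient is nearly zero, so the iterate can leave strict
# saddle points instead of stalling there. Simplified loop; parameters
# are illustrative.
import numpy as np

def perturbed_gd(grad_fn, x0, lr=0.05, g_tol=1e-3, radius=0.1,
                 steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_fn(x)
        if np.linalg.norm(g) < g_tol:
            # Near a first-order stationary point: perturb to escape saddles.
            x = x + rng.uniform(-radius, radius, size=x.shape)
        else:
            x = x - lr * g
    return x

# Toy usage: f(x, y) = (x^2 - 1)^2 / 4 + y^2 / 2 has a strict saddle at the
# origin and minima at (+/-1, 0). Starting exactly at the saddle, plain
# gradient descent would never move; the perturbed iterate ends near a minimum.
grad = lambda p: np.array([p[0] ** 3 - p[0], p[1]])
print(perturbed_gd(grad, x0=np.zeros(2)))  # close to (+/-1, 0), not (0, 0)
```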
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.