Overall error analysis for the training of deep neural networks via
stochastic gradient descent with random initialisation
- URL: http://arxiv.org/abs/2003.01291v1
- Date: Tue, 3 Mar 2020 01:41:17 GMT
- Title: Overall error analysis for the training of deep neural networks via
stochastic gradient descent with random initialisation
- Authors: Arnulf Jentzen and Timo Welti
- Abstract summary: We provide a mathematically rigorous full error analysis of deep learning based empirical risk minimisation with quadratic loss function.
To the best of our knowledge, we establish the first full error analysis in the scientific literature for a deep learning based algorithm in the probabilistically strong sense.
- Score: 2.4874453414078896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In spite of the accomplishments of deep learning based algorithms in numerous
applications and very broad corresponding research interest, at the moment
there is still no rigorous understanding of the reasons why such algorithms
produce useful results in certain situations. A thorough mathematical analysis
of deep learning based algorithms seems to be crucial in order to improve our
understanding and to make their implementation more effective and efficient. In
this article we provide a mathematically rigorous full error analysis of deep
learning based empirical risk minimisation with quadratic loss function in the
probabilistically strong sense, where the underlying deep neural networks are
trained using stochastic gradient descent with random initialisation. The
convergence speed we obtain is presumably far from optimal and suffers under
the curse of dimensionality. To the best of our knowledge, we establish,
however, the first full error analysis in the scientific literature for a deep
learning based algorithm in the probabilistically strong sense and, moreover,
the first full error analysis in the scientific literature for a deep learning
based algorithm where stochastic gradient descent with random initialisation is
the employed optimisation method.
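The algorithm analysed here, empirical risk minimisation with quadratic loss where the network is trained by stochastic gradient descent from i.i.d. random initialisations and, as in related full error analyses, the best of several independent runs is kept, can be pictured with a minimal sketch. The architecture, target function, step size and number of runs below are placeholder assumptions, not the constants from the paper.

```python
# Minimal sketch (not the paper's exact setup): empirical risk minimisation
# with quadratic loss for a small ReLU network, trained by plain SGD from
# several i.i.d. random initialisations; the run with the smallest empirical
# risk is returned.  Target function, width and step sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # unknown regression function (illustrative); depends on the first coordinate
    return np.sin(2.0 * np.pi * x[:, :1])

# i.i.d. training sample (X_i, Y_i)
n, d, width = 256, 1, 32
X = rng.uniform(0.0, 1.0, size=(n, d))
Y = target(X)

def init_params():
    """One i.i.d. random initialisation of a one-hidden-layer ReLU network."""
    return {
        "W1": rng.normal(0.0, 1.0, size=(d, width)),
        "b1": np.zeros(width),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, 1)),
        "b2": np.zeros(1),
    }

def forward(p, x):
    h = np.maximum(x @ p["W1"] + p["b1"], 0.0)      # ReLU hidden layer
    return h @ p["W2"] + p["b2"], h

def empirical_risk(p):
    pred, _ = forward(p, X)
    return float(np.mean((pred - Y) ** 2))          # quadratic loss

def sgd_run(steps=5000, lr=0.05, batch=16):
    p = init_params()
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        xb, yb = X[idx], Y[idx]
        pred, h = forward(p, xb)
        g = 2.0 * (pred - yb) / batch               # gradient of the quadratic loss
        gh = (g @ p["W2"].T) * (h > 0.0)            # backprop through the ReLU layer
        p["W2"] -= lr * (h.T @ g)
        p["b2"] -= lr * g.sum(axis=0)
        p["W1"] -= lr * (xb.T @ gh)
        p["b1"] -= lr * gh.sum(axis=0)
    return p

# Several independent SGD runs from random initialisations; keep the best one.
runs = [sgd_run() for _ in range(5)]
best = min(runs, key=empirical_risk)
print("smallest empirical risk over 5 runs:", empirical_risk(best))
```

The overall error of the selected network with respect to the true regression function is what the probabilistically strong analysis quantifies, combining approximation, generalisation and optimisation error contributions.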
Related papers
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Refining neural network predictions using background knowledge [68.35246878394702]
We show that logical background knowledge can be used in a learning system to compensate for a lack of labeled training data.
We introduce differentiable refinement functions that find a corrected prediction close to the original prediction.
This algorithm finds optimal refinements on complex SAT formulas in significantly fewer iterations and frequently finds solutions where gradient descent cannot.
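As a rough, generic illustration of this kind of refinement (not the paper's differentiable refinement functions), a prediction can be pulled toward satisfying a differentiable surrogate of a logical constraint while staying close to the original output; the constraint, step size and penalty weight below are made-up examples.

```python
# Illustrative sketch only (not the paper's method): refine a network's output
# vector p so that a differentiable surrogate of a logical constraint is better
# satisfied, while staying close to the original prediction.  The constraint
# "p[0] >= p[1]" and all step sizes are made-up examples.
import numpy as np

def refine(p, steps=100, lr=0.1, lam=1.0):
    q = p.copy()
    for _ in range(steps):
        viol = max(q[1] - q[0], 0.0)     # differentiable violation: relu(q1 - q0)
        grad = np.zeros_like(q)
        grad[0] += -2.0 * viol           # d(viol^2)/dq0
        grad[1] += 2.0 * viol            # d(viol^2)/dq1
        grad += 2.0 * lam * (q - p)      # proximity term keeps q near p
        q -= lr * grad
        q = np.clip(q, 0.0, 1.0)
    return q

original = np.array([0.2, 0.7, 0.1])     # violates the example constraint
print(refine(original))                  # corrected prediction close to the original
```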
arXiv Detail & Related papers (2022-06-10T10:17:59Z)
- Scalable computation of prediction intervals for neural networks via matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z)
- Performance Analysis of Fractional Learning Algorithms [32.21539962359158]
It is unclear whether the proclaimed superiority of fractional learning algorithms over conventional ones is well-grounded or a myth, as their performance has never been extensively analyzed.
In this article, a rigorous analysis of fractional variants of the least mean squares and steepest descent algorithms is performed.
The origins of the issues identified by this analysis and their consequences for the performance of the learning algorithms are discussed, and quick, straightforward remedies are proposed.
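For context, a commonly cited fractional-order variant of least mean squares adds to the standard LMS update a term involving a fractional power of the weights; the specific update form shown below, the step sizes and the synthetic data are illustrative assumptions rather than the setup analysed in the paper.

```python
# Hedged sketch: standard LMS vs. one commonly cited fractional-order LMS
# update, w <- w + mu*e*x + mu_f*e*x*|w|**(1-nu)/Gamma(2-nu).  The exact form,
# step sizes and synthetic data are illustrative assumptions, not the paper's
# setup; the absolute value avoids complex powers for negative weights.
import numpy as np
from math import gamma

rng = np.random.default_rng(1)
w_true = np.array([0.5, -1.0, 2.0])      # unknown system to identify

def run(fractional, steps=2000, mu=0.01, mu_f=0.01, nu=0.5):
    w = np.zeros(3)
    for _ in range(steps):
        x = rng.normal(size=3)
        d = w_true @ x + 0.01 * rng.normal()     # noisy desired signal
        e = d - w @ x                            # instantaneous error
        update = mu * e * x                      # standard LMS term
        if fractional:
            update += mu_f * e * x * np.abs(w) ** (1.0 - nu) / gamma(2.0 - nu)
        w += update
    return w

print("LMS       :", run(False))
print("fractional:", run(True))
```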
arXiv Detail & Related papers (2021-10-11T12:06:44Z)
- Sparse Deep Learning: A New Framework Immune to Local Traps and Miscalibration [12.05471394131891]
We provide a new framework for sparse deep learning, which addresses the issues of local traps and miscalibration in a coherent way.
We lay down a theoretical foundation for sparse deep learning and propose prior annealing algorithms for learning sparse neural networks.
arXiv Detail & Related papers (2021-10-01T21:16:34Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Proof of the Theory-to-Practice Gap in Deep Learning via Sampling Complexity bounds for Neural Network Approximation Spaces [5.863264019032882]
We study the computational complexity of (deterministic or randomized) algorithms for approximating or integrating functions.
One of the most important problems in this field concerns the question of whether it is possible to realize theoretically provable neural network approximation rates.
arXiv Detail & Related papers (2021-04-06T18:55:20Z)
- Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
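For reference, the integer least-squares problem seeks an integer vector z minimising ||y - A z||^2; a brute-force search over a small candidate box, feasible only in very low dimensions (unlike the learned tree search discussed here), looks as follows, with the matrix, noise level and search bounds chosen arbitrarily.

```python
# Tiny illustration of the integer least-squares problem: find the integer
# vector z minimising ||y - A z||^2.  Brute force over a small box is only
# feasible in very low dimensions; A, y and the search bounds are arbitrary.
import itertools
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 3))
z_true = np.array([1, -2, 3])
y = A @ z_true + 0.05 * rng.normal(size=4)               # noisy linear observation

best_z, best_cost = None, np.inf
for cand in itertools.product(range(-4, 5), repeat=3):   # all z in {-4, ..., 4}^3
    cost = np.sum((y - A @ np.array(cand)) ** 2)
    if cost < best_cost:
        best_z, best_cost = np.array(cand), cost

print("recovered integer signal:", best_z)               # typically equals z_true
```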
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
- Strong overall error analysis for the training of artificial neural networks via random initializations [3.198144010381572]
We show that the depth of the neural network only needs to increase much more slowly in order to obtain the same rate of approximation.
Results hold in the case of an arbitrary optimization algorithm with i.i.d. random initializations.
arXiv Detail & Related papers (2020-12-15T17:34:16Z)
- Enhancing accuracy of deep learning algorithms by training with low-discrepancy sequences [15.2292571922932]
We propose a deep supervised learning algorithm that uses low-discrepancy sequences as the training set.
We demonstrate that the proposed algorithm significantly outperforms standard deep learning algorithms for problems in moderately high dimensions.
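To illustrate the general idea (not the paper's exact construction or experiments), training inputs can be drawn from a Sobol low-discrepancy sequence instead of i.i.d. uniform samples; the dimension, sample size and target function below are placeholders.

```python
# Illustration only: build a supervised training set on Sobol (low-discrepancy)
# points instead of i.i.d. uniform samples.  The target function, dimension
# and sample size are placeholders, not the paper's experiments.
import numpy as np
from scipy.stats import qmc

d, m = 4, 10                                   # dimension, log2(number of points)

sobol = qmc.Sobol(d=d, scramble=True, seed=0)
X_lowdisc = sobol.random_base2(m=m)            # 2**10 Sobol points in [0, 1]^4
X_random = np.random.default_rng(0).uniform(size=(2 ** m, d))

def target(x):                                 # placeholder regression target
    return np.sin(np.pi * x.sum(axis=1, keepdims=True))

# Either (X, y) pair can be fed to a standard regression network; the claim
# under study is that low-discrepancy inputs can reduce the generalisation gap.
y_lowdisc, y_random = target(X_lowdisc), target(X_random)
print(X_lowdisc.shape, X_random.shape)
```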
arXiv Detail & Related papers (2020-05-26T08:14:00Z)
- Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for this observation in the setting of nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to efficiently escape saddle points.
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
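As a concrete, generic illustration of perturbation-aided local descent (a sketch of the idea, not the specific algorithms analysed in that line of work), gradient descent can inject a small random perturbation whenever the gradient becomes small, which lets the iterate leave a saddle region; the test function and all constants below are arbitrary.

```python
# Generic sketch of perturbed gradient descent: when the gradient is small,
# inject a small random perturbation so the iterate can leave a saddle region.
# The objective f(x, y) = x^2 - y^2 (saddle at the origin), the step size and
# the noise radius are arbitrary choices for illustration.
import numpy as np

rng = np.random.default_rng(3)

def grad(z):
    x, y = z
    return np.array([2.0 * x, -2.0 * y])        # gradient of x^2 - y^2

z = np.array([1e-6, 0.0])                        # start almost exactly at the saddle
lr, eps, radius = 0.1, 1e-3, 1e-2
for _ in range(50):
    g = grad(z)
    if np.linalg.norm(g) < eps:                  # near a stationary point
        z = z + radius * rng.normal(size=2)      # random perturbation
    else:
        z = z - lr * g
print("final iterate:", z)                       # the y-coordinate grows, escaping the saddle
```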
This list is automatically generated from the titles and abstracts of the papers in this site.