Overall error analysis for the training of deep neural networks via
stochastic gradient descent with random initialisation
- URL: http://arxiv.org/abs/2003.01291v1
- Date: Tue, 3 Mar 2020 01:41:17 GMT
- Title: Overall error analysis for the training of deep neural networks via
stochastic gradient descent with random initialisation
- Authors: Arnulf Jentzen and Timo Welti
- Abstract summary: We provide a mathematically rigorous full error analysis of deep learning based empirical risk minimisation with quadratic loss function.
To the best of our knowledge, we establish the first full error analysis in the scientific literature for a deep learning based algorithm in the probabilistically strong sense.
- Score: 2.4874453414078896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In spite of the accomplishments of deep learning based algorithms in numerous
applications and very broad corresponding research interest, at the moment
there is still no rigorous understanding of the reasons why such algorithms
produce useful results in certain situations. A thorough mathematical analysis
of deep learning based algorithms seems to be crucial in order to improve our
understanding and to make their implementation more effective and efficient. In
this article we provide a mathematically rigorous full error analysis of deep
learning based empirical risk minimisation with quadratic loss function in the
probabilistically strong sense, where the underlying deep neural networks are
trained using stochastic gradient descent with random initialisation. The
convergence speed we obtain is presumably far from optimal and suffers under
the curse of dimensionality. To the best of our knowledge, we establish,
however, the first full error analysis in the scientific literature for a deep
learning based algorithm in the probabilistically strong sense and, moreover,
the first full error analysis in the scientific literature for a deep learning
based algorithm where stochastic gradient descent with random initialisation is
the employed optimisation method.
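The algorithm analysed here, empirical risk minimisation with quadratic loss where the network is trained by stochastic gradient descent from i.i.d. random initialisations and, as in related full error analyses, the best of several independent runs is kept, can be pictured with a minimal sketch. The architecture, target function, step size and number of runs below are placeholder assumptions, not the constants from the paper.

```python
# Minimal sketch (not the paper's exact setup): empirical risk minimisation
# with quadratic loss for a small ReLU network, trained by plain SGD from
# several i.i.d. random initialisations; the run with the smallest empirical
# risk is returned.  Target function, width and step sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # unknown regression function (illustrative); depends on the first coordinate
    return np.sin(2.0 * np.pi * x[:, :1])

# i.i.d. training sample (X_i, Y_i)
n, d, width = 256, 1, 32
X = rng.uniform(0.0, 1.0, size=(n, d))
Y = target(X)

def init_params():
    """One i.i.d. random initialisation of a one-hidden-layer ReLU network."""
    return {
        "W1": rng.normal(0.0, 1.0, size=(d, width)),
        "b1": np.zeros(width),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, 1)),
        "b2": np.zeros(1),
    }

def forward(p, x):
    h = np.maximum(x @ p["W1"] + p["b1"], 0.0)      # ReLU hidden layer
    return h @ p["W2"] + p["b2"], h

def empirical_risk(p):
    pred, _ = forward(p, X)
    return float(np.mean((pred - Y) ** 2))          # quadratic loss

def sgd_run(steps=5000, lr=0.05, batch=16):
    p = init_params()
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        xb, yb = X[idx], Y[idx]
        pred, h = forward(p, xb)
        g = 2.0 * (pred - yb) / batch               # gradient of the quadratic loss
        gh = (g @ p["W2"].T) * (h > 0.0)            # backprop through the ReLU layer
        p["W2"] -= lr * (h.T @ g)
        p["b2"] -= lr * g.sum(axis=0)
        p["W1"] -= lr * (xb.T @ gh)
        p["b1"] -= lr * gh.sum(axis=0)
    return p

# Several independent SGD runs from random initialisations; keep the best one.
runs = [sgd_run() for _ in range(5)]
best = min(runs, key=empirical_risk)
print("smallest empirical risk over 5 runs:", empirical_risk(best))
```

The overall error of the selected network with respect to the true regression function is what the probabilistically strong analysis quantifies, combining approximation, generalisation and optimisation error contributions.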
Related papers
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Refining neural network predictions using background knowledge [68.35246878394702]
We show that logical background knowledge can be used in a learning system to compensate for a lack of labeled training data.
We introduce differentiable refinement functions that find a corrected prediction close to the original prediction.
This algorithm finds optimal refinements on complex SAT formulas in significantly fewer iterations and frequently finds solutions where gradient descent cannot.
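As a rough, generic illustration of this kind of refinement (not the paper's differentiable refinement functions), a prediction can be pulled toward satisfying a differentiable surrogate of a logical constraint while staying close to the original output; the constraint, step size and penalty weight below are made-up examples.

```python
# Illustrative sketch only (not the paper's method): refine a network's output
# vector p so that a differentiable surrogate of a logical constraint is better
# satisfied, while staying close to the original prediction.  The constraint
# "p[0] >= p[1]" and all step sizes are made-up examples.
import numpy as np

def refine(p, steps=100, lr=0.1, lam=1.0):
    q = p.copy()
    for _ in range(steps):
        viol = max(q[1] - q[0], 0.0)     # differentiable violation: relu(q1 - q0)
        grad = np.zeros_like(q)
        grad[0] += -2.0 * viol           # d(viol^2)/dq0
        grad[1] += 2.0 * viol            # d(viol^2)/dq1
        grad += 2.0 * lam * (q - p)      # proximity term keeps q near p
        q -= lr * grad
        q = np.clip(q, 0.0, 1.0)
    return q

original = np.array([0.2, 0.7, 0.1])     # violates the example constraint
print(refine(original))                  # corrected prediction close to the original
```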
arXiv Detail & Related papers (2022-06-10T10:17:59Z)
- Scalable computation of prediction intervals for neural networks via matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z)
- Performance Analysis of Fractional Learning Algorithms [32.21539962359158]
It is unclear whether the proclaimed superiority of fractional learning algorithms over conventional ones is well-grounded or a myth, as their performance has never been extensively analyzed.
In this article, a rigorous analysis of fractional variants of the least mean squares and steepest descent algorithms is performed.
The origins of the issues identified by this analysis and their consequences for the performance of the learning algorithms are discussed, and quick, straightforward remedies are proposed.
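For context, a commonly cited fractional-order variant of least mean squares adds to the standard LMS update a term involving a fractional power of the weights; the specific update form shown below, the step sizes and the synthetic data are illustrative assumptions rather than the setup analysed in the paper.

```python
# Hedged sketch: standard LMS vs. one commonly cited fractional-order LMS
# update, w <- w + mu*e*x + mu_f*e*x*|w|**(1-nu)/Gamma(2-nu).  The exact form,
# step sizes and synthetic data are illustrative assumptions, not the paper's
# setup; the absolute value avoids complex powers for negative weights.
import numpy as np
from math import gamma

rng = np.random.default_rng(1)
w_true = np.array([0.5, -1.0, 2.0])      # unknown system to identify

def run(fractional, steps=2000, mu=0.01, mu_f=0.01, nu=0.5):
    w = np.zeros(3)
    for _ in range(steps):
        x = rng.normal(size=3)
        d = w_true @ x + 0.01 * rng.normal()     # noisy desired signal
        e = d - w @ x                            # instantaneous error
        update = mu * e * x                      # standard LMS term
        if fractional:
            update += mu_f * e * x * np.abs(w) ** (1.0 - nu) / gamma(2.0 - nu)
        w += update
    return w

print("LMS       :", run(False))
print("fractional:", run(True))
```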
arXiv Detail & Related papers (2021-10-11T12:06:44Z)
- Sparse Deep Learning: A New Framework Immune to Local Traps and Miscalibration [12.05471394131891]
We provide a new framework for sparse deep learning, which addresses the issues of local traps and miscalibration in a coherent way.
We lay down a theoretical foundation for sparse deep learning and propose prior annealing algorithms for learning sparse neural networks.
arXiv Detail & Related papers (2021-10-01T21:16:34Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Proof of the Theory-to-Practice Gap in Deep Learning via Sampling Complexity bounds for Neural Network Approximation Spaces [5.863264019032882]
We study the computational complexity of (deterministic or randomized) algorithms for approximating or integrating functions.
One of the most important problems in this field concerns the question of whether it is possible to realize theoretically provable neural network approximation rates.
arXiv Detail & Related papers (2021-04-06T18:55:20Z)
- Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
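For reference, the integer least-squares problem seeks an integer vector z minimising ||y - A z||^2; a brute-force search over a small candidate box, feasible only in very low dimensions (unlike the learned tree search discussed here), looks as follows, with the matrix, noise level and search bounds chosen arbitrarily.

```python
# Tiny illustration of the integer least-squares problem: find the integer
# vector z minimising ||y - A z||^2.  Brute force over a small box is only
# feasible in very low dimensions; A, y and the search bounds are arbitrary.
import itertools
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 3))
z_true = np.array([1, -2, 3])
y = A @ z_true + 0.05 * rng.normal(size=4)               # noisy linear observation

best_z, best_cost = None, np.inf
for cand in itertools.product(range(-4, 5), repeat=3):   # all z in {-4, ..., 4}^3
    cost = np.sum((y - A @ np.array(cand)) ** 2)
    if cost < best_cost:
        best_z, best_cost = np.array(cand), cost

print("recovered integer signal:", best_z)               # typically equals z_true
```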
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
- Strong overall error analysis for the training of artificial neural networks via random initializations [3.198144010381572]
We show that the depth of the neural network only needs to increase much more slowly in order to obtain the same rate of approximation.
Results hold in the case of an arbitrary optimization algorithm with i.i.d. random initializations.
arXiv Detail & Related papers (2020-12-15T17:34:16Z)
- Enhancing accuracy of deep learning algorithms by training with low-discrepancy sequences [15.2292571922932]
We propose a deep supervised learning algorithm that uses low-discrepancy sequences as the training set.
We demonstrate that the proposed algorithm significantly outperforms standard deep learning algorithms for problems in moderately high dimensions.
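To illustrate the general idea (not the paper's exact construction or experiments), training inputs can be drawn from a Sobol low-discrepancy sequence instead of i.i.d. uniform samples; the dimension, sample size and target function below are placeholders.

```python
# Illustration only: build a supervised training set on Sobol (low-discrepancy)
# points instead of i.i.d. uniform samples.  The target function, dimension
# and sample size are placeholders, not the paper's experiments.
import numpy as np
from scipy.stats import qmc

d, m = 4, 10                                   # dimension, log2(number of points)

sobol = qmc.Sobol(d=d, scramble=True, seed=0)
X_lowdisc = sobol.random_base2(m=m)            # 2**10 Sobol points in [0, 1]^4
X_random = np.random.default_rng(0).uniform(size=(2 ** m, d))

def target(x):                                 # placeholder regression target
    return np.sin(np.pi * x.sum(axis=1, keepdims=True))

# Either (X, y) pair can be fed to a standard regression network; the claim
# under study is that low-discrepancy inputs can reduce the generalisation gap.
y_lowdisc, y_random = target(X_lowdisc), target(X_random)
print(X_lowdisc.shape, X_random.shape)
```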
arXiv Detail & Related papers (2020-05-26T08:14:00Z)
- Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for this observation in the setting of nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to efficiently escape saddle points.
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
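As a concrete, generic illustration of perturbation-aided local descent (a sketch of the idea, not the specific algorithms analysed in that line of work), gradient descent can inject a small random perturbation whenever the gradient becomes small, which lets the iterate leave a saddle region; the test function and all constants below are arbitrary.

```python
# Generic sketch of perturbed gradient descent: when the gradient is small,
# inject a small random perturbation so the iterate can leave a saddle region.
# The objective f(x, y) = x^2 - y^2 (saddle at the origin), the step size and
# the noise radius are arbitrary choices for illustration.
import numpy as np

rng = np.random.default_rng(3)

def grad(z):
    x, y = z
    return np.array([2.0 * x, -2.0 * y])        # gradient of x^2 - y^2

z = np.array([1e-6, 0.0])                        # start almost exactly at the saddle
lr, eps, radius = 0.1, 1e-3, 1e-2
for _ in range(50):
    g = grad(z)
    if np.linalg.norm(g) < eps:                  # near a stationary point
        z = z + radius * rng.normal(size=2)      # random perturbation
    else:
        z = z - lr * g
print("final iterate:", z)                       # the y-coordinate grows, escaping the saddle
```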
This list is automatically generated from the titles and abstracts of the papers in this site.