Related papers: Proof of the Theory-to-Practice Gap in Deep Learning via Sampling Complexity bounds for Neural Network Approximation Spaces

Proof of the Theory-to-Practice Gap in Deep Learning via Sampling Complexity bounds for Neural Network Approximation Spaces

URL: http://arxiv.org/abs/2104.02746v1
Date: Tue, 6 Apr 2021 18:55:20 GMT
Title: Proof of the Theory-to-Practice Gap in Deep Learning via Sampling Complexity bounds for Neural Network Approximation Spaces
Authors: Philipp Grohs, Felix Voigtlaender
Abstract summary: We study the computational complexity of (deterministic or randomized) algorithms based on approximating or integrating functions. One of the most important problems in this field concerns the question of whether it is possible to realize theoretically provable neural network approximation rates.
Score: 5.863264019032882
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the computational complexity of (deterministic or randomized) algorithms based on point samples for approximating or integrating functions that can be well approximated by neural networks. Such algorithms (most prominently stochastic gradient descent and its variants) are used extensively in the field of deep learning. One of the most important problems in this field concerns the question of whether it is possible to realize theoretically provable neural network approximation rates by such algorithms. We answer this question in the negative by proving hardness results for the problems of approximation and integration on a novel class of neural network approximation spaces. In particular, our results confirm a conjectured and empirically observed theory-to-practice gap in deep learning. We complement our hardness results by showing that approximation rates of a comparable order of convergence are (at least theoretically) achievable.

Related papers

Theory-to-Practice Gap for Neural Networks and Neural Operators [6.267574471145217]
We study the sampling complexity of learning with ReLU neural networks and neural operators. We show that the best-possible convergence rate in a Bochner $Lp$-norm is bounded by Monte-Carlo rates of order $1/p$.
arXiv Detail & Related papers (2025-03-23T21:45:58Z)
Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection [11.729744197698718]
We present an algorithmic framework to approximate a neural network of finite width and depth. We iteratively approximate the output distribution of each layer of the neural network as a mixture of Gaussian processes. Our results can represent an important step towards understanding neural network predictions.
arXiv Detail & Related papers (2024-07-26T12:45:53Z)
A new approach to generalisation error of machine learning algorithms: Estimates and convergence [0.0]
We introduce a new approach to the estimation of the (generalisation) error and to convergence. Our results include estimates of the error without any structural assumption on the neural networks.
arXiv Detail & Related papers (2023-06-23T20:57:31Z)
Semantic Strengthening of Neuro-Symbolic Learning [85.6195120593625]
Neuro-symbolic approaches typically resort to fuzzy approximations of a probabilistic objective. We show how to compute this efficiently for tractable circuits. We test our approach on three tasks: predicting a minimum-cost path in Warcraft, predicting a minimum-cost perfect matching, and solving Sudoku puzzles.
arXiv Detail & Related papers (2023-02-28T00:04:22Z)
Mean-field neural networks: learning mappings on Wasserstein space [0.0]
We study the machine learning task for models with operators mapping between the Wasserstein space of probability measures and a space of functions. Two classes of neural networks are proposed to learn so-called mean-field functions. We present different algorithms relying on mean-field neural networks for solving time-dependent mean-field problems.
arXiv Detail & Related papers (2022-10-27T05:11:42Z)
Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points. The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains. We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
An Overview of Uncertainty Quantification Methods for Infinite Neural Networks [0.0]
We review methods for quantifying uncertainty in infinite-width neural networks. We make use of several equivalence results along the way to obtain exact closed-form solutions for predictive uncertainty.
arXiv Detail & Related papers (2022-01-13T00:03:22Z)
The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability. We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded on the complexity' of the fractal structure that underlies its generalization measure. We further specialize our results to specific problems (e.g., linear/logistic regression, one hidden/layered neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time. We show that convergence to a global minimum is guaranteed for networks with quadratic widths in the sample size and linear in their depth at a time logarithmic in both. Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
Disentangling the Gauss-Newton Method and Approximate Inference for Neural Networks [96.87076679064499]
We disentangle the generalized Gauss-Newton and approximate inference for Bayesian deep learning. We find that the Gauss-Newton method simplifies the underlying probabilistic model significantly. The connection to Gaussian processes enables new function-space inference algorithms.
arXiv Detail & Related papers (2020-07-21T17:42:58Z)
Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation [2.4874453414078896]
We provide a mathematically rigorous full error analysis of deep learning based empirical risk minimisation with quadratic loss function. We establish, however, the first full error analysis in the scientific literature for a deep learning based algorithm in the probabilistically strong sense.
arXiv Detail & Related papers (2020-03-03T01:41:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.