Can stable and accurate neural networks be computed? -- On the barriers
of deep learning and Smale's 18th problem
- URL: http://arxiv.org/abs/2101.08286v1
- Date: Wed, 20 Jan 2021 19:04:17 GMT
- Title: Can stable and accurate neural networks be computed? -- On the barriers
of deep learning and Smale's 18th problem
- Authors: Vegard Antun, Matthew J. Colbrook, Anders C. Hansen
- Abstract summary: Deep learning (DL) has had unprecedented success and is now entering scientific computing with full force.
DL suffers from a universal phenomenon: instability, despite universal approximating properties that often guarantee the existence of stable neural networks (NNs).
We show that there does not exist any algorithm, even randomised, that can train (or compute) such a NN.
We introduce Fast Iterative REstarted NETworks (FIRENETs), which we prove and numerically verify are stable.
- Score: 0.5801044612920815
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep learning (DL) has had unprecedented success and is now entering
scientific computing with full force. However, DL suffers from a universal
phenomenon: instability, despite universal approximating properties that often
guarantee the existence of stable neural networks (NNs). We show the following
paradox. There are basic well-conditioned problems in scientific computing
where one can prove the existence of NNs with great approximation qualities,
however, there does not exist any algorithm, even randomised, that can train
(or compute) such a NN. Indeed, for any positive integers $K > 2$ and $L$,
there are cases where simultaneously: (a) no randomised algorithm can compute a
NN correct to $K$ digits with probability greater than $1/2$, (b) there exists
a deterministic algorithm that computes a NN with $K-1$ correct digits, but any
such (even randomised) algorithm needs arbitrarily many training data, (c)
there exists a deterministic algorithm that computes a NN with $K-2$ correct
digits using no more than $L$ training samples. These results provide basic
foundations for Smale's 18th problem and imply a potentially vast, and crucial,
classification theory describing conditions under which (stable) NNs with a
given accuracy can be computed by an algorithm. We begin this theory by
initiating a unified theory for compressed sensing and DL, leading to
sufficient conditions for the existence of algorithms that compute stable NNs
in inverse problems. We introduce Fast Iterative REstarted NETworks (FIRENETs),
which we prove and numerically verify are stable. Moreover, we prove that only
$\mathcal{O}(|\log(\epsilon)|)$ layers are needed for an $\epsilon$ accurate
solution to the inverse problem (exponential convergence), and that the inner
dimensions in the layers do not exceed the dimension of the inverse problem.
Thus, FIRENETs are computationally very efficient.
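To make the layer-count claim concrete: exponential convergence means each unrolled layer shrinks the reconstruction error by a fixed factor, so roughly $|\log(\epsilon)|$ layers suffice for an $\epsilon$-accurate solution, and no layer needs to be wider than the inverse problem itself. The sketch below is a minimal, hypothetical unrolled soft-thresholding (ISTA-style) network for a sparse inverse problem $y = Ax$; it is not the authors' FIRENET construction, and the contraction factor `rho`, regularisation weight `lam`, and data model are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal map of the l1 norm: shrink each entry towards zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def unrolled_net(A, y, eps=1e-4, rho=0.5, lam=1e-3):
    """Unrolled ISTA-style network: one gradient + shrinkage step per layer.

    Assuming each layer contracts the error by a factor rho < 1 (an assumption
    for this sketch), accuracy eps needs about log(1/eps)/log(1/rho) layers,
    i.e. O(|log eps|).  Every layer works in dimension n, the dimension of the
    inverse problem.
    """
    m, n = A.shape
    step = 1.0 / np.linalg.norm(A, 2) ** 2          # 1 / ||A||^2, a safe step size
    n_layers = int(np.ceil(np.log(1.0 / eps) / np.log(1.0 / rho)))
    x = np.zeros(n)
    for _ in range(n_layers):
        x = soft_threshold(x - step * A.T @ (A @ x - y), step * lam)
    return x, n_layers

# Usage: recover a 5-sparse vector from 80 Gaussian measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((80, 200)) / np.sqrt(80)
x_true = np.zeros(200)
x_true[rng.choice(200, size=5, replace=False)] = 1.0
x_hat, n_layers = unrolled_net(A, A @ x_true)
print(n_layers, np.linalg.norm(x_hat - x_true))
```

The actual FIRENET layers, as the name indicates, use restarted iterations tailored to compressed sensing; the sketch only illustrates the $\mathcal{O}(|\log(\epsilon)|)$ layer count and the bounded inner dimension stated in the abstract.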
Related papers
- Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent [83.85536329832722]
We show that stochastic gradient descent (SGD) can efficiently solve the $k$-parity problem on a $d$-dimensional hypercube.
We then demonstrate how a neural network trained with SGD solves the $k$-parity problem with small statistical error.
arXiv Detail & Related papers (2024-04-18T17:57:53Z) - Tight Verification of Probabilistic Robustness in Bayesian Neural
Networks [17.499817915644467]
We introduce two algorithms for computing tight guarantees on the probabilistic robustness of Bayesian Neural Networks (BNNs).
Our algorithms efficiently search the parameters' space for safe weights by using iterative expansion and the network's gradient.
In addition to proving that our algorithms compute tighter bounds than the state of the art (SoA), we also evaluate them against the SoA on standard benchmarks.
arXiv Detail & Related papers (2024-01-21T23:41:32Z) - Mind the $\tilde{\mathcal{O}}$: Asymptotically Better, but Still
Impractical, Quantum Distributed Algorithms [0.0]
We present two algorithms in the Quantum CONGEST-CLIQUE model of distributed computation that succeed with high probability.
The algorithms achieve a lower round and message complexity than any known algorithms in the classical CONGEST-CLIQUE model.
An existing framework that uses a distributed version of Grover's search algorithm to accelerate triangle finding lies at the core of the speedup.
arXiv Detail & Related papers (2023-04-06T02:18:52Z) - Deterministic Nonsmooth Nonconvex Optimization [94.01526844386977]
We show that randomization is necessary to obtain a dimension-free algorithm.
Our algorithm yields the first deterministic dimension-free algorithm for optimizing ReLU networks.
arXiv Detail & Related papers (2023-02-16T13:57:19Z) - Private estimation algorithms for stochastic block models and mixture
models [63.07482515700984]
General tools for designing efficient private estimation algorithms.
First efficient $(\epsilon, \delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z) - Censored Quantile Regression Neural Networks [24.118509578363593]
This paper considers doing quantile regression on censored data using neural networks (NNs).
We show how an algorithm popular in linear models can be applied to NNs.
Our major contribution is a novel algorithm that simultaneously optimises a grid of quantiles output by a single NN.
arXiv Detail & Related papers (2022-05-26T17:10:28Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Geometric Path Enumeration for Equivalence Verification of Neural
Networks [2.007262412327553]
We focus on the formal verification problem of NN equivalence which aims to prove that two NNs show equivalent behavior.
We show a theoretical result by proving that the epsilon-equivalence problem is coNP-complete.
Finally, we implement the extended algorithm for equivalence verification and evaluate optimizations necessary for its practical use.
arXiv Detail & Related papers (2021-12-13T11:56:08Z) - The mathematics of adversarial attacks in AI -- Why deep learning is
unstable despite the existence of stable neural networks [69.33657875725747]
We prove that any training procedure based on training neural networks for classification problems with a fixed architecture will yield neural networks that are either inaccurate or unstable (if accurate).
The key is that stable and accurate neural networks must have variable dimensions depending on the input; in particular, variable dimensions are a necessary condition for stability.
Our result points towards the paradox that accurate and stable neural networks exist, yet modern algorithms do not compute them.
arXiv Detail & Related papers (2021-09-13T16:19:25Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
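As a small, hedged illustration of the last entry above (a toy construction of ours, not the paper's proof), the sketch below draws a random two-layer ReLU feature map with standard Gaussian weights and uniformly distributed biases and checks that two point clouds that are not linearly separable in input space become linearly separable after the random lift; the data model, width, and scales are assumptions made for this example.

```python
import numpy as np

# Toy illustration of separation by random two-layer ReLU features
# (all parameters are assumptions for this sketch).
rng = np.random.default_rng(1)
d, width, n_per_class = 20, 400, 50

# Two classes that are NOT linearly separable in input space:
# an inner Gaussian cloud vs. points near a sphere of radius 4.
x_in = 0.5 * rng.standard_normal((n_per_class, d))
v = rng.standard_normal((n_per_class, d))
x_out = 4.0 * v / np.linalg.norm(v, axis=1, keepdims=True)
X = np.vstack([x_in, x_out])
y = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])

# Random first layer: standard Gaussian weights, uniform biases, ReLU.
W = rng.standard_normal((d, width)) / np.sqrt(d)
b = rng.uniform(-1.0, 1.0, size=width)
features = np.maximum(X @ W + b, 0.0)

# With width >= number of samples the feature matrix generically has full
# row rank, so a least-squares fit interpolates the labels exactly and the
# resulting linear classifier separates the two classes in feature space.
w, *_ = np.linalg.lstsq(features, y, rcond=None)
print("linearly separable after random ReLU lift:",
      bool(np.all(y * (features @ w) > 0)))
```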