Most Neural Networks Are Almost Learnable
- URL: http://arxiv.org/abs/2305.16508v3
- Date: Tue, 24 Oct 2023 19:36:51 GMT
- Title: Most Neural Networks Are Almost Learnable
- Authors: Amit Daniely, Nathan Srebro, Gal Vardi
- Abstract summary: We show that for any fixed $\epsilon>0$ and depth $i$, there is a poly-time algorithm that learns random Xavier networks of depth $i$, up to an additive error of $\epsilon$.
The algorithm runs in time and sample complexity of $(\bar{d})^{\mathrm{poly}(\epsilon^{-1})}$, where $\bar{d}$ is the size of the network.
For some cases of sigmoid and ReLU-like activations the bound can be improved to $(\bar{d})^{\mathrm{polylog}(\epsilon^{-1})}$, yielding a quasi-poly-time algorithm for learning constant-depth random networks.
- Score: 52.40331776572531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a PTAS for learning random constant-depth networks. We show that
for any fixed $\epsilon>0$ and depth $i$, there is a poly-time algorithm that
for any distribution on $\sqrt{d} \cdot \mathbb{S}^{d-1}$ learns random Xavier
networks of depth $i$, up to an additive error of $\epsilon$. The algorithm
runs in time and sample complexity of
$(\bar{d})^{\mathrm{poly}(\epsilon^{-1})}$, where $\bar d$ is the size of the
network. For some cases of sigmoid and ReLU-like activations the bound can be
improved to $(\bar{d})^{\mathrm{polylog}(\epsilon^{-1})}$, resulting in a
quasi-poly-time algorithm for learning constant depth random networks.
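For concreteness, here is a minimal sketch (not taken from the paper) of the data-generating setting described above: a random Xavier-initialized network of fixed depth evaluated on inputs from a distribution supported on $\sqrt{d} \cdot \mathbb{S}^{d-1}$. The depth, width, activation, and the uniform input distribution are illustrative assumptions; the paper's learning algorithm itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 100       # input dimension
width = 128   # hidden width (illustrative); \bar{d} denotes the total network size
depth = 3     # the fixed depth i

def xavier_layer(fan_in, fan_out):
    # Xavier-style initialization: i.i.d. Gaussian entries with variance 1/fan_in
    # (one common convention).
    return rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_out, fan_in))

def random_xavier_network(d, width, depth):
    dims = [d] + [width] * depth + [1]
    return [xavier_layer(dims[j], dims[j + 1]) for j in range(len(dims) - 1)]

def forward(weights, x):
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)   # ReLU hidden layers (illustrative choice)
    return (weights[-1] @ h).item()  # scalar output

def sample_on_scaled_sphere(n, d):
    # Inputs lie on sqrt(d) * S^{d-1}; uniform here, although the result holds
    # for any distribution supported on that sphere.
    g = rng.normal(size=(n, d))
    return np.sqrt(d) * g / np.linalg.norm(g, axis=1, keepdims=True)

target = random_xavier_network(d, width, depth)
X = sample_on_scaled_sphere(1000, d)
y = np.array([forward(target, x) for x in X])
# A learner in the paper's sense would fit these labels up to additive error
# epsilon, using time and samples (\bar{d})^{poly(1/epsilon)}.
```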
Related papers
- Efficient Statistics With Unknown Truncation, Polynomial Time Algorithms, Beyond Gaussians [7.04316974339151]
We study the estimation of distributional parameters when samples are shown only if they fall in some unknown set.
We develop tools that may be of independent interest, including a reduction from PAC learning with positive and unlabeled samples to PAC learning with positive and negative samples.
arXiv Detail & Related papers (2024-10-02T15:21:07Z) - Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements.
We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m\geq 1$.
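As an illustrative instance of the stated bound (the values of $N$ and $M$ are chosen arbitrarily), a dataset with $N=1000$ elements and $M=10$ would require width $2$ and depth $2 \cdot 1000 + 4 \cdot 10 - 1 = 2039$.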
arXiv Detail & Related papers (2024-09-10T14:31:21Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
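For reference, here is a minimal sketch (not from the paper) of the data model in this problem: labels given by a linear combination of $k$ ReLU activations on standard Gaussian inputs. The dimension, $k$, sample size, and coefficients below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 50, 4, 10_000              # placeholder dimension, number of units, sample size
A = rng.normal(size=(k, d))          # directions of the k ReLU units
c = rng.normal(size=k)               # mixing coefficients

X = rng.normal(size=(n, d))          # standard Gaussian inputs on R^d
y = np.maximum(X @ A.T, 0.0) @ c     # y = sum_i c_i * ReLU(<a_i, x>)
# A PAC learner for this class seeks a hypothesis with small square loss
# E[(h(x) - y)^2]; the quoted result does so with (dk/eps)^{O(k)} samples and time.
```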
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - A faster and simpler algorithm for learning shallow networks [10.595936992218856]
We revisit the well-studied problem of learning a linear combination of $k$ ReLU activations given labeled examples.
Here we show that a simpler one-stage version of their algorithm suffices, and moreover its runtime is only $(d/\varepsilon)^{O(k^2)}$.
arXiv Detail & Related papers (2023-07-24T03:04:10Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Small Covers for Near-Zero Sets of Polynomials and Learning Latent
Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network trained with gradient descent can provably learn the ground-truth network with lower population loss than any Neural Tangent Kernel method using polynomially many samples.
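As a generic illustration of this setup (not the paper's construction or its analysis), the sketch below runs full-batch gradient descent on an over-parametrized two-layer ReLU student network against a fixed ground-truth network; the widths, input distribution, step size, and iteration count are placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m_teacher, m_student, n = 20, 5, 200, 2000    # placeholder sizes

# Fixed ground-truth ("teacher") two-layer ReLU network.
W_star = rng.normal(size=(m_teacher, d))
a_star = rng.normal(size=m_teacher)
X = rng.normal(size=(n, d))                      # placeholder Gaussian inputs
y = np.maximum(X @ W_star.T, 0.0) @ a_star       # teacher labels

# Over-parametrized student; for simplicity only the first layer is trained.
W = rng.normal(size=(m_student, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m_student) / np.sqrt(m_student)
lr = 0.05

for step in range(500):
    H = X @ W.T                                  # pre-activations, shape (n, m_student)
    pred = np.maximum(H, 0.0) @ a                # student predictions
    resid = pred - y
    # Gradient of 0.5 * mean squared error with respect to W.
    grad_W = ((resid[:, None] * (H > 0)) * a).T @ X / n
    W -= lr * grad_W

print("final mse:", np.mean((np.maximum(X @ W.T, 0.0) @ a - y) ** 2))
```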
arXiv Detail & Related papers (2020-07-09T07:09:28Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, a discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we give an algorithm with improved sample complexity.
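For orientation only, the sketch below is textbook tabular Q-learning on a small synthetic MDP; it is the prototypical model-free method for this setting, not the algorithm analyzed in the paper, and the MDP, learning rate, and step count are placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)
S, A, gamma = 5, 3, 0.9                      # placeholder MDP size and discount

# Random MDP: transition distributions P[s, a] over next states, rewards R[s, a].
P = rng.dirichlet(np.ones(S), size=(S, A))
R = rng.uniform(size=(S, A))

# Tabular Q-learning: updates Q from sampled transitions only (model-free).
Q = np.zeros((S, A))
alpha, eps_greedy = 0.1, 0.1
s = 0
for t in range(50_000):
    a = rng.integers(A) if rng.random() < eps_greedy else int(np.argmax(Q[s]))
    s_next = rng.choice(S, p=P[s, a])
    target = R[s, a] + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    s = s_next

policy = Q.argmax(axis=1)                    # greedy policy w.r.t. the learned Q
```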
arXiv Detail & Related papers (2020-06-06T13:34:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.