Neural Networks Generalize on Low Complexity Data
- URL: http://arxiv.org/abs/2409.12446v2
- Date: Tue, 29 Oct 2024 03:53:59 GMT
- Title: Neural Networks Generalize on Low Complexity Data
- Authors: Sourav Chatterjee, Timothy Sudijono
- Abstract summary: We show that feedforward neural networks with ReLU activation generalize on low complexity data.
We define this simple programming language, along with a notion of description length of such networks.
We provide examples on basic computational tasks, such as checking primality of a natural number.
- Score: 5.678271181959529
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We show that feedforward neural networks with ReLU activation generalize on low complexity data, suitably defined. Given i.i.d. data generated from a simple programming language, the minimum description length (MDL) feedforward neural network which interpolates the data generalizes with high probability. We define this simple programming language, along with a notion of description length of such networks. We provide several examples on basic computational tasks, such as checking primality of a natural number, and more. For primality testing, our theorem shows the following. Suppose that we draw an i.i.d. sample of $\Theta(N^{\delta}\ln N)$ numbers uniformly at random from $1$ to $N$, where $\delta\in (0,1)$. For each number $x_i$, let $y_i = 1$ if $x_i$ is a prime and $0$ if it is not. Then with high probability, the MDL network fitted to this data accurately answers whether a newly drawn number between $1$ and $N$ is a prime or not, with test error $\leq O(N^{-\delta})$. Note that the network is not designed to detect primes; minimum description learning discovers a network which does so.
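A minimal sketch of the primality-testing setup described in the abstract, assuming only what is stated there: sample roughly $\Theta(N^{\delta}\ln N)$ integers uniformly from $1$ to $N$, label each by primality, and estimate test error on fresh draws. The MDL interpolant itself is not implemented; the helper names (`sample_dataset`, `test_error`, `always_composite`), the constant factor `c`, and the trivial baseline are illustrative assumptions, not the paper's code.

```python
# Illustrative sketch only (not the paper's construction): sample
# Theta(N^delta * ln N) integers uniformly from {1, ..., N}, label each by
# primality, and estimate the test error of a candidate classifier.
import math
import random


def is_prime(x: int) -> bool:
    """Deterministic trial-division primality test (adequate for moderate N)."""
    if x < 2:
        return False
    if x % 2 == 0:
        return x == 2
    for d in range(3, math.isqrt(x) + 1, 2):
        if x % d == 0:
            return False
    return True


def sample_dataset(N: int, delta: float, c: float = 1.0, seed: int = 0):
    """Draw about c * N**delta * ln(N) i.i.d. labeled examples (c is an assumed constant)."""
    rng = random.Random(seed)
    n = int(c * N**delta * math.log(N))
    xs = [rng.randint(1, N) for _ in range(n)]
    ys = [int(is_prime(x)) for x in xs]
    return xs, ys


def test_error(classifier, N: int, num_test: int = 10_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(classifier(x) != 1{x is prime}) for a fresh uniform x."""
    rng = random.Random(seed)
    wrong = sum(
        classifier(x) != int(is_prime(x))
        for x in (rng.randint(1, N) for _ in range(num_test))
    )
    return wrong / num_test


if __name__ == "__main__":
    N, delta = 10**6, 0.5                 # the theorem takes delta in (0, 1)
    xs, ys = sample_dataset(N, delta)     # ~ N**0.5 * ln(N) ≈ 13,800 points
    print(f"training set size: {len(xs)}")

    def always_composite(x: int) -> int:
        return 0                          # trivial baseline: predict "not prime"

    # Per the theorem, the MDL network fitted to (xs, ys) reaches test error
    # O(N**-delta), on the order of 1e-3 up to constants; this baseline
    # instead errs on every prime, i.e. at the prime density up to N (~8%).
    print(f"baseline test error: {test_error(always_composite, N):.4f}")
```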
Related papers
- Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with width $2$ and depth $2N+4M-1$ can achieve finite sample memorization for any dataset comprising $N$ elements and $M$ classes.
We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m\geq 1$.
arXiv Detail & Related papers (2024-09-10T14:31:21Z) - On the Complexity of Neural Computation in Superposition [3.9803704378699103]
Recent advances in the understanding of neural networks suggest that superposition is a key mechanism underlying the computational efficiency of large-scale networks.
We show a nearly tight upper bound: logical operations like pairwise AND can be computed using $O(\sqrt{m'}\log m')$ neurons and $O(m'\log^2 m')$ parameters.
Our results open a path for using complexity theoretic techniques in neural network interpretability research.
arXiv Detail & Related papers (2024-09-05T18:58:59Z) - Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions of the form $h = g \circ p$, where $p$ is a degree $k$ polynomial, over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree $k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Most Neural Networks Are Almost Learnable [52.40331776572531]
We show that for any fixed $\epsilon>0$ and depth $i$, there is a poly-time algorithm that learns random Xavier networks of depth $i$, up to an additive error of $\epsilon$.
The algorithm runs in time and sample complexity of $(\bar{d})^{\mathrm{poly}(\epsilon^{-1})}$, where $\bar{d}$ is the size of the network.
For some cases of sigmoid and ReLU-like activations, the bound can be improved to $(\bar{d})^{\mathrm{polylog}(\epsilon^{-1})}$.
arXiv Detail & Related papers (2023-05-25T22:27:42Z) - Shallow neural network representation of polynomials [91.3755431537592]
We show that $d$-variate polynomials of degree $R$ can be represented on $[0,1]^d$ as shallow neural networks whose width is $d+1$ plus a sum over $r=2,\dots,R$ of terms involving the binomial coefficients $\binom{r+d-1}{d-1}$.
arXiv Detail & Related papers (2022-08-17T08:14:52Z) - On the Multidimensional Random Subset Sum Problem [0.9007371440329465]
In the Random Subset Sum Problem, given $n$ i.i.d. random variables $X_1,\dots,X_n$, we wish to approximate any point $z \in [-1,1]$ as the sum of a subset $X_{i_1(z)},\dots,X_{i_s(z)}$ of them, up to error $\varepsilon$.
We prove that, in $d$ dimensions, $n = O\big(d^3\log\frac{1}{\varepsilon}\cdot(\log\frac{1}{\varepsilon}+\log d)\big)$ samples suffice.
arXiv Detail & Related papers (2022-07-28T08:10:43Z) - Learning (Very) Simple Generative Models Is Hard [45.13248517769758]
We show that no polynomial-time algorithm can solve this problem even when the output coordinates of the map $\mathbb{R}^d\to\mathbb{R}^{d'}$ are one-hidden-layer ReLU networks with $\mathrm{poly}(d)$ neurons.
A key ingredient in our proof is an ODE-based construction of a compactly supported, piecewise-linear function $f$ with polynomially bounded slopes such that the pushforward of $\mathcal{N}(0,1)$ under $f$ matches all low-degree moments of $\mathcal{N}(0,1)$.
arXiv Detail & Related papers (2022-05-31T17:59:09Z) - On the Provable Generalization of Recurrent Neural Networks [7.115768009778412]
We analyze training and generalization for Recurrent Neural Networks (RNNs).
We prove a generalization error bound for learning functions without normalized conditions.
We also prove a novel result on learning $N$-variable functions of the input sequence.
arXiv Detail & Related papers (2021-09-29T02:06:33Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network trained with gradient descent can provably learn the ground truth, achieving small test loss with fewer samples than the Neural Tangent Kernel approach requires.
arXiv Detail & Related papers (2020-07-09T07:09:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.