Linear approximability of two-layer neural networks: A comprehensive analysis based on spectral decay
- URL: http://arxiv.org/abs/2108.04964v1
- Date: Tue, 10 Aug 2021 23:30:29 GMT
- Title: Linear approximability of two-layer neural networks: A comprehensive analysis based on spectral decay
- Authors: Jihao Long, Lei Wu
- Abstract summary: We first consider the case of a single neuron and show that the linear approximability, quantified by the Kolmogorov width, is controlled by the eigenvalue decay of an associated kernel.
We show that similar results also hold for two-layer neural networks.
- Score: 4.042159113348107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a spectral-based approach to study the linear
approximation of two-layer neural networks. We first consider the case of a
single neuron and show that the linear approximability, quantified by the
Kolmogorov width, is controlled by the eigenvalue decay of an associated kernel.
Then, we show that similar results also hold for two-layer neural networks.
This spectral-based approach allows us to obtain upper bounds, lower bounds,
and explicit hard examples in a unified manner. In particular, these bounds
imply that for networks activated by smooth functions, restricting the norms of
inner-layer weights may significantly impair the expressiveness. By contrast,
for non-smooth activation functions, such as ReLU, the network expressiveness
is independent of the inner-layer weight norms. In addition, we prove that for
a family of non-smooth activation functions, including ReLU, approximating any
single neuron with random features suffers from the \emph{curse of
dimensionality}. This provides an explicit separation of expressiveness between
neural networks and random feature models.
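As a rough numerical illustration of the quantity the abstract refers to (and not the paper's own construction), the sketch below estimates the eigenvalue decay of the kernel k(x, x') = E_w[sigma(w·x) sigma(w·x')] associated with a single neuron, comparing ReLU with a smooth activation. The dimension, sample sizes, and the choice of tanh as the smooth activation are illustrative assumptions.

```python
# Minimal sketch (assumptions: d, n, n_w, and tanh as the smooth activation are
# illustrative; this is not the paper's method). It Monte Carlo-estimates the
# kernel k(x, x') = E_w[sigma(<w, x>) sigma(<w, x'>)] with w uniform on the
# sphere, then inspects the eigenvalue decay of the kernel matrix.

import numpy as np

rng = np.random.default_rng(0)

def sample_sphere(n, d):
    """Draw n points uniformly from the unit sphere in R^d."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def empirical_kernel(X, sigma, n_w=20000):
    """Monte Carlo estimate of k(x, x') = E_w[sigma(<w, x>) sigma(<w, x'>)]."""
    W = sample_sphere(n_w, X.shape[1])   # random inner-layer weights
    F = sigma(X @ W.T)                   # feature matrix, shape (n, n_w)
    return F @ F.T / n_w

d, n = 10, 400
X = sample_sphere(n, d)

relu = lambda z: np.maximum(z, 0.0)
smooth = np.tanh                         # a smooth activation, for contrast

for name, act in [("ReLU", relu), ("tanh", smooth)]:
    K = empirical_kernel(X, act)
    # Eigenvalues of K / n approximate the kernel operator spectrum under the
    # empirical input distribution.
    eigvals = np.sort(np.linalg.eigvalsh(K))[::-1] / n
    print(name, "top eigenvalues:", np.round(eigvals[:8], 5))
```

Under these assumptions, the printed spectra only visualize the decay behaviour that the paper relates to Kolmogorov widths; they do not reproduce any of its quantitative bounds.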
Related papers
- Shallow ReLU neural networks and finite elements [1.3597551064547502]
We show that piecewise linear functions on a convex polytope mesh can be represented by two-hidden-layer ReLU neural networks in a weak sense.
The numbers of neurons in the two hidden layers required for this weak representation are given exactly in terms of the numbers of polytopes and hyperplanes involved in the mesh.
arXiv Detail & Related papers (2024-03-09T06:12:06Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Mean-Field Analysis of Two-Layer Neural Networks: Global Optimality with Linear Convergence Rates [7.094295642076582]
Mean-field regime is a theoretically attractive alternative to the NTK (lazy training) regime.
We establish a new linear convergence result for two-layer neural networks trained by continuous-time noisy descent in the mean-field regime.
arXiv Detail & Related papers (2022-05-19T21:05:40Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- Phenomenology of Double Descent in Finite-Width Neural Networks [29.119232922018732]
Double descent delineates the behaviour of models depending on the regime they belong to.
We use influence functions to derive suitable expressions of the population loss and its lower bound.
Building on our analysis, we investigate how the loss function affects double descent.
arXiv Detail & Related papers (2022-03-14T17:39:49Z)
- The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z)
- Going Beyond Linear RL: Sample Efficient Neural Function Approximation [76.57464214864756]
We study function approximation with two-layer neural networks.
Our results significantly improve upon what can be attained with linear (or eluder dimension) methods.
arXiv Detail & Related papers (2021-07-14T03:03:56Z)
- On the approximation of functions by tanh neural networks [0.0]
We derive bounds on the error, in high-order Sobolev norms, incurred in the approximation of Sobolev-regular functions by tanh neural networks.
We show that tanh neural networks with only two hidden layers suffice to approximate functions at comparable or better rates than much deeper ReLU neural networks.
arXiv Detail & Related papers (2021-04-18T19:30:45Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Measuring Model Complexity of Neural Networks with Curve Activation Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
- A function space analysis of finite neural networks with insights from sampling theory [41.07083436560303]
We show that the function space generated by multi-layer networks with non-expansive activation functions is smooth.
Under the assumption that the input is band-limited, we provide novel error bounds.
We analyze both deterministic uniform and random sampling showing the advantage of the former.
arXiv Detail & Related papers (2020-04-15T10:25:18Z)