How Powerful are Shallow Neural Networks with Bandlimited Random
Weights?
- URL: http://arxiv.org/abs/2008.08427v4
- Date: Wed, 31 May 2023 10:04:21 GMT
- Title: How Powerful are Shallow Neural Networks with Bandlimited Random
Weights?
- Authors: Ming Li, Sho Sonoda, Feilong Cao, Yu Guang Wang, Jiye Liang
- Abstract summary: We investigate the expressive power of depth-2 bandlimited random neural networks.
A random net is a neural network where the hidden layer parameters are frozen with random assignment.
- Score: 25.102870584507244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the expressive power of depth-2 bandlimited random neural
networks. A random net is a neural network where the hidden layer parameters
are frozen with random assignment, and only the output layer parameters are
trained by loss minimization. Using random weights for a hidden layer is an
effective method to avoid non-convex optimization in standard gradient descent
learning. It has also been adopted in recent deep learning theories. Despite
the well-known fact that a neural network is a universal approximator, in this
study, we mathematically show that when hidden parameters are distributed in a
bounded domain, the network may not achieve zero approximation error. In
particular, we derive a new nontrivial approximation error lower bound. The
proof utilizes the technique of ridgelet analysis, a harmonic analysis method
designed for neural networks. This method is inspired by fundamental principles
in classical signal processing, specifically the fact that a bandlimited
reconstruction cannot always perfectly recover the original signal. We
corroborate our theoretical results with various simulation studies, and
two main take-home messages are offered: (i) not every distribution for selecting
random weights can be used to build a universal approximator; (ii) a suitable
assignment of random weights exists, but it is to some degree tied to the
complexity of the target function.
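
The following is a minimal, self-contained sketch of the setting, not the paper's construction or experiments: the tanh activation, the uniform sampling ranges, the one-dimensional toy targets, and the ridge-regularized least-squares readout are all illustrative assumptions. Hidden parameters are drawn once from a bounded domain and frozen, only the output layer is trained, and the intended qualitative picture is that a narrow sampling domain, which caps the frequencies the hidden features can carry, handles a high-frequency target much worse than a wide one, in the spirit of take-home messages (i) and (ii).

```python
# Illustrative sketch (not the paper's setup): a depth-2 random net whose hidden
# parameters (w, b) are sampled once from a bounded domain and then frozen; only
# the output layer is trained, here by ridge-regularized least squares.
import numpy as np

rng = np.random.default_rng(0)

def fit_random_net(x_train, y_train, x_test, n_hidden=500, bound=3.0, ridge=1e-2):
    """Freeze randomly sampled hidden weights/biases; train only the output layer."""
    w = rng.uniform(-bound, bound, size=n_hidden)   # hidden weights ("bandwidth" ~ bound)
    b = rng.uniform(-bound, bound, size=n_hidden)   # hidden biases
    feats = lambda x: np.tanh(np.outer(x, w) + b)   # frozen hidden-layer features
    H = feats(x_train)
    # The ridge term keeps the output coefficients at a sane scale rather than
    # letting them blow up to compensate for missing frequencies.
    coef = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y_train)
    return feats(x_test) @ coef

x_train = rng.uniform(-1.0, 1.0, size=300)
x_test = np.linspace(-1.0, 1.0, 1000)
targets = {"low-frequency sin(pi x)":    lambda t: np.sin(np.pi * t),
           "high-frequency sin(10 pi x)": lambda t: np.sin(10 * np.pi * t)}

for name, f in targets.items():
    for bound in (3.0, 60.0):                       # narrow vs. wide sampling domain
        y_hat = fit_random_net(x_train, f(x_train), x_test, bound=bound)
        err = np.max(np.abs(f(x_test) - y_hat))
        print(f"{name:<29s} | |w|,|b| <= {bound:5.1f} | sup error ~ {err:.3f}")
```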
Related papers
- Approximation with Random Shallow ReLU Networks with Applications to Model Reference Adaptive Control [0.0]
We show that ReLU networks with randomly generated weights and biases achieve $L_\infty$ error of $O(m^{-1/2})$ with high probability (a toy numerical sketch appears after this list).
We show how the result can be used to obtain approximations of the required accuracy in a model reference adaptive control application.
arXiv Detail & Related papers (2024-03-25T19:39:17Z) - Sampling weights of deep neural networks [1.2370077627846041]
We introduce a probability distribution, combined with an efficient sampling algorithm, for weights and biases of fully-connected neural networks.
In a supervised learning context, no iterative optimization or gradient computations of internal network parameters are needed.
We prove that sampled networks are universal approximators.
arXiv Detail & Related papers (2023-06-29T10:13:36Z) - Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - Finding Everything within Random Binary Networks [11.689913953698081]
We prove that any target network can be approximated up to arbitrary accuracy by simply pruning a random network of binary $\pm 1$ weights that is only a polylogarithmic factor wider and deeper than the target network.
arXiv Detail & Related papers (2021-10-18T03:19:25Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Searching for Minimal Optimal Neural Networks [4.94950858749529]
Large neural network models have high predictive power but may suffer from overfitting if the training set is not large enough.
The destructive approach, which starts with a large architecture and then reduces the size using a Lasso-type penalty, has been used extensively for this task.
We prove that Adaptive group Lasso is consistent and can reconstruct the correct number of hidden nodes of one-hidden-layer feedforward networks with high probability.
arXiv Detail & Related papers (2021-09-27T14:08:07Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Random Vector Functional Link Networks for Function Approximation on Manifolds [8.535815777849786]
Single-layer neural networks with random input-to-hidden-layer weights and biases have seen success in practice.
We further adapt this randomized neural network architecture to approximate functions on smooth, compact submanifolds of Euclidean space.
arXiv Detail & Related papers (2020-07-30T23:50:44Z)
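
As a companion to the first related paper above (random shallow ReLU networks), here is a quick numerical sketch; the target function, weight and bias distributions, training grid, and ridge least-squares readout are illustrative assumptions rather than the cited construction. It fits a smooth one-dimensional target with an increasing number m of random ReLU features, training only the linear readout, and prints the sup-norm error on a dense grid; for such a smooth 1-D target the observed decay may be faster than the general $O(m^{-1/2})$ guarantee.

```python
# Toy sketch: random ReLU features with a trained linear readout; the sup-norm
# error on a dense grid is printed for increasing numbers of hidden units m.
import numpy as np

rng = np.random.default_rng(1)

def relu_random_features_fit(x_train, y_train, x_test, m, ridge=1e-6):
    """Random ReLU features; only the linear readout is trained."""
    w = rng.normal(size=m)                          # random input-to-hidden weights
    b = rng.uniform(-1.0, 1.0, size=m)              # random biases
    feats = lambda x: np.maximum(np.outer(x, w) + b, 0.0)
    H = feats(x_train)
    coef = np.linalg.solve(H.T @ H + ridge * np.eye(m), H.T @ y_train)
    return feats(x_test) @ coef

x_train = np.linspace(-1.0, 1.0, 2000)
x_test = np.linspace(-1.0, 1.0, 5000)
f = lambda t: np.exp(-4.0 * t**2) * np.sin(3.0 * t)  # smooth 1-D target

for m in (50, 200, 800, 3200):
    y_hat = relu_random_features_fit(x_train, f(x_train), x_test, m)
    err = np.max(np.abs(f(x_test) - y_hat))
    print(f"m = {m:4d} hidden units: sup error ~ {err:.2e}")
```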