How Powerful are Shallow Neural Networks with Bandlimited Random
Weights?
- URL: http://arxiv.org/abs/2008.08427v4
- Date: Wed, 31 May 2023 10:04:21 GMT
- Title: How Powerful are Shallow Neural Networks with Bandlimited Random
Weights?
- Authors: Ming Li, Sho Sonoda, Feilong Cao, Yu Guang Wang, Jiye Liang
- Abstract summary: We investigate the expressive power of depth-2 bandlimited random neural networks.
A random net is a neural network where the hidden layer parameters are frozen with random assignment.
- Score: 25.102870584507244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the expressive power of depth-2 bandlimited random neural
networks. A random net is a neural network where the hidden layer parameters
are frozen with random assignment, and only the output layer parameters are
trained by loss minimization. Using random weights for a hidden layer is an
effective method to avoid non-convex optimization in standard gradient descent
learning. It has also been adopted in recent deep learning theories. Despite
the well-known fact that a neural network is a universal approximator, in this
study, we mathematically show that when hidden parameters are distributed in a
bounded domain, the network may not achieve zero approximation error. In
particular, we derive a new nontrivial approximation error lower bound. The
proof utilizes the technique of ridgelet analysis, a harmonic analysis method
designed for neural networks. This method is inspired by fundamental principles
in classical signal processing, specifically the fact that a bandlimited
reconstruction cannot always perfectly recover the original signal. We
corroborate our theoretical results with various simulation studies, and
two main take-home messages are offered: (i) not every distribution for selecting
random weights can be used to build a universal approximator; (ii) a suitable
assignment of random weights exists, but it is to some degree tied to the
complexity of the target function.
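
The following is a minimal, self-contained sketch of the setting, not the paper's construction or experiments: the tanh activation, the uniform sampling ranges, the one-dimensional toy targets, and the ridge-regularized least-squares readout are all illustrative assumptions. Hidden parameters are drawn once from a bounded domain and frozen, only the output layer is trained, and the intended qualitative picture is that a narrow sampling domain, which caps the frequencies the hidden features can carry, handles a high-frequency target much worse than a wide one, in the spirit of take-home messages (i) and (ii).

```python
# Illustrative sketch (not the paper's setup): a depth-2 random net whose hidden
# parameters (w, b) are sampled once from a bounded domain and then frozen; only
# the output layer is trained, here by ridge-regularized least squares.
import numpy as np

rng = np.random.default_rng(0)

def fit_random_net(x_train, y_train, x_test, n_hidden=500, bound=3.0, ridge=1e-2):
    """Freeze randomly sampled hidden weights/biases; train only the output layer."""
    w = rng.uniform(-bound, bound, size=n_hidden)   # hidden weights ("bandwidth" ~ bound)
    b = rng.uniform(-bound, bound, size=n_hidden)   # hidden biases
    feats = lambda x: np.tanh(np.outer(x, w) + b)   # frozen hidden-layer features
    H = feats(x_train)
    # The ridge term keeps the output coefficients at a sane scale rather than
    # letting them blow up to compensate for missing frequencies.
    coef = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y_train)
    return feats(x_test) @ coef

x_train = rng.uniform(-1.0, 1.0, size=300)
x_test = np.linspace(-1.0, 1.0, 1000)
targets = {"low-frequency sin(pi x)":    lambda t: np.sin(np.pi * t),
           "high-frequency sin(10 pi x)": lambda t: np.sin(10 * np.pi * t)}

for name, f in targets.items():
    for bound in (3.0, 60.0):                       # narrow vs. wide sampling domain
        y_hat = fit_random_net(x_train, f(x_train), x_test, bound=bound)
        err = np.max(np.abs(f(x_test) - y_hat))
        print(f"{name:<29s} | |w|,|b| <= {bound:5.1f} | sup error ~ {err:.3f}")
```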
Related papers
- Approximation with Random Shallow ReLU Networks with Applications to Model Reference Adaptive Control [0.0]
We show that ReLU networks with randomly generated weights and biases achieve $L_\infty$ error of $O(m^{-1/2})$ with high probability (a toy numerical sketch appears after this list).
We show how the result can be used to obtain approximations of the required accuracy in a model reference adaptive control application.
arXiv Detail & Related papers (2024-03-25T19:39:17Z) - Sampling weights of deep neural networks [1.2370077627846041]
We introduce a probability distribution, combined with an efficient sampling algorithm, for weights and biases of fully-connected neural networks.
In a supervised learning context, no iterative optimization or gradient computations of internal network parameters are needed.
We prove that sampled networks are universal approximators.
arXiv Detail & Related papers (2023-06-29T10:13:36Z) - Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - Finding Everything within Random Binary Networks [11.689913953698081]
We prove that any target network can be approximated up to arbitrary accuracy by simply pruning a random network of binary $\pm 1$ weights that is only a polylogarithmic factor wider and deeper than the target network.
arXiv Detail & Related papers (2021-10-18T03:19:25Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Searching for Minimal Optimal Neural Networks [4.94950858749529]
Large neural network models have high predictive power but may suffer from overfitting if the training set is not large enough.
The destructive approach, which starts with a large architecture and then reduces the size using a Lasso-type penalty, has been used extensively for this task.
We prove that Adaptive group Lasso is consistent and can reconstruct the correct number of hidden nodes of one-hidden-layer feedforward networks with high probability.
arXiv Detail & Related papers (2021-09-27T14:08:07Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Random Vector Functional Link Networks for Function Approximation on Manifolds [8.535815777849786]
Single-layer neural networks with random input-to-hidden-layer weights and biases have seen success in practice.
We further adapt this randomized neural network architecture to approximate functions on smooth, compact submanifolds of Euclidean space.
arXiv Detail & Related papers (2020-07-30T23:50:44Z)
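
As a companion to the first related paper above (random shallow ReLU networks), here is a quick numerical sketch; the target function, weight and bias distributions, training grid, and ridge least-squares readout are illustrative assumptions rather than the cited construction. It fits a smooth one-dimensional target with an increasing number m of random ReLU features, training only the linear readout, and prints the sup-norm error on a dense grid; for such a smooth 1-D target the observed decay may be faster than the general $O(m^{-1/2})$ guarantee.

```python
# Toy sketch: random ReLU features with a trained linear readout; the sup-norm
# error on a dense grid is printed for increasing numbers of hidden units m.
import numpy as np

rng = np.random.default_rng(1)

def relu_random_features_fit(x_train, y_train, x_test, m, ridge=1e-6):
    """Random ReLU features; only the linear readout is trained."""
    w = rng.normal(size=m)                          # random input-to-hidden weights
    b = rng.uniform(-1.0, 1.0, size=m)              # random biases
    feats = lambda x: np.maximum(np.outer(x, w) + b, 0.0)
    H = feats(x_train)
    coef = np.linalg.solve(H.T @ H + ridge * np.eye(m), H.T @ y_train)
    return feats(x_test) @ coef

x_train = np.linspace(-1.0, 1.0, 2000)
x_test = np.linspace(-1.0, 1.0, 5000)
f = lambda t: np.exp(-4.0 * t**2) * np.sin(3.0 * t)  # smooth 1-D target

for m in (50, 200, 800, 3200):
    y_hat = relu_random_features_fit(x_train, f(x_train), x_test, m)
    err = np.max(np.abs(f(x_test) - y_hat))
    print(f"m = {m:4d} hidden units: sup error ~ {err:.2e}")
```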