Finite Sample Identification of Wide Shallow Neural Networks with Biases
- URL: http://arxiv.org/abs/2211.04589v1
- Date: Tue, 8 Nov 2022 22:10:32 GMT
- Title: Finite Sample Identification of Wide Shallow Neural Networks with Biases
- Authors: Massimo Fornasier, Timo Klock, Marco Mondelli, Michael Rauchensteiner
- Abstract summary: The identification of the parameters of the network from finite samples of input-output pairs is often referred to as the teacher-student model.
This paper fills the gap by providing constructive methods and theoretical guarantees of finite sample identification for such wider shallow networks with biases.
- Score: 12.622813055808411
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial neural networks are functions depending on a finite number of
parameters typically encoded as weights and biases. The identification of the
parameters of the network from finite samples of input-output pairs is often
referred to as the \emph{teacher-student model}, and this model has represented
a popular framework for understanding training and generalization. Even if the
problem is NP-complete in the worst case, a rapidly growing literature -- after
adding suitable distributional assumptions -- has established finite sample
identification of two-layer networks with a number of neurons $m=\mathcal
O(D)$, $D$ being the input dimension. For the range $D<m<D^2$ the problem
becomes harder, and truly little is known for networks parametrized by biases
as well. This paper fills the gap by providing constructive methods and
theoretical guarantees of finite sample identification for such wider shallow
networks with biases. Our approach is based on a two-step pipeline: first, we
recover the direction of the weights, by exploiting second order information;
next, we identify the signs by suitable algebraic evaluations, and we recover
the biases by empirical risk minimization via gradient descent. Numerical
results demonstrate the effectiveness of our approach.
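To make the pipeline concrete, here is a minimal, self-contained sketch under simplifying assumptions: a tanh teacher network with Gaussian inputs, Hessians estimated by finite differences, and a projected power iteration used to extract individual weight directions from the span of the Hessians. All names and hyperparameters are illustrative, and the second step (sign identification and bias recovery by gradient descent on the empirical risk) is only indicated in a comment; this is a sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative teacher network f(x) = sum_i a_i * tanh(<w_i, x> + b_i),
# in the regime D < m < D^2 discussed in the abstract.
D, m = 20, 40
W = rng.standard_normal((m, D))
W /= np.linalg.norm(W, axis=1, keepdims=True)      # unit-norm weight directions
b = 0.1 * rng.standard_normal(m)
a = rng.choice([-1.0, 1.0], size=m)

def f(x):
    return a @ np.tanh(W @ x + b)

# Step 1: second-order information.  Each Hessian of f is a sum of rank-one
# terms a_i * tanh''(.) * w_i w_i^T, so Hessians at many inputs approximately
# span the space spanned by the matrices w_i w_i^T.
def hessian_fd(x, eps=1e-3):
    """Central finite-difference estimate of the Hessian of f at x."""
    H = np.zeros((D, D))
    E = np.eye(D)
    for i in range(D):
        for j in range(i, D):
            H[i, j] = (f(x + eps * (E[i] + E[j])) - f(x + eps * (E[i] - E[j]))
                       - f(x - eps * (E[i] - E[j])) + f(x - eps * (E[i] + E[j]))) / (4 * eps ** 2)
            H[j, i] = H[i, j]
    return H

hessians = [hessian_fd(rng.standard_normal(D)) for _ in range(3 * m)]
A = np.stack([H.ravel() for H in hessians])
_, _, Vt = np.linalg.svd(A, full_matrices=False)
basis = Vt[:m]                                     # orthonormal basis of the estimated span

def project(M):
    """Project a symmetric D x D matrix onto the estimated Hessian span."""
    coeffs = basis @ M.ravel()
    return (coeffs @ basis).reshape(D, D)

def recover_direction(n_iter=200):
    """Projected power iteration: near-rank-one elements of the span tend to
    align with some w_i w_i^T, so the leading eigenvector approximates +/- w_i."""
    u = rng.standard_normal(D)
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(project(np.outer(u, u)))
        u = vecs[:, np.argmax(np.abs(vals))]
    return u

u = recover_direction()
print("best alignment with a true weight direction:", np.max(np.abs(W @ u)))
# Step 2 (not shown): fix the signs by evaluating the network along the
# recovered directions, then recover the biases (and outer weights) by
# gradient descent on the empirical risk.
```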
Related papers
- Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy [52.40331776572531]
We show that learning depth-$3$ ReLU networks under the Gaussian input distribution is hard even in the smoothed-analysis framework.
Our results are under a well-studied assumption on the existence of local pseudorandom generators.
arXiv Detail & Related papers (2023-02-15T02:00:26Z)
- Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy [74.34895342081407]
We propose an unsupervised algorithm to find a good initialization for the input data.
We first notice that each parameter configuration in the parameter space corresponds to one particular downstream task of d-way classification.
We then conjecture that the success of learning is directly related to how diverse downstream tasks are in the vicinity of the initial parameters.
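For reference, the diversity measure named in the title, the Maximum Mean Discrepancy, can be estimated from two samples as follows. This is a generic unbiased estimator with a Gaussian kernel, not the paper's pipeline; all names are illustrative.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of MMD^2 between samples X ~ P and Y ~ Q."""
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

# Example: two Gaussian samples with shifted means give a clearly positive MMD^2.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
Y = rng.standard_normal((200, 5)) + 1.0
print(mmd2_unbiased(X, Y))
```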
arXiv Detail & Related papers (2023-02-08T23:23:28Z)
- Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the variance of the random initialization is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z)
- Is Stochastic Gradient Descent Near Optimal? [0.0]
We show that stochastic gradient descent achieves small expected error with a nearly minimal number of samples and total number of queries.
This suggests that SGD nearly achieves the information-theoretic sample complexity bounds of Jeon & Van Roy (arXiv:2203.00246) in a computationally efficient manner.
arXiv Detail & Related papers (2022-09-18T18:26:43Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
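To make the notion of linear regions concrete for a shallow univariate ReLU network: each neuron $\mathrm{ReLU}(w_i x + b_i)$ contributes at most one breakpoint at $x = -b_i/w_i$, and the effective count keeps only the breakpoints where the slope actually changes. A minimal counting sketch (illustrative only, not the paper's construction; all names are assumptions):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def shallow_net(x, w, b, a):
    """f(x) = sum_i a_i * relu(w_i * x + b_i) for scalar input x."""
    return a @ relu(w * x + b)

def effective_linear_regions(w, b, a, tol=1e-8):
    """Count linear regions of f by checking where its slope actually changes.

    Each neuron with w_i != 0 has a potential breakpoint at x = -b_i / w_i;
    a breakpoint is effective only if the total slope jumps there.
    """
    active = np.abs(w) > tol
    knots = np.unique(-b[active] / w[active])

    def slope(x):
        mask = w * x + b > 0          # neurons that are active at x
        return float(a[mask] @ w[mask])

    regions = 1
    for k in knots:
        if abs(slope(k + 1e-6) - slope(k - 1e-6)) > tol:
            regions += 1
    return regions

rng = np.random.default_rng(0)
m = 10
w, b, a = rng.standard_normal(m), rng.standard_normal(m), rng.standard_normal(m)
print("potential breakpoints:", m,
      "| effective linear regions:", effective_linear_regions(w, b, a))
```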
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Physics-Enforced Modeling for Insertion Loss of Transmission Lines by Deep Neural Networks [4.762000720968522]
We show that direct application of neural networks can lead to non-physical models with negative insertion loss.
One solution is to add a regularization term, which represents the passivity condition, to the final loss function to penalize negative values of the insertion loss.
In the second method, a third-order expression, which ensures positiveness, is defined first to approximate the insertion loss.
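A minimal sketch of the first idea: a penalty is added to the data-fitting loss whenever the predicted insertion loss goes negative (insertion loss in dB is non-negative for a passive line). The architecture, feature count, and penalty weight below are assumptions for illustration, not the paper's model.

```python
import torch

# Illustrative surrogate model mapping line parameters + frequency to insertion loss.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)

def total_loss(pred, target, lam=10.0):
    """Data-fitting loss plus a passivity penalty on negative predicted insertion loss."""
    mse = torch.mean((pred - target) ** 2)
    passivity_penalty = torch.mean(torch.relu(-pred))   # nonzero only where pred < 0
    return mse + lam * passivity_penalty

# Synthetic stand-in data: 4 input features, non-negative targets.
X = torch.rand(256, 4)
y = torch.rand(256, 1)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = total_loss(model(X), y)
    loss.backward()
    opt.step()
```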
arXiv Detail & Related papers (2021-07-27T00:22:10Z)
- Stable Recovery of Entangled Weights: Towards Robust Identification of Deep Neural Networks from Minimal Samples [0.0]
We introduce the so-called entangled weights, which compose weights of successive layers intertwined with suitable diagonal and invertible matrices depending on the activation functions and their shifts.
We prove that entangled weights are completely and stably approximated by an efficient and robust algorithm.
In terms of practical impact, our study shows that we can relate input-output information uniquely and stably to network parameters, providing a form of explainability.
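As a rough illustration of the idea: entangled weights compose the weights of successive layers with diagonal, invertible matrices, so that deeper-layer weights can be expressed over the input space and compared there. The specific diagonal matrices in the paper depend on the activation functions and their shifts; the choice below (the activation derivative evaluated at the shifts) and all names are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
g_prime = lambda z: 1.0 / np.cosh(z) ** 2          # derivative of tanh

# Illustrative two-hidden-layer network: dims d0 -> d1 -> d2.
d0, d1, d2 = 10, 8, 6
W1, b1 = rng.standard_normal((d1, d0)), 0.1 * rng.standard_normal(d1)
W2, b2 = rng.standard_normal((d2, d1)), 0.1 * rng.standard_normal(d2)

# Entangled weights (illustrative construction): compose the second-layer
# weights with the first-layer weights through an invertible diagonal matrix
# that depends on the activation derivative at the first-layer shifts.
D1 = np.diag(g_prime(b1))                          # assumed choice; invertible since g' > 0
V1 = W1                                            # first-layer entangled weights = weights themselves
V2 = W2 @ D1 @ W1                                  # second-layer entangled weights, rows live in R^{d0}

print(V1.shape, V2.shape)                          # (8, 10) and (6, 10): all rows comparable in input space
```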
arXiv Detail & Related papers (2021-01-18T16:31:19Z)
- Random Vector Functional Link Networks for Function Approximation on Manifolds [8.535815777849786]
Single-layer neural networks with random input-to-hidden layer weights and biases have seen success in practice.
We further adapt this randomized neural network architecture to approximate functions on smooth, compact submanifolds of Euclidean space.
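A minimal sketch of such a randomized network (a random vector functional link / random-features model): the input-to-hidden weights and biases are drawn at random and never trained, and only the output layer is fit, here by least squares on a toy curve embedded in three dimensions. Names and hyperparameters are illustrative, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_random_features(X, y, width=500, scale=1.0):
    """Random input-to-hidden weights and biases; only the output layer is solved for."""
    d = X.shape[1]
    W = scale * rng.standard_normal((width, d))       # random, never trained
    b = scale * rng.standard_normal(width)
    H = np.tanh(X @ W.T + b)                          # hidden features
    coef, *_ = np.linalg.lstsq(H, y, rcond=None)      # least-squares fit of output weights
    return W, b, coef

def predict(X, W, b, coef):
    return np.tanh(X @ W.T + b) @ coef

# Toy target on a curve (a 1-D manifold) embedded in R^3, loosely echoing the manifold setting.
t = rng.uniform(0, 2 * np.pi, size=400)
X = np.stack([np.cos(t), np.sin(t), 0.1 * np.cos(3 * t)], axis=1)
y = np.sin(2 * t)

W, b, coef = fit_random_features(X[:300], y[:300])
err = np.mean((predict(X[300:], W, b, coef) - y[300:]) ** 2)
print("test MSE:", err)
```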
arXiv Detail & Related papers (2020-07-30T23:50:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.