Bayes-optimal Learning of Deep Random Networks of Extensive-width
- URL: http://arxiv.org/abs/2302.00375v2
- Date: Wed, 21 Jun 2023 12:24:57 GMT
- Title: Bayes-optimal Learning of Deep Random Networks of Extensive-width
- Authors: Hugo Cui, Florent Krzakala, Lenka Zdeborov\'a
- Abstract summary: We consider the problem of learning a target function corresponding to a deep, extensive, non-linear neural network with random Gaussian weights.
We compute closed-form expressions for the test errors of ridge regression, kernel and random features regression.
We show numerically that when the number of samples grows faster than the dimension, ridge and kernel methods become suboptimal, while neural networks achieve test error close to zero from quadratically many samples.
- Score: 22.640648403570957
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of learning a target function corresponding to a
deep, extensive-width, non-linear neural network with random Gaussian weights.
We consider the asymptotic limit where the number of samples, the input
dimension and the network width are proportionally large. We propose a
closed-form expression for the Bayes-optimal test error, for regression and
classification tasks. We further compute closed-form expressions for the test
errors of ridge regression, kernel and random features regression. We find, in
particular, that optimally regularized ridge regression, as well as kernel
regression, achieve Bayes-optimal performances, while the logistic loss yields
a near-optimal test error for classification. We further show numerically that
when the number of samples grows faster than the dimension, ridge and kernel
methods become suboptimal, while neural networks achieve test error close to
zero from quadratically many samples.
Related papers
- Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods [43.32546195968771]
We study the data-dependent convergence and generalization behavior of gradient methods for neural networks with smooth activation.
Our results improve upon the shortcomings of the well-established Rademacher complexity-based bounds.
We show that a large step-size significantly improves upon the NTK regime's results in classifying the XOR distribution.
arXiv Detail & Related papers (2024-10-13T21:49:29Z) - Bayes-optimal learning of an extensive-width neural network from quadratically many samples [28.315491743569897]
We consider the problem of learning a target function corresponding to a single hidden layer neural network.
We consider the limit where the input dimension and the network width are proportionally large.
arXiv Detail & Related papers (2024-08-07T12:41:56Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(alpha-1)$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Sampling weights of deep neural networks [1.2370077627846041]
We introduce a probability distribution, combined with an efficient sampling algorithm, for weights and biases of fully-connected neural networks.
In a supervised learning context, no iterative optimization or gradient computations of internal network parameters are needed.
We prove that sampled networks are universal approximators.
arXiv Detail & Related papers (2023-06-29T10:13:36Z) - Collapsed Inference for Bayesian Deep Learning [36.1725075097107]
We introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples.
A collapsed sample represents uncountably many models drawn from the approximate posterior.
Our proposed use of collapsed samples achieves a balance between scalability and accuracy.
arXiv Detail & Related papers (2023-06-16T08:34:42Z) - Nonparametric regression using over-parameterized shallow ReLU neural networks [10.339057554827392]
We show that neural networks can achieve minimax optimal rates of convergence (up to logarithmic factors) for learning functions from certain smooth function classes.
It is assumed that the regression function is from the H"older space with smoothness $alpha(d+3)/2$ or a variation space corresponding to shallow neural networks.
As a byproduct, we derive a new size-independent bound for the local Rademacher complexity of shallow ReLU neural networks.
arXiv Detail & Related papers (2023-06-14T07:42:37Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over- parameterization, where the width is $tildemathcalO(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Non-Gradient Manifold Neural Network [79.44066256794187]
Deep neural network (DNN) generally takes thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - Sampling-free Variational Inference for Neural Networks with
Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.