Normalization effects on deep neural networks
- URL: http://arxiv.org/abs/2209.01018v1
- Date: Fri, 2 Sep 2022 17:05:55 GMT
- Title: Normalization effects on deep neural networks
- Authors: Jiahui Yu, Konstantinos Spiliopoulos
- Abstract summary: We study the effect of the choice of the $\gamma_i$ on the statistical behavior of the neural network's output.
We find that, in terms of the variance of the neural network's output and test accuracy, the best choice is to set the $\gamma_i$'s equal to one.
- Score: 20.48472873675696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the effect of normalization on the layers of deep neural networks of
feed-forward type. A given layer $i$ with $N_{i}$ hidden units is allowed to be
normalized by $1/N_{i}^{\gamma_{i}}$ with $\gamma_{i}\in[1/2,1]$ and we study
the effect of the choice of the $\gamma_{i}$ on the statistical behavior of the
neural network's output (such as variance) as well as on the test accuracy on
the MNIST data set. We find that in terms of variance of the neural network's
output and test accuracy the best choice is to choose the $\gamma_{i}$'s to be
equal to one, which is the mean-field scaling. We also find that this is
particularly true for the outer layer, in that the neural network's behavior is
more sensitive to the scaling of the outer layer than to the scaling of
the inner layers. The mechanism for the mathematical analysis is an asymptotic
expansion for the neural network's output. An important practical consequence
of the analysis is that it provides a systematic and mathematically informed
way to choose the learning rate hyperparameters. Such a choice guarantees that
the neural network behaves in a statistically robust way as the $N_i$ grow to
infinity.
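For illustration, the sketch below implements a forward pass of a feed-forward network in which the contribution of layer $i$ (with $N_{i}$ hidden units) is divided by $N_{i}^{\gamma_{i}}$, $\gamma_{i}\in[1/2,1]$, so that $\gamma_{i}=1$ corresponds to the mean-field scaling favored in the abstract. The architecture, activation, initialization, and exact placement of the normalization factor are minimal assumptions made for this example, not the paper's precise model or training scheme.

```python
import numpy as np

def scaled_mlp_forward(x, weights, gammas, sigma=np.tanh):
    """Feed-forward pass where the output of hidden layer i (N_i units)
    is normalized by 1 / N_i**gamma_i before entering the next layer.
    gamma_i = 1/2 recovers the usual 1/sqrt(N_i) scaling; gamma_i = 1 is
    the mean-field scaling."""
    h = x
    for W, gamma in zip(weights[:-1], gammas):
        n_units = W.shape[0]                 # N_i: hidden units of layer i
        h = sigma(W @ h) / n_units ** gamma  # normalize the layer's contribution
    return weights[-1] @ h                   # linear readout layer

# Toy usage: two hidden layers (N_1 = 100, N_2 = 200), mean-field scaling.
rng = np.random.default_rng(0)
d = 784  # e.g. a flattened MNIST image
weights = [rng.normal(size=(100, d)),
           rng.normal(size=(200, 100)),
           rng.normal(size=(1, 200))]
output = scaled_mlp_forward(rng.normal(size=d), weights, gammas=[1.0, 1.0])
```

The learning-rate prescription that the paper pairs with this normalization is not reproduced here.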
Related papers
- Neural-g: A Deep Learning Framework for Mixing Density Estimation [16.464806944964003]
Mixing (or prior) density estimation is an important problem in machine learning and statistics.
We propose neural-$g$, a new neural network-based estimator for $g$-modeling (the generic mixing-density setup is sketched after this entry).
arXiv Detail & Related papers (2024-06-10T03:00:28Z)
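For context, the generic mixing-density ($g$-modeling) setup is sketched below; this is the standard formulation, not necessarily the exact one used in neural-$g$:

$$\theta_i \sim g, \qquad z_i \mid \theta_i \sim p(\,\cdot\mid\theta_i), \qquad f_g(z) = \int p(z\mid\theta)\, g(\theta)\,\mathrm{d}\theta,$$

and the task is to estimate the unknown mixing (prior) density $g$ from the observed $z_i$; per the summary above, neural-$g$ parameterizes this estimator with a neural network.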
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Provable Identifiability of Two-Layer ReLU Neural Networks via LASSO Regularization [15.517787031620864]
LASSO is extended to two-layer ReLU neural networks, a popular and powerful nonlinear regression model.
We show that the LASSO estimator can stably reconstruct the neural network and identify $\mathcal{S}^{\star}$ when the number of samples scales logarithmically (a generic form of such an estimator is sketched after this entry).
Our theory lies in an extended Restricted Isometry Property (RIP)-based analysis framework for two-layer ReLU neural networks.
arXiv Detail & Related papers (2023-05-07T13:05:09Z)
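For reference, a generic $\ell_1$-penalized (LASSO-type) fit of a two-layer ReLU network has the form below; the paper's precise estimator, penalty, and the set $\mathcal{S}^{\star}$ it identifies may differ:

$$f_{\theta}(x)=\sum_{k=1}^{m} a_k\,\mathrm{ReLU}(w_k^{\top}x), \qquad \hat{\theta}\in\arg\min_{\theta}\;\frac{1}{2n}\sum_{j=1}^{n}\bigl(y_j-f_{\theta}(x_j)\bigr)^{2}+\lambda\|\theta\|_{1}.$$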
- The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets, on the order of $P^{*} \sim \sqrt{N}$, for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version (the standard NTK definition is sketched after this entry).
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
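For reference, the neural tangent kernel mentioned in the entry above is, in its standard form (not specific to this paper),

$$\Theta(x,x')=\bigl\langle \nabla_{\theta} f_{\theta}(x),\, \nabla_{\theta} f_{\theta}(x')\bigr\rangle,$$

and the stated equivalence compares this kernel for a fully-connected network with that of its randomly pruned version.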
- Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs, since models must be trained before their performance can be predicted.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections (one standard reading of this equivalence is sketched after this entry).
arXiv Detail & Related papers (2022-01-11T20:53:15Z)
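One standard way to read the back-propagation-as-edge-dynamics statement above (our gloss, not the paper's exact formulation) is that gradient descent on the training loss $L$ discretizes a gradient flow on the synaptic weights $w$:

$$w_{k+1}=w_{k}-\eta\,\nabla_{w}L(w_{k}), \qquad \text{which for small } \eta \text{ tracks } \qquad \frac{\mathrm{d}w}{\mathrm{d}t}=-\nabla_{w}L\bigl(w(t)\bigr).$$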
- The Rate of Convergence of Variation-Constrained Deep Neural Networks [35.393855471751756]
We show that a class of variation-constrained neural networks can achieve a near-parametric rate $n^{-1/2+\delta}$ for an arbitrarily small constant $\delta$.
The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived.
arXiv Detail & Related papers (2021-06-22T21:28:00Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Normalization effects on shallow neural networks and related asymptotic expansions [20.48472873675696]
In particular, we investigate the effect of different scaling schemes, which lead to different normalizations of the neural network, on the network's statistical output.
We develop an expansion for the neural network's statistical output pointwise with respect to the scaling parameter as the number of hidden units grows to infinity.
We show that, to leading order in $N$, the variance of the neural network's statistical output decays as the normalization implied by the scaling parameter approaches the mean-field normalization (a schematic of the scaled network is sketched after this entry).
arXiv Detail & Related papers (2020-11-20T16:33:28Z)
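Schematically, the scaled shallow network studied in the entry above (written in our notation, which may differ from the paper's) is

$$g^{N}_{\theta}(x)=\frac{1}{N^{\gamma}}\sum_{i=1}^{N}c^{i}\,\sigma\bigl(w^{i}\cdot x\bigr), \qquad \gamma\in[1/2,1],$$

with $\gamma=1$ the mean-field normalization; the deep-network paper at the top of this page allows a separate $\gamma_{i}$ for each layer.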
- The Efficacy of $L_1$ Regularization in Two-Layer Neural Networks [36.753907384994704]
A crucial problem in neural networks is to select the most appropriate number of hidden neurons and obtain tight statistical risk bounds.
We show that $L_1$ regularization can control the generalization error and sparsify the input dimension.
An excessively large number of neurons does not necessarily inflate generalization errors under suitable regularization.
arXiv Detail & Related papers (2020-10-02T15:23:22Z)
- Towards Understanding Hierarchical Learning: Benefits of Neural Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representation can achieve improved sample complexities compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.