Improve Generalization and Robustness of Neural Networks via Weight
Scale Shifting Invariant Regularizations
- URL: http://arxiv.org/abs/2008.02965v2
- Date: Wed, 8 Jun 2022 14:17:20 GMT
- Title: Improve Generalization and Robustness of Neural Networks via Weight
Scale Shifting Invariant Regularizations
- Authors: Ziquan Liu, Yufei Cui, Antoni B. Chan
- Abstract summary: We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions.
We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
- Score: 52.493315075385325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using weight decay to penalize the L2 norms of weights in neural networks has
been a standard training practice to regularize the complexity of networks. In
this paper, we show that a family of regularizers, including weight decay, is
ineffective at penalizing the intrinsic norms of weights for networks with
positively homogeneous activation functions, such as linear, ReLU and
max-pooling functions. As a result of homogeneity, functions specified by the
networks are invariant to the shifting of weight scales between layers. The
ineffective regularizers are sensitive to such shifting and thus poorly
regularize the model capacity, leading to overfitting. To address this
shortcoming, we propose an improved regularizer that is invariant to weight
scale shifting and thus effectively constrains the intrinsic norm of a neural
network. The derived regularizer is an upper bound for the input gradient of
the network so minimizing the improved regularizer also benefits the
adversarial robustness. Residual connections are also considered and we show
that our regularizer also forms an upper bound to input gradients of such a
residual network. We demonstrate the efficacy of our proposed regularizer on
various datasets and neural network architectures at improving generalization
and adversarial robustness.
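The weight scale shifting described in the abstract is easy to verify numerically. Below is a minimal NumPy sketch (layer sizes are arbitrary, and the product-of-norms penalty is only an illustrative scale-shift-invariant surrogate, not the paper's exact regularizer): rescaling one layer of a two-layer ReLU network by c and the next by 1/c leaves the function unchanged, makes the weight-decay penalty arbitrarily large or small, and leaves the invariant surrogate untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network f(x) = W2 @ relu(W1 @ x); sizes are illustrative.
W1 = rng.standard_normal((64, 32))
W2 = rng.standard_normal((10, 64))
x = rng.standard_normal(32)

def forward(W1, W2, x):
    return W2 @ np.maximum(W1 @ x, 0.0)

def weight_decay(*Ws):
    # Standard L2 penalty: sum of squared Frobenius norms.
    return sum(np.sum(W * W) for W in Ws)

def scale_invariant_penalty(*Ws):
    # Illustrative scale-shift-invariant surrogate: product of layer norms.
    # (A sketch only; the paper defines its own invariant regularizer.)
    return np.prod([np.linalg.norm(W) for W in Ws])

c = 10.0
W1_s, W2_s = c * W1, W2 / c  # shift weight scale between the two layers

# Positive homogeneity of ReLU: the network function is unchanged...
assert np.allclose(forward(W1, W2, x), forward(W1_s, W2_s, x))

# ...but weight decay changes by orders of magnitude,
# while the invariant surrogate stays the same.
print(weight_decay(W1, W2), weight_decay(W1_s, W2_s))
print(scale_invariant_penalty(W1, W2), scale_invariant_penalty(W1_s, W2_s))
```

Because the function is identical for every choice of c while the L2 penalty varies freely, weight decay cannot pin down the network's intrinsic norm, which is exactly the gap the proposed invariant regularizer targets.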
Related papers
- Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNets in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z) - The Sample Complexity of One-Hidden-Layer Neural Networks [57.6421258363243]
We study a class of scalar-valued one-hidden-layer networks with inputs bounded in Euclidean norm.
We prove that controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees.
We analyze two important settings where a mere spectral norm control turns out to be sufficient.
arXiv Detail & Related papers (2022-02-13T07:12:02Z) - Logit Attenuating Weight Normalization [5.856897366207895]
Deep networks trained using gradient-based optimizers are a popular choice for solving classification and ranking problems.
Without appropriately tuned $\ell_2$ regularization or weight decay, such networks have the tendency to make output scores (logits) and network weights large.
We propose a method called Logit Attenuating Weight Normalization (LAWN), that can be stacked onto any gradient-based optimizer.
arXiv Detail & Related papers (2021-08-12T16:44:24Z) - Better Training using Weight-Constrained Stochastic Dynamics [0.0]
We employ constraints to control the parameter space of deep neural networks throughout training.
The use of customized, appropriately designed constraints can reduce the vanishing/exploding gradient problem.
We provide a general approach to efficiently incorporate constraints into a gradient Langevin framework.
arXiv Detail & Related papers (2021-06-20T14:41:06Z) - Rethinking Skip Connection with Layer Normalization in Transformers and
ResNets [49.87919454950763]
Skip connection is a widely-used technique to improve the performance of deep neural networks.
In this work, we investigate how scale factors affect the effectiveness of the skip connection.
arXiv Detail & Related papers (2021-05-15T11:44:49Z) - The Implicit Biases of Stochastic Gradient Descent on Deep Neural
Networks with Batch Normalization [44.30960913470372]
Deep neural networks with batch normalization (BN-DNNs) are invariant to weight rescaling due to their normalization operations.
We investigate the implicit biases of stochastic gradient descent (SGD) on BN-DNNs to provide a theoretical explanation for the efficacy of weight decay.
arXiv Detail & Related papers (2021-02-06T03:40:20Z) - A Fully Tensorized Recurrent Neural Network [48.50376453324581]
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
arXiv Detail & Related papers (2020-10-08T18:24:12Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.