Related papers: Weight Compander: A Simple Weight Reparameterization for Regularization

URL: http://arxiv.org/abs/2306.16993v1
Date: Thu, 29 Jun 2023 14:52:04 GMT
Title: Weight Compander: A Simple Weight Reparameterization for Regularization
Authors: Rinor Cakaj, Jens Mehnert, Bin Yang
Abstract summary: We introduce weight compander, a novel effective method to improve generalization of deep neural networks. We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.
Score: 5.744133015573047
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Regularization is a set of techniques that are used to improve the generalization ability of deep neural networks. In this paper, we introduce weight compander (WC), a novel effective method to improve generalization by reparameterizing each weight in deep neural networks using a nonlinear function. It is a general, intuitive, cheap and easy to implement method, which can be combined with various other regularization techniques. Large weights in deep neural networks are a sign of a more complex network that is overfitted to the training data. Moreover, regularized networks tend to have a greater range of weights around zero with fewer weights centered at zero. We introduce a weight reparameterization function which is applied to each weight and implicitly reduces overfitting by restricting the magnitude of the weights while forcing them away from zero at the same time. This leads to more democratic decision-making in the network. Firstly, individual weights cannot have too much influence in the prediction process due to the restriction of their magnitude. Secondly, more weights are used in the prediction process, since they are forced away from zero during training. This promotes the extraction of more features from the input data and increases the level of weight redundancy, which makes the network less sensitive to statistical differences between training and test data. We extend our method to learn the hyperparameters of the introduced weight reparameterization function. This avoids hyperparameter search and gives the network the opportunity to align the weight reparameterization with the training progress. We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.
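
Illustration: the abstract does not state the reparameterization function itself, so the following is only a minimal sketch of the idea, assuming a scaled arctangent as the compander. The names `compand`, `compand_grad`, `a` and `b` are hypothetical and chosen for illustration; they are not taken from the paper. The sketch shows the two properties the abstract describes: the effective weight is bounded in magnitude, and small raw values are pushed away from zero when the slope at the origin exceeds 1.

```python
import math

def compand(v, a=1.0, b=0.5):
    """Map a raw trainable parameter v to an effective weight.

    Assumed form (illustrative only): a * arctan(v / b).
    The output magnitude stays below a * pi / 2, and the slope
    at zero is a / b, so a > b expands small values.
    """
    return a * math.atan(v / b)

def compand_grad(v, a=1.0, b=0.5):
    """Derivative of compand with respect to v (chain-rule factor)."""
    return (a / b) / (1.0 + (v / b) ** 2)

# Raw parameters before reparameterization.
raw = [-10.0, -0.1, 0.0, 0.1, 10.0]
# Effective weights that a layer would use in its forward pass.
effective = [compand(v) for v in raw]
```

In a setting where the hyperparameters are learned, as the abstract mentions, `a` and `b` would be trainable values updated together with the raw parameters rather than fixed defaults.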

Related papers

Improved Generalization of Weight Space Networks via Augmentations [53.87011906358727]
Learning in deep weight spaces (DWS) is an emerging research direction, with applications to 2D and 3D neural fields (INRs, NeRFs). We empirically analyze the reasons for this overfitting and find that a key reason is the lack of diversity in DWS datasets. To address this, we explore strategies for data augmentation in weight spaces and propose a MixUp method adapted for weight spaces.
arXiv Detail & Related papers (2024-02-06T15:34:44Z)
Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of weights. We develop an improved KL metric to determine optimal quantization scales for activations. For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z)
HyperSparse Neural Networks: Shifting Exploration to Exploitation through Adaptive Regularization [18.786142528591355]
Sparse neural networks are a key factor in developing resource-efficient machine learning applications. We propose the novel and powerful sparse learning method Adaptive Regularized Training (ART) to compress dense into sparse networks. Our method compresses the pre-trained model knowledge into the weights of highest magnitude.
arXiv Detail & Related papers (2023-08-14T14:18:11Z)
Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning makes significant progress in pre-training large models, but struggles with small models. We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers. A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z)
Understanding Weight Similarity of Neural Networks via Chain Normalization Rule and Hypothesis-Training-Testing [58.401504709365284]
We present a weight similarity measure that can quantify the weight similarity of non-convex neural networks. We first normalize the weights of neural networks by a chain normalization rule, which is used to introduce weight-training representation learning. We extend the traditional hypothesis-testing method to validate the hypothesis on the weight similarity of neural networks.
arXiv Detail & Related papers (2022-08-08T19:11:03Z)
Training Sparse Neural Networks using Compressed Sensing [13.84396596420605]
We develop and test a novel method based on compressed sensing which combines the pruning and training into a single step. Specifically, we utilize an adaptively weighted $\ell_1$ penalty on the weights during training, which we combine with a generalization of the regularized dual averaging (RDA) algorithm in order to train sparse neural networks.
arXiv Detail & Related papers (2020-08-21T19:35:54Z)
Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations [52.493315075385325]
We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions. We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
arXiv Detail & Related papers (2020-08-07T02:55:28Z)
Neural networks with late-phase weights [66.72777753269658]
We show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning. At the end of learning, we obtain back a single model by taking a spatial average in weight space.
arXiv Detail & Related papers (2020-07-25T13:23:37Z)
Training highly effective connectivities within neural networks with randomly initialized, fixed weights [4.56877715768796]
We introduce a novel way of training a network by flipping the signs of the weights. We obtain good results even with weights of constant magnitude or even when weights are drawn from highly asymmetric distributions.
arXiv Detail & Related papers (2020-06-30T09:41:18Z)
Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix. Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
Train-by-Reconnect: Decoupling Locations of Weights from their Values [6.09170287691728]
We show that untrained deep neural networks (DNNs) are different from trained ones. We propose a novel method named Lookahead Permutation (LaPerm) to train DNNs by reconnecting the weights. When the initial weights share a single value, our method finds a weight-agnostic neural network with far better-than-chance accuracy.
arXiv Detail & Related papers (2020-03-05T12:40:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.