Volumization as a Natural Generalization of Weight Decay
- URL: http://arxiv.org/abs/2003.11243v2
- Date: Wed, 1 Apr 2020 10:05:45 GMT
- Title: Volumization as a Natural Generalization of Weight Decay
- Authors: Liu Ziyin, Zihao Wang, Makoto Yamada, Masahito Ueda
- Abstract summary: Inspired by physics, we define a physical volume for the weight parameters in neural networks.
We show that this method is an effective way of regularizing neural networks.
- Score: 25.076488081589403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel regularization method, called \textit{volumization}, for
neural networks. Inspired by physics, we define a physical volume for the
weight parameters in neural networks, and we show that this method is an
effective way of regularizing neural networks. Intuitively, this method
interpolates between an $L_2$ and $L_\infty$ regularization. Therefore, weight
decay and weight clipping become special cases of the proposed algorithm. We
prove, on a toy example, that the essence of this method is a regularization
technique to control bias-variance tradeoff. The method is shown to do well in
the categories where the standard weight decay method is shown to work well,
including improving the generalization of networks and preventing memorization.
Moreover, we show that volumization might lead to a simple method for
training neural networks whose weights are binary or ternary.
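A minimal sketch of the interpolation idea follows (an illustration under stated assumptions, not the authors' exact update rule): a hypothetical hyperparameter alpha blends an $L_2$-style shrinkage (weight decay) with an $L_\infty$-style projection onto a fixed interval $[-V, V]$ (weight clipping), so alpha = 1 recovers pure weight decay and alpha = 0 recovers pure clipping. All names and values below are illustrative assumptions.
```python
import numpy as np

def volumization_step(w, grad, lr=0.1, alpha=0.5, V=1.0, decay=1e-2):
    # One gradient step followed by a blend of weight decay (L2 shrinkage)
    # and weight clipping (L_inf projection onto [-V, V]).
    # Illustrative sketch of the interpolation idea, not the paper's exact algorithm.
    w = w - lr * grad                    # plain gradient step
    decayed = (1.0 - lr * decay) * w     # weight-decay limit (alpha = 1)
    clipped = np.clip(w, -V, V)          # weight-clipping limit (alpha = 0)
    return alpha * decayed + (1.0 - alpha) * clipped

# Toy usage with a random weight vector and a dummy gradient.
rng = np.random.default_rng(0)
w = 3.0 * rng.normal(size=5)
g = rng.normal(size=5)
print(volumization_step(w, g, alpha=0.25))
```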
Related papers
- Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis [5.016205338484259]
The proposed method is more robust to network size variations than the existing method.
When applied to Physics-Informed Neural Networks, the method exhibits faster convergence and robustness to variations of the network size.
arXiv Detail & Related papers (2024-10-03T06:30:27Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Weight Compander: A Simple Weight Reparameterization for Regularization [5.744133015573047]
We introduce weight compander, a novel and effective method for improving the generalization of deep neural networks.
We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.
arXiv Detail & Related papers (2023-06-29T14:52:04Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Understanding Weight Similarity of Neural Networks via Chain
Normalization Rule and Hypothesis-Training-Testing [58.401504709365284]
We present a weight similarity measure that can quantify the weight similarity of non-convolutional neural networks.
We first normalize the weights of neural networks by a chain normalization rule, which is used for weight representation learning.
We extend the traditional hypothesis-testing method to validate the hypothesis on the weight similarity of neural networks.
arXiv Detail & Related papers (2022-08-08T19:11:03Z) - Training Sparse Neural Networks using Compressed Sensing [13.84396596420605]
We develop and test a novel method based on compressed sensing which combines pruning and training into a single step.
Specifically, we utilize an adaptively weighted $\ell_1$ penalty on the weights during training, which we combine with a generalization of the regularized dual averaging (RDA) algorithm in order to train sparse neural networks.
arXiv Detail & Related papers (2020-08-21T19:35:54Z) - Improve Generalization and Robustness of Neural Networks via Weight
Scale Shifting Invariant Regularizations [52.493315075385325]
We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions.
We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
arXiv Detail & Related papers (2020-08-07T02:55:28Z) - Constraint-Based Regularization of Neural Networks [0.0]
We propose a method for efficiently incorporating constraints into a gradient Langevin framework for the training of deep neural networks.
Appropriately designed, they reduce the vanishing/exploding gradient problem, control weight magnitudes and stabilize deep neural networks.
arXiv Detail & Related papers (2020-06-17T19:28:41Z) - Feature Purification: How Adversarial Training Performs Robust Deep
Learning [66.05472746340142]
We present a principle that we call Feature Purification: one of the causes of the existence of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - Fiedler Regularization: Learning Neural Networks with Graph Sparsity [6.09170287691728]
We introduce a novel regularization approach for deep learning that incorporates and respects the underlying graphical structure of the neural network.
We propose to use the Fiedler value of the neural network's underlying graph as a tool for regularization.
arXiv Detail & Related papers (2020-03-02T16:19:33Z)
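As a quick illustration of the last idea above, the Fiedler value is the second-smallest eigenvalue of a graph Laplacian; the sketch below computes it for a small weighted graph built from the absolute weights of a toy two-layer network and uses it as a penalty term. The bipartite graph construction and the penalty coefficient are illustrative assumptions, not the cited paper's implementation.
```python
import numpy as np

def fiedler_value(adjacency):
    # Second-smallest eigenvalue of the unnormalized graph Laplacian L = D - A.
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigvals = np.linalg.eigvalsh(laplacian)  # ascending order for symmetric matrices
    return eigvals[1]

# Illustrative: view a tiny 3-input, 2-hidden-unit layer as a bipartite graph whose
# edge weights are the absolute values of the connection weights.
W = np.abs(np.array([[0.5, 1.2],
                     [0.1, 0.3],
                     [0.9, 0.4]]))
n_in, n_hid = W.shape
A = np.zeros((n_in + n_hid, n_in + n_hid))
A[:n_in, n_in:] = W        # input -> hidden edges
A[n_in:, :n_in] = W.T      # make the adjacency symmetric
penalty = 1e-3 * fiedler_value(A)  # hypothetical regularization term added to a loss
print(penalty)
```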
This list is automatically generated from the titles and abstracts of the papers in this site.