Volumization as a Natural Generalization of Weight Decay
- URL: http://arxiv.org/abs/2003.11243v2
- Date: Wed, 1 Apr 2020 10:05:45 GMT
- Title: Volumization as a Natural Generalization of Weight Decay
- Authors: Liu Ziyin, Zihao Wang, Makoto Yamada, Masahito Ueda
- Abstract summary: Inspired by physics, we define a physical volume for the weight parameters in neural networks.
We show that this method is an effective way of regularizing neural networks.
- Score: 25.076488081589403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel regularization method, called \textit{volumization}, for
neural networks. Inspired by physics, we define a physical volume for the
weight parameters in neural networks, and we show that this method is an
effective way of regularizing neural networks. Intuitively, this method
interpolates between an $L_2$ and $L_\infty$ regularization. Therefore, weight
decay and weight clipping become special cases of the proposed algorithm. We
prove, on a toy example, that the essence of this method is a regularization
technique to control bias-variance tradeoff. The method is shown to do well in
the categories where the standard weight decay method is shown to work well,
including improving the generalization of networks and preventing memorization.
Moreover, we show that volumization might lead to a simple method for
training neural networks whose weights are binary or ternary.
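A minimal sketch of the interpolation idea follows (an illustration under stated assumptions, not the authors' exact update rule): a hypothetical hyperparameter alpha blends an $L_2$-style shrinkage (weight decay) with an $L_\infty$-style projection onto a fixed interval $[-V, V]$ (weight clipping), so alpha = 1 recovers pure weight decay and alpha = 0 recovers pure clipping. All names and values below are illustrative assumptions.
```python
import numpy as np

def volumization_step(w, grad, lr=0.1, alpha=0.5, V=1.0, decay=1e-2):
    # One gradient step followed by a blend of weight decay (L2 shrinkage)
    # and weight clipping (L_inf projection onto [-V, V]).
    # Illustrative sketch of the interpolation idea, not the paper's exact algorithm.
    w = w - lr * grad                    # plain gradient step
    decayed = (1.0 - lr * decay) * w     # weight-decay limit (alpha = 1)
    clipped = np.clip(w, -V, V)          # weight-clipping limit (alpha = 0)
    return alpha * decayed + (1.0 - alpha) * clipped

# Toy usage with a random weight vector and a dummy gradient.
rng = np.random.default_rng(0)
w = 3.0 * rng.normal(size=5)
g = rng.normal(size=5)
print(volumization_step(w, g, alpha=0.25))
```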
Related papers
- Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis [5.016205338484259]
The proposed method is more robust to network size variations than the existing method.
When applied to Physics-Informed Neural Networks, the method exhibits faster convergence and robustness to variations of the network size.
arXiv Detail & Related papers (2024-10-03T06:30:27Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Weight Compander: A Simple Weight Reparameterization for Regularization [5.744133015573047]
We introduce weight compander, a novel and effective method for improving the generalization of deep neural networks.
We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.
arXiv Detail & Related papers (2023-06-29T14:52:04Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Understanding Weight Similarity of Neural Networks via Chain
Normalization Rule and Hypothesis-Training-Testing [58.401504709365284]
We present a weight similarity measure that can quantify the weight similarity of non-convolutional neural networks.
We first normalize the weights of neural networks by a chain normalization rule, which is used for weight representation learning.
We extend the traditional hypothesis-testing method to validate the hypothesis on the weight similarity of neural networks.
arXiv Detail & Related papers (2022-08-08T19:11:03Z) - Training Sparse Neural Networks using Compressed Sensing [13.84396596420605]
We develop and test a novel method based on compressed sensing which combines pruning and training into a single step.
Specifically, we utilize an adaptively weighted $\ell_1$ penalty on the weights during training, which we combine with a generalization of the regularized dual averaging (RDA) algorithm in order to train sparse neural networks.
arXiv Detail & Related papers (2020-08-21T19:35:54Z) - Improve Generalization and Robustness of Neural Networks via Weight
Scale Shifting Invariant Regularizations [52.493315075385325]
We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions.
We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
arXiv Detail & Related papers (2020-08-07T02:55:28Z) - Constraint-Based Regularization of Neural Networks [0.0]
We propose a method for efficiently incorporating constraints into a gradient Langevin framework for the training of deep neural networks.
Appropriately designed, they reduce the vanishing/exploding gradient problem, control weight magnitudes and stabilize deep neural networks.
arXiv Detail & Related papers (2020-06-17T19:28:41Z) - Feature Purification: How Adversarial Training Performs Robust Deep
Learning [66.05472746340142]
We present a principle that we call Feature Purification: one of the causes of the existence of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - Fiedler Regularization: Learning Neural Networks with Graph Sparsity [6.09170287691728]
We introduce a novel regularization approach for deep learning that incorporates and respects the underlying graphical structure of the neural network.
We propose to use the Fiedler value of the neural network's underlying graph as a tool for regularization.
arXiv Detail & Related papers (2020-03-02T16:19:33Z)
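As a quick illustration of the last idea above, the Fiedler value is the second-smallest eigenvalue of a graph Laplacian; the sketch below computes it for a small weighted graph built from the absolute weights of a toy two-layer network and uses it as a penalty term. The bipartite graph construction and the penalty coefficient are illustrative assumptions, not the cited paper's implementation.
```python
import numpy as np

def fiedler_value(adjacency):
    # Second-smallest eigenvalue of the unnormalized graph Laplacian L = D - A.
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigvals = np.linalg.eigvalsh(laplacian)  # ascending order for symmetric matrices
    return eigvals[1]

# Illustrative: view a tiny 3-input, 2-hidden-unit layer as a bipartite graph whose
# edge weights are the absolute values of the connection weights.
W = np.abs(np.array([[0.5, 1.2],
                     [0.1, 0.3],
                     [0.9, 0.4]]))
n_in, n_hid = W.shape
A = np.zeros((n_in + n_hid, n_in + n_hid))
A[:n_in, n_in:] = W        # input -> hidden edges
A[n_in:, :n_in] = W.T      # make the adjacency symmetric
penalty = 1e-3 * fiedler_value(A)  # hypothetical regularization term added to a loss
print(penalty)
```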
This list is automatically generated from the titles and abstracts of the papers in this site.