Weight and Gradient Centralization in Deep Neural Networks
- URL: http://arxiv.org/abs/2010.00866v3
- Date: Sun, 17 Jan 2021 12:05:14 GMT
- Title: Weight and Gradient Centralization in Deep Neural Networks
- Authors: Wolfgang Fuhl, Enkelejda Kasneci
- Abstract summary: Batch normalization is currently the most widely used variant of internal normalization for deep neural networks.
In this work, we combine several of these methods and thereby increase the generalization of the networks.
- Score: 13.481518628796692
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Batch normalization is currently the most widely used variant of internal
normalization for deep neural networks. Additional work has shown that the
normalization of weights and additional conditioning as well as the
normalization of gradients further improve the generalization. In this work, we
combine several of these methods and thereby increase the generalization of the
networks. The advantage of the newer methods over batch normalization is not only
improved generalization, but also that they only have to be applied during
training and therefore do not affect the running time at inference. Link to the CUDA code:
https://atreus.informatik.uni-tuebingen.de/seafile/d/8e2ab8c3fdd444e1a135/
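The following PyTorch-style code is not the authors' CUDA implementation linked above; it is a minimal sketch, with illustrative helper names (centralize_weights, centralize_gradients), of the two ingredients the paper combines: shifting each multi-dimensional weight tensor to zero mean during training (weight centralization) and applying the same zero-mean shift to the gradients before the optimizer update (gradient centralization).

import torch

@torch.no_grad()
def centralize_weights(model):
    # Weight centralization: for every multi-dimensional parameter, remove the
    # mean over all dimensions except the output-channel dimension (dim 0).
    for p in model.parameters():
        if p.dim() > 1:
            p.sub_(p.mean(dim=tuple(range(1, p.dim())), keepdim=True))

def centralize_gradients(model):
    # Gradient centralization: the same zero-mean shift, applied to the
    # gradients instead of the weights, just before the optimizer step.
    for p in model.parameters():
        if p.grad is not None and p.grad.dim() > 1:
            p.grad.sub_(p.grad.mean(dim=tuple(range(1, p.grad.dim())), keepdim=True))

def train_step(model, batch, target, criterion, optimizer):
    # One training step; model, criterion and optimizer are assumed to exist.
    optimizer.zero_grad()
    loss = criterion(model(batch), target)
    loss.backward()
    centralize_gradients(model)  # applied only during training ...
    optimizer.step()
    centralize_weights(model)    # ... so inference cost is unchanged.
    return loss.item()

Both operations run only inside the training loop, which is the point made in the abstract: they leave the network architecture and the inference-time cost untouched.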
Related papers
- Context Normalization Layer with Applications [0.1499944454332829]
This study proposes a new normalization technique, called context normalization, for image data.
It adjusts the scaling of features based on the characteristics of each sample, which improves the model's convergence speed and performance.
The effectiveness of context normalization is demonstrated on various datasets, and its performance is compared to other standard normalization techniques.
arXiv Detail & Related papers (2023-03-14T06:38:17Z)
- When Does Re-initialization Work? [50.70297319284022]
Re-initialization has been observed to improve generalization in recent works.
Yet it is neither widely adopted in deep learning practice nor often used in state-of-the-art training protocols.
This raises the question of when re-initialization works, and whether it should be used together with regularization techniques.
arXiv Detail & Related papers (2022-06-20T21:23:15Z)
- Training Thinner and Deeper Neural Networks: Jumpstart Regularization [2.8348950186890467]
We use regularization to prevent neurons from dying or becoming linear.
In comparison to conventional training, we obtain neural networks that are thinner, deeper, and - most importantly - more parameter-efficient.
arXiv Detail & Related papers (2022-01-30T12:11:24Z)
- Comparing Normalization Methods for Limited Batch Size Segmentation Neural Networks [0.0]
Batch Normalization works best with large batch sizes during training.
We show the effectiveness of Instance Normalization in a limited batch size training environment.
We also show that the Instance Normalization implementation used in this experiment is computationally time-efficient compared to the network without any normalization method.
arXiv Detail & Related papers (2020-11-23T17:13:24Z)
- Normalization Techniques in Training DNNs: Methodology, Analysis and Application [111.82265258916397]
Normalization techniques are essential for accelerating training and improving the generalization of deep neural networks (DNNs).
This paper reviews and comments on the past, present and future of normalization methods in the context of training.
arXiv Detail & Related papers (2020-09-27T13:06:52Z)
- Training Deep Neural Networks Without Batch Normalization [4.266320191208303]
This work studies batch normalization in detail, while comparing it with other methods such as weight normalization, gradient clipping and dropout.
The main purpose of this work is to determine whether networks can be trained effectively without batch normalization by adapting the training process.
arXiv Detail & Related papers (2020-08-18T15:04:40Z)
- Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations [52.493315075385325]
We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions.
We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
arXiv Detail & Related papers (2020-08-07T02:55:28Z)
- Optimization Theory for ReLU Neural Networks Trained with Normalization Layers [82.61117235807606]
The success of deep neural networks is in part due to the use of normalization layers.
Our analysis shows how the introduction of normalization changes the optimization landscape and can enable faster convergence.
arXiv Detail & Related papers (2020-06-11T23:55:54Z)
- Gradient Centralization: A New Optimization Technique for Deep Neural Networks [74.935141515523]
Gradient centralization (GC) operates directly on gradients by centralizing the gradient vectors to have zero mean.
GC can be viewed as a projected gradient descent method with a constrained loss function.
GC is very simple to implement and can be easily embedded into existing gradient-based DNNs with only one line of code; a minimal sketch of this embedding appears after this list.
arXiv Detail & Related papers (2020-04-03T10:25:00Z)
- Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
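As a companion to the Gradient Centralization entry above, here is a hedged sketch of the "one line of code" embedding into an existing optimizer. The class name SGD_GC is illustrative and not taken from that paper's released code; only the single sub_ line inside step() is GC-specific.

import torch

class SGD_GC(torch.optim.SGD):
    # Drop-in replacement for torch.optim.SGD that centralizes every
    # multi-dimensional gradient to zero mean before the usual SGD update.
    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None and p.grad.dim() > 1:
                    # the "one line": subtract the mean over all non-output dims
                    p.grad.sub_(p.grad.mean(dim=tuple(range(1, p.grad.dim())), keepdim=True))
        return super().step(closure)

# Usage, assuming an existing model:
# optimizer = SGD_GC(model.parameters(), lr=0.1, momentum=0.9)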
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.