Weight and Gradient Centralization in Deep Neural Networks
- URL: http://arxiv.org/abs/2010.00866v3
- Date: Sun, 17 Jan 2021 12:05:14 GMT
- Title: Weight and Gradient Centralization in Deep Neural Networks
- Authors: Wolfgang Fuhl, Enkelejda Kasneci
- Abstract summary: Batch normalization is currently the most widely used variant of internal normalization for deep neural networks.
In this work, we combine several of these methods and thereby increase the generalization of the networks.
- Score: 13.481518628796692
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Batch normalization is currently the most widely used variant of internal
normalization for deep neural networks. Additional work has shown that the
normalization of weights and additional conditioning as well as the
normalization of gradients further improve the generalization. In this work, we
combine several of these methods and thereby increase the generalization of the
networks. The advantage of the newer methods over batch normalization is not only
improved generalization, but also that they only have to be applied during
training and therefore do not affect the running time at inference. Link to the CUDA code:
https://atreus.informatik.uni-tuebingen.de/seafile/d/8e2ab8c3fdd444e1a135/
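The following PyTorch-style code is not the authors' CUDA implementation linked above; it is a minimal sketch, with illustrative helper names (centralize_weights, centralize_gradients), of the two ingredients the paper combines: shifting each multi-dimensional weight tensor to zero mean during training (weight centralization) and applying the same zero-mean shift to the gradients before the optimizer update (gradient centralization).

import torch

@torch.no_grad()
def centralize_weights(model):
    # Weight centralization: for every multi-dimensional parameter, remove the
    # mean over all dimensions except the output-channel dimension (dim 0).
    for p in model.parameters():
        if p.dim() > 1:
            p.sub_(p.mean(dim=tuple(range(1, p.dim())), keepdim=True))

def centralize_gradients(model):
    # Gradient centralization: the same zero-mean shift, applied to the
    # gradients instead of the weights, just before the optimizer step.
    for p in model.parameters():
        if p.grad is not None and p.grad.dim() > 1:
            p.grad.sub_(p.grad.mean(dim=tuple(range(1, p.grad.dim())), keepdim=True))

def train_step(model, batch, target, criterion, optimizer):
    # One training step; model, criterion and optimizer are assumed to exist.
    optimizer.zero_grad()
    loss = criterion(model(batch), target)
    loss.backward()
    centralize_gradients(model)  # applied only during training ...
    optimizer.step()
    centralize_weights(model)    # ... so inference cost is unchanged.
    return loss.item()

Both operations run only inside the training loop, which is the point made in the abstract: they leave the network architecture and the inference-time cost untouched.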
Related papers
- Context Normalization Layer with Applications [0.1499944454332829]
This study proposes a new normalization technique, called context normalization, for image data.
It adjusts the scaling of features based on the characteristics of each sample, which improves the model's convergence speed and performance.
The effectiveness of context normalization is demonstrated on various datasets, and its performance is compared to other standard normalization techniques.
arXiv Detail & Related papers (2023-03-14T06:38:17Z)
- When Does Re-initialization Work? [50.70297319284022]
Re-initialization has been observed to improve generalization in recent works.
Yet it is neither widely adopted in deep learning practice nor often used in state-of-the-art training protocols.
This raises the question of when re-initialization works, and whether it should be used together with regularization techniques.
arXiv Detail & Related papers (2022-06-20T21:23:15Z)
- Training Thinner and Deeper Neural Networks: Jumpstart Regularization [2.8348950186890467]
We use regularization to prevent neurons from dying or becoming linear.
In comparison to conventional training, we obtain neural networks that are thinner, deeper, and - most importantly - more parameter-efficient.
arXiv Detail & Related papers (2022-01-30T12:11:24Z)
- Comparing Normalization Methods for Limited Batch Size Segmentation Neural Networks [0.0]
Batch Normalization works best with large batch sizes during training.
We show the effectiveness of Instance Normalization in a limited batch size training environment.
We also show that the Instance Normalization implementation used in this experiment is computationally time-efficient compared to the network without any normalization method.
arXiv Detail & Related papers (2020-11-23T17:13:24Z)
- Normalization Techniques in Training DNNs: Methodology, Analysis and Application [111.82265258916397]
Normalization techniques are essential for accelerating training and improving the generalization of deep neural networks (DNNs).
This paper reviews and comments on the past, present and future of normalization methods in the context of training.
arXiv Detail & Related papers (2020-09-27T13:06:52Z)
- Training Deep Neural Networks Without Batch Normalization [4.266320191208303]
This work studies batch normalization in detail, while comparing it with other methods such as weight normalization, gradient clipping and dropout.
The main purpose of this work is to determine whether networks can be trained effectively without batch normalization by adapting the training process.
arXiv Detail & Related papers (2020-08-18T15:04:40Z)
- Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations [52.493315075385325]
We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions.
We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
arXiv Detail & Related papers (2020-08-07T02:55:28Z)
- Optimization Theory for ReLU Neural Networks Trained with Normalization Layers [82.61117235807606]
The success of deep neural networks is in part due to the use of normalization layers.
Our analysis shows how the introduction of normalization changes the optimization landscape and can enable faster convergence.
arXiv Detail & Related papers (2020-06-11T23:55:54Z)
- Gradient Centralization: A New Optimization Technique for Deep Neural Networks [74.935141515523]
Gradient centralization (GC) operates directly on gradients by centralizing the gradient vectors to have zero mean.
GC can be viewed as a projected gradient descent method with a constrained loss function.
GC is very simple to implement and can be easily embedded into existing gradient-based DNNs with only one line of code; a minimal sketch of this embedding appears after this list.
arXiv Detail & Related papers (2020-04-03T10:25:00Z)
- Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
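As a companion to the Gradient Centralization entry above, here is a hedged sketch of the "one line of code" embedding into an existing optimizer. The class name SGD_GC is illustrative and not taken from that paper's released code; only the single sub_ line inside step() is GC-specific.

import torch

class SGD_GC(torch.optim.SGD):
    # Drop-in replacement for torch.optim.SGD that centralizes every
    # multi-dimensional gradient to zero mean before the usual SGD update.
    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None and p.grad.dim() > 1:
                    # the "one line": subtract the mean over all non-output dims
                    p.grad.sub_(p.grad.mean(dim=tuple(range(1, p.grad.dim())), keepdim=True))
        return super().step(closure)

# Usage, assuming an existing model:
# optimizer = SGD_GC(model.parameters(), lr=0.1, momentum=0.9)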
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.