Theoretical Insight into Batch Normalization: Data-Dependent
Auto-Tuning of Regularization Rate
- URL: http://arxiv.org/abs/2209.07587v1
- Date: Thu, 15 Sep 2022 19:51:02 GMT
- Title: Theoretical Insight into Batch Normalization: Data-Dependent
Auto-Tuning of Regularization Rate
- Authors: Lakshmi Annamalai and Chetan Singh Thakur
- Abstract summary: Batch normalization is widely used in deep learning to normalize intermediate activations.
This paper brings out the data-dependent auto-tuning of the regularization parameter by BN with analytical proofs.
- Score: 1.6447597767676658
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Batch normalization is widely used in deep learning to normalize
intermediate activations. Deep networks suffer from notoriously increased
training complexity, mandating careful weight initialization, lower learning
rates, etc. Batch Normalization (\textbf{BN}) addresses these issues by
normalizing the inputs of activations to zero mean and unit standard
deviation. Making batch normalization part of the training process
dramatically accelerates the training of very deep networks. A growing line
of research examines the exact theoretical explanation behind the success of
\textbf{BN}. Most of these theoretical insights attempt to explain the
benefits of \textbf{BN} by attributing them to its influence on optimization,
weight scale invariance, and regularization. Despite the undeniable success
of \textbf{BN} in accelerating generalization, an analytical link between the
effect of \textbf{BN} and the regularization parameter is still missing. This
paper aims to bring out the data-dependent auto-tuning of the regularization
parameter by \textbf{BN}, with analytical proofs. We pose \textbf{BN} as a
constrained optimization imposed on the non-\textbf{BN} weights, through
which we demonstrate its data-statistics-dependent auto-tuning of the
regularization parameter. We also give an analytical proof of its behavior
under a noisy input scenario, which reveals the signal-versus-noise tuning of
the regularization parameter. We substantiate our claims with empirical
results from experiments on the MNIST dataset.
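
As a minimal illustration of the transform the abstract refers to, the sketch
below normalizes each feature of a batch of pre-activations to zero mean and
unit standard deviation and then applies a learnable scale and shift; it also
demonstrates the weight-scale invariance mentioned above. This is a generic
sketch of standard BN, not the paper's constrained-optimization formulation
or its proofs; the epsilon constant, shapes, and variable names are
illustrative assumptions.

import numpy as np

def batch_norm(z, gamma, beta, eps=1e-5):
    # Per-feature batch statistics over pre-activations z of shape [batch, features].
    mu = z.mean(axis=0)
    var = z.var(axis=0)
    z_hat = (z - mu) / np.sqrt(var + eps)   # zero mean, unit standard deviation
    return gamma * z_hat + beta             # learnable scale and shift

# Weight-scale invariance: rescaling the pre-BN weights W by any positive
# factor leaves the BN output (essentially) unchanged, which is one way BN
# interacts with the effective regularization of the non-BN weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(128, 32))              # a batch of inputs
W = rng.normal(size=(32, 16))               # pre-BN weights
gamma, beta = np.ones(16), np.zeros(16)
out_1 = batch_norm(x @ W, gamma, beta)
out_2 = batch_norm(x @ (10.0 * W), gamma, beta)
print(np.max(np.abs(out_1 - out_2)))        # ~0, up to the eps term

The invariance holds only up to the eps term; how the batch statistics
themselves tune the regularization parameter is the subject of the paper's
analytical treatment and is not reproduced here.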
Related papers
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT)
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- An Adaptive Batch Normalization in Deep Learning [0.0]
Batch Normalization (BN) is a way to accelerate and stabilize training in deep convolutional neural networks.
We propose a threshold-based adaptive BN approach that separates the data that requires the BN and data that does not require it.
arXiv Detail & Related papers (2022-11-03T12:12:56Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- Revisiting Batch Normalization [0.0]
Batch normalization (BN) is essential for training deep neural networks.
We revisit the BN formulation and present a new method and update approach for BN to address the aforementioned issues.
Experimental results using the proposed alterations to BN show statistically significant performance gains in a variety of scenarios.
We also present a new online BN-based input data normalization technique to alleviate the need for other offline or fixed methods.
arXiv Detail & Related papers (2021-10-26T19:48:19Z)
- Explicit regularization and implicit bias in deep network classifiers trained with the square loss [2.8935588665357077]
Deep ReLU networks trained with the square loss have been observed to perform well in classification tasks.
We show that convergence to a solution with the absolute minimum norm is expected when normalization techniques are used together with Weight Decay.
arXiv Detail & Related papers (2020-12-31T21:07:56Z)
- Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs)
We propose a memorized batch normalization (MBN) which considers multiple recent batches to obtain more accurate and robust statistics.
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
arXiv Detail & Related papers (2020-10-10T08:48:41Z)
- PowerNorm: Rethinking Batch Normalization in Transformers [96.14956636022957]
The standard normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN).
LN is preferred due to the empirical observation that a (naive/vanilla) use of BN leads to significant performance degradation for NLP tasks.
We propose Power Normalization (PN), a novel normalization scheme that resolves this issue.
arXiv Detail & Related papers (2020-03-17T17:50:26Z)
- Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization [126.6252371899064]
Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
arXiv Detail & Related papers (2020-01-19T14:41:22Z)