WeightAlign: Normalizing Activations by Weight Alignment
- URL: http://arxiv.org/abs/2010.07160v1
- Date: Wed, 14 Oct 2020 15:25:39 GMT
- Title: WeightAlign: Normalizing Activations by Weight Alignment
- Authors: Xiangwei Shi, Yunqiang Li, Xin Liu, Jan van Gemert
- Abstract summary: Batch normalization (BN) allows training very deep networks by normalizing activations by mini-batch sample statistics, which renders BN unstable for small batch sizes.
Small-batch alternatives such as Instance Norm, Layer Norm, and Group Norm are less stable than BN as they critically depend on the statistics of a single input sample.
We present WeightAlign: a method that normalizes the weights by the mean and scaled standard deviation computed within a filter, which normalizes activations without computing any sample statistics.
- Score: 16.85286948260155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Batch normalization (BN) allows training very deep networks by normalizing
activations by mini-batch sample statistics which renders BN unstable for small
batch sizes. Current small-batch solutions such as Instance Norm, Layer Norm,
and Group Norm use channel statistics which can be computed even for a single
sample. Such methods are less stable than BN as they critically depend on the
statistics of a single input sample. To address this problem, we propose a
normalization of activation without sample statistics. We present WeightAlign:
a method that normalizes the weights by the mean and scaled standard deviation
computed within a filter, which normalizes activations without computing any
sample statistics. Our proposed method is independent of batch size and stable
over a wide range of batch sizes. Because weight statistics are orthogonal to
sample statistics, we can directly combine WeightAlign with any method for
activation normalization. We experimentally demonstrate these benefits for
classification on CIFAR-10, CIFAR-100, ImageNet, for semantic segmentation on
PASCAL VOC 2012 and for domain adaptation on Office-31.
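Since the abstract describes the mechanism only at a high level, below is a minimal sketch, in PyTorch, of what filter-wise weight alignment could look like: each output filter's weights are re-centered by their mean and re-scaled by a scaled standard deviation before the convolution, so activations are normalized without any sample statistics. The module name, the sqrt(fan_in) scale factor, and the epsilon are illustrative assumptions, not the authors' reference implementation.
```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightAlignedConv2d(nn.Conv2d):
    """Conv2d whose weights are re-centered and re-scaled per output filter
    before convolution (a sketch of the WeightAlign idea; the exact scale
    factor used in the paper may differ)."""

    def forward(self, x):
        w = self.weight                                # (out_ch, in_ch, kH, kW)
        fan_in = w[0].numel()                          # elements per filter
        mean = w.mean(dim=(1, 2, 3), keepdim=True)     # filter-wise mean
        std = w.std(dim=(1, 2, 3), keepdim=True) + 1e-5
        # Assumed scaling: divide by std * sqrt(fan_in) so pre-activations of
        # standardized inputs have roughly unit variance.
        w_aligned = (w - mean) / (std * math.sqrt(fan_in))
        return F.conv2d(x, w_aligned, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

# Usage: drop-in replacement for nn.Conv2d. Because weight statistics are
# orthogonal to sample statistics, such a layer can still be combined with any
# activation-normalization method (e.g. GroupNorm).
layer = WeightAlignedConv2d(3, 64, kernel_size=3, padding=1, bias=False)
out = layer(torch.randn(2, 3, 32, 32))
```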
Related papers
- Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b.
We propose to mitigate dataset bias via either weighting the objective of each sample $n$ by $\frac{1}{p(u_n|b_n)}$ or sampling that sample with a weight proportional to $\frac{1}{p(u_n|b_n)}$ (see the sketch after this list).
arXiv Detail & Related papers (2024-02-05T22:58:06Z)
- Un-Mixing Test-Time Normalization Statistics: Combatting Label Temporal Correlation [11.743315123714108]
This paper presents a novel method termed 'Un-Mixing Test-Time Normalization Statistics' (UnMix-TNS)
Our method re-calibrates the statistics of each instance within a test batch by mixing them with multiple distinct statistics components.
Our results highlight UnMix-TNS's capacity to markedly enhance stability and performance across various benchmarks.
arXiv Detail & Related papers (2024-01-16T12:48:52Z)
- TTN: A Domain-Shift Aware Batch Normalization in Test-Time Adaptation [28.63285970880039]
Recent test-time adaptation methods heavily rely on transductive batch normalization (TBN)
Adopting TBN, which employs test-batch statistics, mitigates the performance degradation caused by domain shift.
We present a new test-time normalization (TTN) method that interpolates the statistics by adjusting the importance between conventional batch normalization (CBN) and TBN according to the domain-shift sensitivity of each BN layer.
arXiv Detail & Related papers (2023-02-10T10:25:29Z)
- Batch Layer Normalization, A new normalization layer for CNNs and RNN [0.0]
This study introduces a new normalization layer termed Batch Layer Normalization (BLN)
As a combined version of batch and layer normalization, BLN adaptively puts appropriate weight on mini-batch and feature normalization based on the inverse size of mini-batches.
Test results indicate the application potential of BLN and show that it converges faster than batch normalization and layer normalization in both Convolutional and Recurrent Neural Networks.
arXiv Detail & Related papers (2022-09-19T10:12:51Z)
- BR-SNIS: Bias Reduced Self-Normalized Importance Sampling [11.150337082767862]
Importance Sampling (IS) is a method for approximating expectations under a target distribution using independent samples from a proposal distribution and the associated importance weights.
We propose a new method, BR-SNIS, whose complexity is essentially the same as that of self-normalized IS (SNIS) and which significantly reduces bias without increasing the variance.
We furnish the proposed algorithm with rigorous theoretical results, including new bias, variance and high-probability bounds.
arXiv Detail & Related papers (2022-07-13T17:14:10Z)
- Test-time Batch Statistics Calibration for Covariate Shift [66.7044675981449]
We propose to adapt the deep models to the novel environment during inference.
We present a general formulation $\alpha$-BN to calibrate the batch statistics.
We also present a novel loss function to form a unified test time adaptation framework Core.
arXiv Detail & Related papers (2021-10-06T08:45:03Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs)
We propose a memorized batch normalization (MBN) which considers multiple recent batches to obtain more accurate and robust statistics.
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
arXiv Detail & Related papers (2020-10-10T08:48:41Z)
- PowerNorm: Rethinking Batch Normalization in Transformers [96.14956636022957]
The standard normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN)
LN is preferred due to the empirical observation that a (naive/vanilla) use of BN leads to significant performance degradation for NLP tasks.
We propose Power Normalization (PN), a novel normalization scheme that resolves this issue.
arXiv Detail & Related papers (2020-03-17T17:50:26Z)
- Cross-Iteration Batch Normalization [67.83430009388678]
We present Cross-Iteration Batch Normalization (CBN), in which examples from multiple recent iterations are jointly utilized to enhance estimation quality.
CBN is found to outperform the original batch normalization and a direct calculation of statistics over previous iterations without the proposed compensation technique.
arXiv Detail & Related papers (2020-02-13T18:52:57Z)
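The dataset-bias entry above weights the objective of each sample $n$ by $\frac{1}{p(u_n|b_n)}$. The snippet below is a hypothetical sketch of that reweighting, assuming $p(u|b)$ is estimated from co-occurrence counts of the class attribute u and the non-class attribute b over the training set; the helper names and tensor shapes are illustrative, not taken from the paper.
```python
import torch
import torch.nn.functional as F

def estimate_p_u_given_b(u, b, num_u, num_b, eps=1e-8):
    """Estimate p(u | b) from co-occurrence counts of class attribute u
    and non-class (bias) attribute b on the training set (assumed estimator)."""
    counts = torch.zeros(num_b, num_u)
    for ui, bi in zip(u.tolist(), b.tolist()):
        counts[bi, ui] += 1
    return counts / (counts.sum(dim=1, keepdim=True) + eps)  # rows ~ p(u | b)

def reweighted_loss(logits, u, b, p_u_given_b):
    """Cross-entropy where sample n is weighted by 1 / p(u_n | b_n)."""
    per_sample = F.cross_entropy(logits, u, reduction="none")
    weights = 1.0 / (p_u_given_b[b, u] + 1e-8)
    return (weights * per_sample).mean()

# Toy usage with hypothetical attribute tensors.
u = torch.randint(0, 10, (256,))   # class attribute labels
b = torch.randint(0, 2, (256,))    # bias attribute labels
p = estimate_p_u_given_b(u, b, num_u=10, num_b=2)
loss = reweighted_loss(torch.randn(256, 10), u, b, p)
```
The abstract also mentions an equivalent alternative: sampling each example with probability proportional to the same weight instead of reweighting the loss.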
This list is automatically generated from the titles and abstracts of the papers on this site.