WeightAlign: Normalizing Activations by Weight Alignment
- URL: http://arxiv.org/abs/2010.07160v1
- Date: Wed, 14 Oct 2020 15:25:39 GMT
- Title: WeightAlign: Normalizing Activations by Weight Alignment
- Authors: Xiangwei Shi, Yunqiang Li, Xin Liu, Jan van Gemert
- Abstract summary: Batch normalization (BN) allows training very deep networks by normalizing activations by mini-batch sample statistics, which renders BN unstable for small batch sizes.
Small-batch alternatives such as Instance Norm, Layer Norm, and Group Norm are less stable than BN as they critically depend on the statistics of a single input sample.
We present WeightAlign: a method that normalizes the weights by the mean and scaled standard deviation computed within a filter, which normalizes activations without computing any sample statistics.
- Score: 16.85286948260155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Batch normalization (BN) allows training very deep networks by normalizing
activations by mini-batch sample statistics which renders BN unstable for small
batch sizes. Current small-batch solutions such as Instance Norm, Layer Norm,
and Group Norm use channel statistics which can be computed even for a single
sample. Such methods are less stable than BN as they critically depend on the
statistics of a single input sample. To address this problem, we propose a
normalization of activation without sample statistics. We present WeightAlign:
a method that normalizes the weights by the mean and scaled standard deviation
computed within a filter, which normalizes activations without computing any
sample statistics. Our proposed method is independent of batch size and stable
over a wide range of batch sizes. Because weight statistics are orthogonal to
sample statistics, we can directly combine WeightAlign with any method for
activation normalization. We experimentally demonstrate these benefits for
classification on CIFAR-10, CIFAR-100, ImageNet, for semantic segmentation on
PASCAL VOC 2012 and for domain adaptation on Office-31.
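Since the abstract describes the mechanism only at a high level, below is a minimal sketch, in PyTorch, of what filter-wise weight alignment could look like: each output filter's weights are re-centered by their mean and re-scaled by a scaled standard deviation before the convolution, so activations are normalized without any sample statistics. The module name, the sqrt(fan_in) scale factor, and the epsilon are illustrative assumptions, not the authors' reference implementation.
```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightAlignedConv2d(nn.Conv2d):
    """Conv2d whose weights are re-centered and re-scaled per output filter
    before convolution (a sketch of the WeightAlign idea; the exact scale
    factor used in the paper may differ)."""

    def forward(self, x):
        w = self.weight                                # (out_ch, in_ch, kH, kW)
        fan_in = w[0].numel()                          # elements per filter
        mean = w.mean(dim=(1, 2, 3), keepdim=True)     # filter-wise mean
        std = w.std(dim=(1, 2, 3), keepdim=True) + 1e-5
        # Assumed scaling: divide by std * sqrt(fan_in) so pre-activations of
        # standardized inputs have roughly unit variance.
        w_aligned = (w - mean) / (std * math.sqrt(fan_in))
        return F.conv2d(x, w_aligned, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

# Usage: drop-in replacement for nn.Conv2d. Because weight statistics are
# orthogonal to sample statistics, such a layer can still be combined with any
# activation-normalization method (e.g. GroupNorm).
layer = WeightAlignedConv2d(3, 64, kernel_size=3, padding=1, bias=False)
out = layer(torch.randn(2, 3, 32, 32))
```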
Related papers
- Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b.
We propose to mitigate dataset bias via either weighting the objective of each sample $n$ by $\frac{1}{p(u_n|b_n)}$ or sampling that sample with a weight proportional to $\frac{1}{p(u_n|b_n)}$ (see the sketch after this list).
arXiv Detail & Related papers (2024-02-05T22:58:06Z)
- Un-Mixing Test-Time Normalization Statistics: Combatting Label Temporal Correlation [11.743315123714108]
This paper presents a novel method termed 'Un-Mixing Test-Time Normalization Statistics' (UnMix-TNS)
Our method re-calibrates the statistics of each instance within a test batch by mixing them with multiple distinct statistics components.
Our results highlight UnMix-TNS's capacity to markedly enhance stability and performance across various benchmarks.
arXiv Detail & Related papers (2024-01-16T12:48:52Z)
- TTN: A Domain-Shift Aware Batch Normalization in Test-Time Adaptation [28.63285970880039]
Recent test-time adaptation methods heavily rely on transductive batch normalization (TBN)
Adopting TBN, which employs test-batch statistics, mitigates the performance degradation caused by domain shift.
We present a new test-time normalization (TTN) method that interpolates the statistics by adjusting the importance between conventional batch normalization (CBN) and TBN according to the domain-shift sensitivity of each BN layer.
arXiv Detail & Related papers (2023-02-10T10:25:29Z)
- Batch Layer Normalization, A new normalization layer for CNNs and RNN [0.0]
This study introduces a new normalization layer termed Batch Layer Normalization (BLN)
As a combined version of batch and layer normalization, BLN adaptively puts appropriate weight on mini-batch and feature normalization based on the inverse size of mini-batches.
Test results indicate the application potential of BLN and show that it converges faster than batch normalization and layer normalization in both Convolutional and Recurrent Neural Networks.
arXiv Detail & Related papers (2022-09-19T10:12:51Z)
- BR-SNIS: Bias Reduced Self-Normalized Importance Sampling [11.150337082767862]
Importance Sampling (IS) is a method for approximating expectations under a target distribution using independent samples from a proposal distribution and the associated importance weights.
We propose a new method, BR-SNIS, whose complexity is essentially the same as that of self-normalized IS (SNIS) and which significantly reduces bias without increasing the variance.
We furnish the proposed algorithm with rigorous theoretical results, including new bias, variance and high-probability bounds.
arXiv Detail & Related papers (2022-07-13T17:14:10Z)
- Test-time Batch Statistics Calibration for Covariate Shift [66.7044675981449]
We propose to adapt the deep models to the novel environment during inference.
We present a general formulation $\alpha$-BN to calibrate the batch statistics.
We also present a novel loss function to form a unified test time adaptation framework Core.
arXiv Detail & Related papers (2021-10-06T08:45:03Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs)
We propose a memorized batch normalization (MBN) which considers multiple recent batches to obtain more accurate and robust statistics.
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
arXiv Detail & Related papers (2020-10-10T08:48:41Z)
- PowerNorm: Rethinking Batch Normalization in Transformers [96.14956636022957]
The standard normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN)
LN is preferred due to the empirical observation that a (naive/vanilla) use of BN leads to significant performance degradation for NLP tasks.
We propose Power Normalization (PN), a novel normalization scheme that resolves this issue.
arXiv Detail & Related papers (2020-03-17T17:50:26Z)
- Cross-Iteration Batch Normalization [67.83430009388678]
We present Cross-Iteration Batch Normalization (CBN), in which examples from multiple recent iterations are jointly utilized to enhance estimation quality.
CBN is found to outperform the original batch normalization and a direct calculation of statistics over previous iterations without the proposed compensation technique.
arXiv Detail & Related papers (2020-02-13T18:52:57Z)
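The dataset-bias entry above weights the objective of each sample $n$ by $\frac{1}{p(u_n|b_n)}$. The snippet below is a hypothetical sketch of that reweighting, assuming $p(u|b)$ is estimated from co-occurrence counts of the class attribute u and the non-class attribute b over the training set; the helper names and tensor shapes are illustrative, not taken from the paper.
```python
import torch
import torch.nn.functional as F

def estimate_p_u_given_b(u, b, num_u, num_b, eps=1e-8):
    """Estimate p(u | b) from co-occurrence counts of class attribute u
    and non-class (bias) attribute b on the training set (assumed estimator)."""
    counts = torch.zeros(num_b, num_u)
    for ui, bi in zip(u.tolist(), b.tolist()):
        counts[bi, ui] += 1
    return counts / (counts.sum(dim=1, keepdim=True) + eps)  # rows ~ p(u | b)

def reweighted_loss(logits, u, b, p_u_given_b):
    """Cross-entropy where sample n is weighted by 1 / p(u_n | b_n)."""
    per_sample = F.cross_entropy(logits, u, reduction="none")
    weights = 1.0 / (p_u_given_b[b, u] + 1e-8)
    return (weights * per_sample).mean()

# Toy usage with hypothetical attribute tensors.
u = torch.randint(0, 10, (256,))   # class attribute labels
b = torch.randint(0, 2, (256,))    # bias attribute labels
p = estimate_p_u_given_b(u, b, num_u=10, num_b=2)
loss = reweighted_loss(torch.randn(256, 10), u, b, p)
```
The abstract also mentions an equivalent alternative: sampling each example with probability proportional to the same weight instead of reweighting the loss.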
This list is automatically generated from the titles and abstracts of the papers on this site.