Proxy-Normalizing Activations to Match Batch Normalization while
Removing Batch Dependence
- URL: http://arxiv.org/abs/2106.03743v1
- Date: Mon, 7 Jun 2021 16:08:48 GMT
- Title: Proxy-Normalizing Activations to Match Batch Normalization while
Removing Batch Dependence
- Authors: Antoine Labatie, Dominic Masters, Zach Eaton-Rosen, Carlo Luschi
- Abstract summary: We find that layer normalization and instance normalization both induce the appearance of failure modes in the neural network's pre-activations.
We introduce the technique "Proxy Normalization" that normalizes post-activations using a proxy distribution.
When combined with layer normalization or group normalization, this batch-independent normalization emulates batch normalization's behavior and consistently matches or exceeds its performance.
- Score: 8.411385346896413
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We investigate the reasons for the performance degradation incurred with
batch-independent normalization. We find that the prototypical techniques of
layer normalization and instance normalization both induce the appearance of
failure modes in the neural network's pre-activations: (i) layer normalization
induces a collapse towards channel-wise constant functions; (ii) instance
normalization induces a lack of variability in instance statistics, symptomatic
of an alteration of the expressivity. To alleviate failure mode (i) without
aggravating failure mode (ii), we introduce the technique "Proxy Normalization"
that normalizes post-activations using a proxy distribution. When combined with
layer normalization or group normalization, this batch-independent
normalization emulates batch normalization's behavior and consistently matches
or exceeds its performance.
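As a concrete illustration, below is a minimal PyTorch sketch of the Proxy Normalization idea: the activation is applied to layer- or group-normalized pre-activations, and the post-activations are then re-normalized channel-wise using the statistics of the activation applied to a Gaussian "proxy" variable. The parameter names, the GELU activation, and the sampling-based estimate of the proxy statistics are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class ProxyNorm(nn.Module):
    """Minimal sketch of Proxy Normalization.

    Re-normalizes post-activations channel-wise using the statistics of
    the activation applied to a Gaussian proxy y~_c ~ N(beta_c, (1 + gamma_c)^2),
    making the normalization batch-independent. The sampling-based estimate
    of the proxy statistics below is an illustrative simplification.
    """

    def __init__(self, num_channels, activation=None, num_samples=256, eps=1e-5):
        super().__init__()
        self.activation = activation if activation is not None else nn.GELU()
        self.eps = eps
        # Learnable parameters of the channel-wise proxy distribution.
        self.beta = nn.Parameter(torch.zeros(num_channels))
        self.gamma = nn.Parameter(torch.zeros(num_channels))
        # Fixed standard-normal samples used to estimate proxy statistics.
        self.register_buffer("z", torch.randn(num_samples))

    def forward(self, y):
        # y: layer/group-normalized pre-activations, shape (N, C, H, W).
        phi_y = self.activation(y)
        # Channel-wise proxy samples: beta_c + (1 + gamma_c) * z, shape (C, S).
        proxy = self.beta[:, None] + (1.0 + self.gamma[:, None]) * self.z
        phi_proxy = self.activation(proxy)
        mean = phi_proxy.mean(dim=1)[None, :, None, None]
        std = (phi_proxy.var(dim=1)[None, :, None, None] + self.eps).sqrt()
        # Batch-independent normalization of the post-activations.
        return (phi_y - mean) / std
```

In a network this module would sit where the activation normally does, e.g. `h = ProxyNorm(64)(nn.GroupNorm(8, 64)(x))`, emulating the centering and scaling that batch normalization provides without any dependence on batch statistics.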
Related papers
- Unsupervised Adaptive Normalization [0.07499722271664146]
Unsupervised Adaptive Normalization (UAN) is an innovative algorithm that seamlessly integrates clustering for normalization with deep neural network learning.
UAN outperforms classical methods by adapting to the target task and is effective in classification and domain adaptation.
arXiv Detail & Related papers (2024-09-07T08:14:11Z) - GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection [60.78684630040313]
Diffusion models tend to reconstruct normal counterparts of test images when a certain amount of noise is added.
From the global perspective, the difficulty of reconstructing images with different anomalies is uneven.
We propose a global and local adaptive diffusion model (abbreviated to GLAD) for unsupervised anomaly detection.
arXiv Detail & Related papers (2024-06-11T17:27:23Z) - AFN: Adaptive Fusion Normalization via an Encoder-Decoder Framework [6.293148047652131]
We propose a new normalization function called Adaptive Fusion Normalization.
Through experiments, we demonstrate AFN outperforms the previous normalization techniques in domain generalization and image classification tasks.
arXiv Detail & Related papers (2023-08-07T06:08:51Z) - The Implicit Bias of Batch Normalization in Linear Models and Two-layer
Linear Convolutional Neural Networks [117.93273337740442]
We show that gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate.
We also show that batch normalization has an implicit bias towards a patch-wise uniform margin.
arXiv Detail & Related papers (2023-06-20T16:58:00Z) - Context Normalization Layer with Applications [0.1499944454332829]
This study proposes a new normalization technique, called context normalization, for image data.
It adjusts the scaling of features based on the characteristics of each sample, which improves the model's convergence speed and performance.
The effectiveness of context normalization is demonstrated on various datasets, and its performance is compared to other standard normalization techniques.
arXiv Detail & Related papers (2023-03-14T06:38:17Z) - AltUB: Alternating Training Method to Update Base Distribution of
Normalizing Flow for Anomaly Detection [1.3999481573773072]
Unsupervised anomaly detection has recently come into the spotlight in various practical domains.
One of the major approaches is a normalizing flow, which pursues an invertible transformation of a complex distribution, such as images, into a simple distribution such as $N(0, I)$ (a minimal flow sketch appears after this list).
arXiv Detail & Related papers (2022-10-26T16:31:15Z) - Distribution Mismatch Correction for Improved Robustness in Deep Neural
Networks [86.42889611784855]
Normalization methods can increase vulnerability to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
arXiv Detail & Related papers (2021-10-05T11:36:25Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are empirically central to preventing overfitting.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Pruning Redundant Mappings in Transformer Models via Spectral-Normalized
Identity Prior [54.629850694790036]
Spectral-Normalized Identity Prior (SNIP) is a structured pruning approach that penalizes an entire residual module in a Transformer model toward an identity mapping.
We conduct experiments with BERT on 5 GLUE benchmark tasks to demonstrate that SNIP achieves effective pruning results while maintaining comparable performance.
arXiv Detail & Related papers (2020-10-05T05:40:56Z) - Optimization Theory for ReLU Neural Networks Trained with Normalization
Layers [82.61117235807606]
The success of deep neural networks is in part due to the use of normalization layers.
Our analysis shows how the introduction of normalization changes the optimization landscape and can enable faster convergence.
arXiv Detail & Related papers (2020-06-11T23:55:54Z)
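For the normalizing-flow approach mentioned in the AltUB entry above, here is a minimal, self-contained sketch, not AltUB's actual model: a single element-wise affine flow trained by maximum likelihood so that the transformed data matches $N(0, I)$, with low likelihood serving as an anomaly score.

```python
import math
import torch
import torch.nn as nn

class AffineFlow(nn.Module):
    """Toy one-layer normalizing flow: an invertible element-wise map
    f(x) = exp(s) * x + t, trained so that f(x) follows N(0, I).
    Illustrative only; practical flows stack many invertible layers."""

    def __init__(self, dim):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def log_prob(self, x):
        z = x * self.log_scale.exp() + self.shift
        # Change of variables: log p(x) = log N(z; 0, I) + log|det df/dx|,
        # where log|det df/dx| = sum(s) for an element-wise affine map.
        log_base = -0.5 * (z.pow(2) + math.log(2 * math.pi)).sum(dim=-1)
        return log_base + self.log_scale.sum()

flow = AffineFlow(dim=2)
x = torch.randn(8, 2)
anomaly_score = -flow.log_prob(x)  # higher = more anomalous
```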
This list is automatically generated from the titles and abstracts of the papers on this site.