Proxy-Normalizing Activations to Match Batch Normalization while
Removing Batch Dependence
- URL: http://arxiv.org/abs/2106.03743v1
- Date: Mon, 7 Jun 2021 16:08:48 GMT
- Title: Proxy-Normalizing Activations to Match Batch Normalization while
Removing Batch Dependence
- Authors: Antoine Labatie, Dominic Masters, Zach Eaton-Rosen, Carlo Luschi
- Abstract summary: We find that layer normalization and instance normalization both induce the appearance of failure modes in the neural network's pre-activations.
We introduce the technique "Proxy Normalization" that normalizes post-activations using a proxy distribution.
When combined with layer normalization or group normalization, this batch-independent normalization emulates batch normalization's behavior and consistently matches or exceeds its performance.
- Score: 8.411385346896413
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We investigate the reasons for the performance degradation incurred with
batch-independent normalization. We find that the prototypical techniques of
layer normalization and instance normalization both induce the appearance of
failure modes in the neural network's pre-activations: (i) layer normalization
induces a collapse towards channel-wise constant functions; (ii) instance
normalization induces a lack of variability in instance statistics, symptomatic
of an alteration of the expressivity. To alleviate failure mode (i) without
aggravating failure mode (ii), we introduce the technique "Proxy Normalization"
that normalizes post-activations using a proxy distribution. When combined with
layer normalization or group normalization, this batch-independent
normalization emulates batch normalization's behavior and consistently matches
or exceeds its performance.
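As a concrete illustration, below is a minimal PyTorch sketch of the Proxy Normalization idea: the activation is applied to layer- or group-normalized pre-activations, and the post-activations are then re-normalized channel-wise using the statistics of the activation applied to a Gaussian "proxy" variable. The parameter names, the GELU activation, and the sampling-based estimate of the proxy statistics are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class ProxyNorm(nn.Module):
    """Minimal sketch of Proxy Normalization.

    Re-normalizes post-activations channel-wise using the statistics of
    the activation applied to a Gaussian proxy y~_c ~ N(beta_c, (1 + gamma_c)^2),
    making the normalization batch-independent. The sampling-based estimate
    of the proxy statistics below is an illustrative simplification.
    """

    def __init__(self, num_channels, activation=None, num_samples=256, eps=1e-5):
        super().__init__()
        self.activation = activation if activation is not None else nn.GELU()
        self.eps = eps
        # Learnable parameters of the channel-wise proxy distribution.
        self.beta = nn.Parameter(torch.zeros(num_channels))
        self.gamma = nn.Parameter(torch.zeros(num_channels))
        # Fixed standard-normal samples used to estimate proxy statistics.
        self.register_buffer("z", torch.randn(num_samples))

    def forward(self, y):
        # y: layer/group-normalized pre-activations, shape (N, C, H, W).
        phi_y = self.activation(y)
        # Channel-wise proxy samples: beta_c + (1 + gamma_c) * z, shape (C, S).
        proxy = self.beta[:, None] + (1.0 + self.gamma[:, None]) * self.z
        phi_proxy = self.activation(proxy)
        mean = phi_proxy.mean(dim=1)[None, :, None, None]
        std = (phi_proxy.var(dim=1)[None, :, None, None] + self.eps).sqrt()
        # Batch-independent normalization of the post-activations.
        return (phi_y - mean) / std
```

In a network this module would sit where the activation normally does, e.g. `h = ProxyNorm(64)(nn.GroupNorm(8, 64)(x))`, emulating the centering and scaling that batch normalization provides without any dependence on batch statistics.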
Related papers
- Unsupervised Adaptive Normalization [0.07499722271664146]
Unsupervised Adaptive Normalization (UAN) is an innovative algorithm that seamlessly integrates clustering for normalization with deep neural network learning.
UAN outperforms classical methods by adapting to the target task and is effective in classification and domain adaptation.
arXiv Detail & Related papers (2024-09-07T08:14:11Z) - GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection [60.78684630040313]
Diffusion models tend to reconstruct normal counterparts of test images when a certain amount of noise is added.
From the global perspective, the difficulty of reconstructing images with different anomalies is uneven.
We propose a global and local adaptive diffusion model (abbreviated to GLAD) for unsupervised anomaly detection.
arXiv Detail & Related papers (2024-06-11T17:27:23Z) - AFN: Adaptive Fusion Normalization via an Encoder-Decoder Framework [6.293148047652131]
We propose a new normalization function called Adaptive Fusion Normalization.
Through experiments, we demonstrate AFN outperforms the previous normalization techniques in domain generalization and image classification tasks.
arXiv Detail & Related papers (2023-08-07T06:08:51Z) - The Implicit Bias of Batch Normalization in Linear Models and Two-layer
Linear Convolutional Neural Networks [117.93273337740442]
We show that gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate.
We also show that batch normalization has an implicit bias towards a patch-wise uniform margin.
arXiv Detail & Related papers (2023-06-20T16:58:00Z) - Context Normalization Layer with Applications [0.1499944454332829]
This study proposes a new normalization technique, called context normalization, for image data.
It adjusts the scaling of features based on the characteristics of each sample, which improves the model's convergence speed and performance.
The effectiveness of context normalization is demonstrated on various datasets, and its performance is compared to other standard normalization techniques.
arXiv Detail & Related papers (2023-03-14T06:38:17Z) - AltUB: Alternating Training Method to Update Base Distribution of
Normalizing Flow for Anomaly Detection [1.3999481573773072]
Unsupervised anomaly detection has recently come into the spotlight in various practical domains.
One of the major approaches is a normalizing flow, which pursues an invertible transformation of a complex distribution, such as images, into a simple distribution such as $N(0, I)$ (a minimal flow sketch appears after this list).
arXiv Detail & Related papers (2022-10-26T16:31:15Z) - Distribution Mismatch Correction for Improved Robustness in Deep Neural
Networks [86.42889611784855]
Normalization methods can increase vulnerability to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
arXiv Detail & Related papers (2021-10-05T11:36:25Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are empirically central to preventing overfitting.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Pruning Redundant Mappings in Transformer Models via Spectral-Normalized
Identity Prior [54.629850694790036]
Spectral-Normalized Identity Prior (SNIP) is a structured pruning approach that penalizes an entire residual module in a Transformer model toward an identity mapping.
We conduct experiments with BERT on 5 GLUE benchmark tasks to demonstrate that SNIP achieves effective pruning results while maintaining comparable performance.
arXiv Detail & Related papers (2020-10-05T05:40:56Z) - Optimization Theory for ReLU Neural Networks Trained with Normalization
Layers [82.61117235807606]
The success of deep neural networks is in part due to the use of normalization layers.
Our analysis shows how the introduction of normalization changes the optimization landscape and can enable faster convergence.
arXiv Detail & Related papers (2020-06-11T23:55:54Z)
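For the normalizing-flow approach mentioned in the AltUB entry above, here is a minimal, self-contained sketch, not AltUB's actual model: a single element-wise affine flow trained by maximum likelihood so that the transformed data matches $N(0, I)$, with low likelihood serving as an anomaly score.

```python
import math
import torch
import torch.nn as nn

class AffineFlow(nn.Module):
    """Toy one-layer normalizing flow: an invertible element-wise map
    f(x) = exp(s) * x + t, trained so that f(x) follows N(0, I).
    Illustrative only; practical flows stack many invertible layers."""

    def __init__(self, dim):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def log_prob(self, x):
        z = x * self.log_scale.exp() + self.shift
        # Change of variables: log p(x) = log N(z; 0, I) + log|det df/dx|,
        # where log|det df/dx| = sum(s) for an element-wise affine map.
        log_base = -0.5 * (z.pow(2) + math.log(2 * math.pi)).sum(dim=-1)
        return log_base + self.log_scale.sum()

flow = AffineFlow(dim=2)
x = torch.randn(8, 2)
anomaly_score = -flow.log_prob(x)  # higher = more anomalous
```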
This list is automatically generated from the titles and abstracts of the papers on this site.