Improving Generalization of Batch Whitening by Convolutional Unit
Optimization
- URL: http://arxiv.org/abs/2108.10629v1
- Date: Tue, 24 Aug 2021 10:27:57 GMT
- Title: Improving Generalization of Batch Whitening by Convolutional Unit
Optimization
- Authors: Yooshin Cho, Hanbyel Cho, Youngsoo Kim, Junmo Kim
- Abstract summary: Batch Whitening is a technique that accelerates and stabilizes training by transforming input features to have a zero mean (Centering) and a unit variance (Scaling).
In commonly used structures, which are empirically optimized with Batch Normalization, the normalization layer appears between convolution and activation function.
We propose a new Convolutional Unit that is in line with the theory, and our method generally improves the performance of Batch Whitening.
- Score: 24.102442375834084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Batch Whitening is a technique that accelerates and stabilizes training by
transforming input features to have a zero mean (Centering) and a unit variance
(Scaling), and by removing linear correlation between channels (Decorrelation).
In commonly used structures, which are empirically optimized with Batch
Normalization, the normalization layer appears between convolution and
activation function. Subsequent Batch Whitening studies have employed the same
structure without further analysis, even though Batch Whitening itself was
analyzed on the premise that the input of a linear layer is whitened. To
bridge this gap, we
propose a new Convolutional Unit that is in line with the theory, and our
method generally improves the performance of Batch Whitening. Moreover, we show
the inefficacy of the original Convolutional Unit by investigating rank and
correlation of features. As our method can be used with off-the-shelf whitening
modules, we use Iterative Normalization (IterNorm), the state-of-the-art
whitening module, and obtain significantly improved performance on five image
classification datasets: CIFAR-10, CIFAR-100, CUB-200-2011, Stanford Dogs, and
ImageNet. Notably, we verify that our method improves the stability and
performance of whitening when using a large learning rate, group size, and
iteration number.
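To make the operations named in the abstract concrete, the sketch below (an illustration only, not the authors' code) whitens a batch of convolutional features: it centers each channel, then multiplies by an approximate inverse square root of the channel covariance, which performs Scaling and Decorrelation at once. The inverse square root is computed with Newton's iteration in the spirit of IterNorm; the function name, iteration count, epsilon, and the absence of channel grouping and running statistics are simplifying assumptions.

```python
import torch

def batch_whiten(x, num_iters=5, eps=1e-5):
    """Whiten conv features of shape (N, C, H, W) across the batch:
    center each channel, then decorrelate with an approximate inverse
    square root of the channel covariance (Newton's iteration, in the
    spirit of IterNorm). Hyperparameters are illustrative."""
    n, c, h, w = x.shape
    # Treat every spatial position of every sample as one observation
    # of a C-dimensional feature vector.
    xc = x.permute(1, 0, 2, 3).reshape(c, -1)        # (C, N*H*W)
    xc = xc - xc.mean(dim=1, keepdim=True)           # Centering
    m = xc.shape[1]
    sigma = xc @ xc.t() / m + eps * torch.eye(c)     # channel covariance
    # Newton's iteration for sigma^{-1/2}, run on the trace-normalized
    # covariance for numerical stability.
    trace = torch.trace(sigma)
    sigma_n = sigma / trace
    p = torch.eye(c)
    for _ in range(num_iters):
        p = 0.5 * (3.0 * p - p @ p @ p @ sigma_n)
    whitening = p / trace.sqrt()                     # approx. sigma^{-1/2}
    xw = whitening @ xc                              # Scaling + Decorrelation
    return xw.reshape(c, n, h, w).permute(1, 0, 2, 3)

# After whitening, the channel covariance is approximately the identity.
feats = torch.randn(32, 16, 8, 8)
out = batch_whiten(feats)
```

Where such a transform sits relative to the convolution and the activation function is exactly the Convolutional Unit design question the paper addresses.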
Related papers
- Covariance-corrected Whitening Alleviates Network Degeneration on Imbalanced Classification [6.197116272789107]
Class imbalance is a critical issue in image classification that significantly affects the performance of deep recognition models.
We propose a novel framework called Whitening-Net to mitigate the degenerate solutions.
In scenarios with extreme class imbalance, the batch covariance statistic exhibits significant fluctuations, impeding the convergence of the whitening operation.
arXiv Detail & Related papers (2024-08-30T10:49:33Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\big(\ln(T) / T^{1 - \frac{1}{\alpha}}\big)$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- Spectrum Extraction and Clipping for Implicitly Linear Layers [20.277446818410997]
We show the effectiveness of automatic differentiation in efficiently and correctly computing and controlling the spectrum of implicitly linear operators.
We provide the first clipping method which is correct for general convolution layers.
arXiv Detail & Related papers (2024-02-25T07:28:28Z)
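A note on the spectrum-extraction entry above: a common way to compute the top singular value of an implicitly linear layer such as a convolution with automatic differentiation is power iteration, using a vector-Jacobian product for the adjoint operator. The sketch below illustrates only that generic idea; it is not the clipping method of that paper, and the shapes, padding, and iteration count are assumptions.

```python
import torch
import torch.nn.functional as F

def conv_top_singular_value(weight, in_shape, iters=20, padding=1):
    """Estimate the largest singular value of x -> conv2d(x, weight)
    by power iteration, using autograd for the adjoint (transposed)
    operator. Shapes, padding, and iteration count are illustrative."""
    x = torch.randn(1, *in_shape)                       # random start vector
    for _ in range(iters):
        x = x / x.norm()
        x.requires_grad_(True)
        y = F.conv2d(x, weight, padding=padding)        # J x
        # J^T (J x) via a vector-Jacobian product (automatic differentiation).
        (g,) = torch.autograd.grad((y * y.detach()).sum(), x)
        x = g.detach()
    x = x / x.norm()
    return F.conv2d(x, weight, padding=padding).norm()  # ~ top singular value

w = torch.randn(8, 3, 3, 3)                             # a 3x3 conv, 3 -> 8 channels
print(conv_top_singular_value(w, (3, 32, 32)).item())
```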
- Whitening-based Contrastive Learning of Sentence Embeddings [61.38955786965527]
This paper presents a whitening-based contrastive learning method for sentence embedding learning (WhitenedCSE).
We find that these two approaches are not totally redundant but actually have some complementarity due to their different uniformity mechanisms.
arXiv Detail & Related papers (2023-05-28T14:58:10Z)
- Optimal Input Gain: All You Need to Supercharge a Feed-Forward Neural Network [0.6562256987706128]
It is shown that pre-processing inputs using a linear transformation is equivalent to multiplying the negative gradient matrix with an autocorrelation matrix per training iteration.
It is shown that OIG-improved HWO could be a significant building block for more complex deep learning architectures.
arXiv Detail & Related papers (2023-03-30T22:20:16Z)
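The Optimal Input Gain entry above claims an equivalence between linearly pre-processing the inputs and multiplying the weight gradient by an autocorrelation-type matrix. For a single linear layer and one plain gradient step, that equivalence can be checked directly, as in the minimal sketch below; all names and shapes are illustrative, and this is not that paper's algorithm.

```python
import torch

# Numeric check for a single linear layer y = W x under one plain gradient
# step (illustrative shapes; A is a fixed linear pre-processing of the input).
torch.manual_seed(0)
x = torch.randn(5, 1)            # input column vector
a = torch.randn(5, 5)            # linear pre-processing matrix
w = torch.randn(3, 5)            # weights of the layer trained on A x
g = torch.randn(3, 1)            # dL/dy for some loss, held fixed
lr = 0.1

# (1) Train on pre-processed inputs: the effective map is (W A) x.
grad_w = g @ (a @ x).t()                        # dL/dW
w_eff_1 = (w - lr * grad_w) @ a                 # effective weights after the step

# (2) Equivalent view: update the effective weights W A directly, with the
# gradient right-multiplied by A^T A (an autocorrelation-type matrix).
w_eff = w @ a
grad_eff = g @ x.t()                            # gradient w.r.t. the effective weights
w_eff_2 = w_eff - lr * grad_eff @ a.t() @ a

print(torch.allclose(w_eff_1, w_eff_2, atol=1e-5))   # True
```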
- Kernel-Whitening: Overcome Dataset Bias with Isotropic Sentence Embedding [51.48582649050054]
We propose a representation normalization method which aims at disentangling the correlations between features of encoded sentences.
We also propose Kernel-Whitening, a Nystrom kernel approximation method to achieve more thorough debiasing on nonlinear spurious correlations.
Experiments show that Kernel-Whitening significantly improves the performance of BERT on out-of-distribution datasets while maintaining in-distribution accuracy.
arXiv Detail & Related papers (2022-10-14T05:56:38Z)
- Stochastic Whitening Batch Normalization [9.514475896906605]
Batch Normalization (BN) is a popular technique for training Deep Neural Networks (DNNs).
The recently proposed Iterative Normalization (IterNorm) method improves these properties by whitening the activations iteratively using Newton's method.
We show that SWBN improves convergence rate and generalization while incurring less computational overhead than IterNorm.
arXiv Detail & Related papers (2021-06-03T20:45:42Z)
- Gradient Boosted Binary Histogram Ensemble for Large-scale Regression [60.16351608335641]
We propose a gradient boosting algorithm for large-scale regression problems called Gradient Boosted Binary Histogram Ensemble (GBBHE), based on binary histogram partition and ensemble learning.
In the experiments, compared with other state-of-the-art algorithms such as gradient boosted regression tree (GBRT), our GBBHE algorithm shows promising performance with less running time on large-scale datasets.
arXiv Detail & Related papers (2021-06-03T17:05:40Z)
- Feature Whitening via Gradient Transformation for Improved Convergence [3.5579740292581]
We address the complexity drawbacks of feature whitening.
We derive an equivalent method, which replaces the sample transformations by a transformation to the weight gradients, applied to every batch of B samples.
We exemplify the proposed algorithms with ResNet-based networks for image classification, demonstrated on the CIFAR and ImageNet datasets.
arXiv Detail & Related papers (2020-10-04T11:30:20Z)
- Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
- An Investigation into the Stochasticity of Batch Whitening [95.54842420166862]
This paper investigates the more general Batch Whitening (BW) operation.
We show that while various whitening transformations equivalently improve the conditioning, they show significantly different behaviors in discriminative scenarios and in training Generative Adversarial Networks (GANs).
Our proposed BW algorithm improves residual networks by a significant margin on ImageNet.
arXiv Detail & Related papers (2020-03-27T11:06:32Z)
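To illustrate the point made in the last entry, the snippet below builds two classic whitening transforms from the same covariance eigendecomposition: PCA whitening and ZCA whitening. Both yield features with identity covariance, i.e. they improve conditioning equivalently, yet they rotate the feature space differently, which is one source of the different downstream behaviors. This is a generic example, not code from that paper.

```python
import torch

# PCA whitening (Lambda^{-1/2} U^T) and ZCA whitening (U Lambda^{-1/2} U^T)
# built from the eigendecomposition of the same empirical covariance.
x = torch.randn(4, 1000)
x = x - x.mean(dim=1, keepdim=True)
sigma = x @ x.t() / x.shape[1]
lam, u = torch.linalg.eigh(sigma)
pca = torch.diag(lam.rsqrt()) @ u.t()
zca = u @ torch.diag(lam.rsqrt()) @ u.t()
for w in (pca, zca):
    xw = w @ x
    # Both transforms produce (approximately) identity covariance.
    print(torch.allclose(xw @ xw.t() / x.shape[1], torch.eye(4), atol=1e-4))
```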
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.