Kernel Normalized Convolutional Networks
- URL: http://arxiv.org/abs/2205.10089v4
- Date: Mon, 4 Mar 2024 19:35:38 GMT
- Title: Kernel Normalized Convolutional Networks
- Authors: Reza Nasirigerdeh, Reihaneh Torkzadehmahani, Daniel Rueckert, Georgios Kaissis
- Abstract summary: BatchNorm, however, performs poorly with small batch sizes and is incompatible with differentially private training.
We propose KernelNorm and kernel normalized convolutional layers, and incorporate them into kernel normalized convolutional networks (KNConvNets).
KNConvNets achieve higher or competitive performance compared to their BatchNorm counterparts in image classification and semantic segmentation.
- Score: 15.997774467236352
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Existing convolutional neural network architectures frequently rely upon
batch normalization (BatchNorm) to effectively train the model. BatchNorm,
however, performs poorly with small batch sizes and is incompatible with
differentially private training. To address these limitations, we propose kernel
normalization (KernelNorm) and kernel normalized convolutional layers, and
incorporate them into kernel normalized convolutional networks (KNConvNets) as
the main building blocks. We implement KNConvNets corresponding to the
state-of-the-art ResNets while forgoing the BatchNorm layers. Through extensive
experiments, we show that KNConvNets achieve higher or competitive performance
compared to their BatchNorm counterparts in image classification and semantic
segmentation. They also significantly outperform their batch-independent
competitors, including those based on layer and group normalization, in both
non-private and differentially private training. KernelNorm thus combines the
batch-independence of layer and group normalization with the performance
advantage of BatchNorm.
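To make the idea concrete, here is a minimal sketch of a kernel normalized convolution as we read it from the abstract: every kernel-sized patch is standardized by its own mean and variance before the convolution weights are applied, so no batch statistics are needed. The class name, parameterization, and unfold-based implementation are our own illustration, not the authors' reference code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelNormConv2d(nn.Module):
    """Sketch of a kernel normalized convolution (our reading of the abstract).

    Each kernel-sized patch is standardized by its own mean and variance
    (over channels and the spatial extent of the patch) before the
    convolution weights are applied. No batch statistics are used.
    """

    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, padding=0, eps=1e-5):
        super().__init__()
        self.kernel_size, self.stride, self.padding, self.eps = \
            kernel_size, stride, padding, eps
        self.weight = nn.Parameter(
            0.01 * torch.randn(out_channels, in_channels, kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.zeros(out_channels))

    def forward(self, x):
        n, _, h, w = x.shape
        # Extract kernel-sized patches: (N, C*k*k, L), L = number of positions.
        patches = F.unfold(x, self.kernel_size, padding=self.padding, stride=self.stride)
        # Standardize every patch with its own statistics (batch-independent).
        mean = patches.mean(dim=1, keepdim=True)
        var = patches.var(dim=1, unbiased=False, keepdim=True)
        patches = (patches - mean) / torch.sqrt(var + self.eps)
        # Convolution expressed as a matrix product over the normalized patches.
        out = self.weight.view(self.weight.size(0), -1) @ patches + self.bias.view(1, -1, 1)
        out_h = (h + 2 * self.padding - self.kernel_size) // self.stride + 1
        out_w = (w + 2 * self.padding - self.kernel_size) // self.stride + 1
        return out.view(n, -1, out_h, out_w)
```

Because the statistics are computed per patch and per sample, the layer behaves identically for any batch size, which is what makes it usable in differentially private training.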
Related papers
- Compacting Binary Neural Networks by Sparse Kernel Selection [58.84313343190488]
This paper is motivated by the previously observed phenomenon that the binary kernels in successful BNNs are nearly power-law distributed.
We develop the Permutation Straight-Through Estimator (PSTE), which not only optimizes the selection process end-to-end but also maintains the non-repetitive occupancy of the selected codewords.
Experiments verify that our method reduces both the model size and bit-wise computational costs, and achieves accuracy improvements compared with state-of-the-art BNNs under comparable budgets.
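For context, the basic straight-through estimator that PSTE builds on can be sketched as follows: a hard, non-differentiable forward pass combined with a surrogate gradient in the backward pass. The permutation-based codeword selection of the paper is not reproduced here.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Plain straight-through estimator for sign binarization.

    Forward: hard sign of the latent real-valued weights.
    Backward: pass the gradient through unchanged inside [-1, 1].
    PSTE extends this idea to an end-to-end selection of binary kernels
    from a codebook; that machinery is not shown here.
    """

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Identity gradient where |x| <= 1, zero elsewhere (hard-tanh clipping).
        return grad_output * (x.abs() <= 1).float()


latent = torch.randn(16, requires_grad=True)
binary = BinarizeSTE.apply(latent)   # {-1, 0, +1} values in the forward pass
binary.sum().backward()              # gradients still reach the latent weights
print(latent.grad)
```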
arXiv Detail & Related papers (2023-03-25T13:53:02Z)
- An Empirical Analysis of the Shift and Scale Parameters in BatchNorm [3.198144010381572]
Batch Normalization (BatchNorm) is a technique that improves the training of deep neural networks.
This paper examines the relative contribution of the normalization step to the success of BatchNorm.
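For reference, BatchNorm decomposes into exactly these two steps, a normalization with batch statistics followed by a learned shift and scale:
$$\hat{x} = \frac{x - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}}, \qquad y = \gamma \hat{x} + \beta,$$
where $\mu_{\mathcal{B}}$ and $\sigma_{\mathcal{B}}^{2}$ are the per-channel batch statistics and $\gamma$, $\beta$ are the learned scale and shift parameters whose contributions the paper isolates.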
arXiv Detail & Related papers (2023-03-22T12:41:12Z)
- GMConv: Modulating Effective Receptive Fields for Convolutional Kernels [52.50351140755224]
In convolutional neural networks, convolutions are performed using a square kernel with a fixed $N \times N$ receptive field (RF).
Inspired by the property that effective receptive fields (ERFs) typically exhibit a Gaussian distribution, we propose the Gaussian Mask convolutional kernel (GMConv) in this work.
Our GMConv can directly replace the standard convolutions in existing CNNs and can be easily trained end-to-end by standard back-propagation.
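A hedged sketch of the Gaussian-mask idea described above: a learnable isotropic Gaussian defined over the kernel grid rescales the square kernel elementwise, concentrating its effective receptive field toward the center. The per-filter parameterization below is our guess at one plausible instantiation, not the paper's construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianMaskConv2d(nn.Module):
    """Sketch of a Gaussian-masked convolution (one plausible instantiation).

    A learnable isotropic Gaussian over the kernel grid rescales the square
    kernel elementwise, so the effective receptive field concentrates toward
    the kernel center and its width is learned per output filter.
    """

    def __init__(self, in_ch, out_ch, k=5, padding=2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=padding)
        coords = torch.arange(k, dtype=torch.float32) - (k - 1) / 2
        yy, xx = torch.meshgrid(coords, coords, indexing="ij")
        self.register_buffer("dist2", xx ** 2 + yy ** 2)      # squared distance to center
        self.log_sigma = nn.Parameter(torch.zeros(out_ch))    # learnable width per filter

    def forward(self, x):
        sigma2 = torch.exp(2 * self.log_sigma).view(-1, 1, 1, 1)
        mask = torch.exp(-self.dist2 / (2 * sigma2))           # (out_ch, 1, k, k)
        return F.conv2d(x, self.conv.weight * mask, self.conv.bias,
                        padding=self.conv.padding)
```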
arXiv Detail & Related papers (2023-02-09T10:17:17Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of the layers commonly used in efficient architectures (depthwise, groupwise, and pointwise convolutions).
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
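To pin down the special cases named above, the snippet below writes out a standard dense convolution next to its depthwise, groupwise, and pointwise counterparts; these are the layers SSC is said to generalize (the SSC filter construction itself is not reproduced here).

```python
import torch.nn as nn

in_ch, out_ch, k = 64, 128, 3

standard  = nn.Conv2d(in_ch, out_ch, k, padding=1)                # dense K x K kernel
depthwise = nn.Conv2d(in_ch, in_ch, k, padding=1, groups=in_ch)   # one spatial filter per channel
pointwise = nn.Conv2d(in_ch, out_ch, 1)                           # 1x1 channel mixing only
groupwise = nn.Conv2d(in_ch, out_ch, k, padding=1, groups=4)      # block-sparse across channels

# Depthwise + pointwise together approximate the dense layer with far fewer parameters.
factorized = nn.Sequential(depthwise, pointwise)
print(sum(p.numel() for p in standard.parameters()),
      sum(p.numel() for p in factorized.parameters()))
```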
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- Kernel Normalized Convolutional Networks for Privacy-Preserving Machine Learning [7.384030323608299]
We compare layer normalization (LayerNorm), group normalization (GroupNorm), and the recently proposed kernel normalization (KernelNorm) in federated learning (FL) and differential privacy (DP) settings.
LayerNorm and GroupNorm provide no performance gain over the baseline (i.e., no normalization) for shallow models, but they considerably enhance the performance of deeper models.
KernelNorm, on the other hand, significantly outperforms its competitors in terms of accuracy and convergence rate (or communication efficiency) for both shallow and deeper models.
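In practice, such a comparison amounts to swapping the normalization layer in an otherwise fixed model. A minimal sketch, assuming a PyTorch model whose BatchNorm2d layers should be replaced by a batch-independent alternative (GroupNorm here; a KernelNorm layer such as the sketch near the top of this page would be substituted the same way):

```python
import torch.nn as nn

def make_batch_independent(module: nn.Module, groups: int = 8) -> nn.Module:
    """Replace every BatchNorm2d with a batch-independent GroupNorm in place.

    GroupNorm(1, C) behaves like LayerNorm over (C, H, W); larger group counts
    give ordinary GroupNorm. Assumes `groups` divides every channel count
    in the model.
    """
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, nn.GroupNorm(groups, child.num_features))
        else:
            make_batch_independent(child, groups)
    return module

# Example usage (torchvision assumed installed):
# model = make_batch_independent(torchvision.models.resnet18())
```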
arXiv Detail & Related papers (2022-09-30T19:33:53Z)
- Local Sample-weighted Multiple Kernel Clustering with Consensus Discriminative Graph [73.68184322526338]
Multiple kernel clustering (MKC) aims to achieve optimal information fusion from a set of base kernels.
This paper proposes a novel local sample-weighted multiple kernel clustering model.
Experimental results demonstrate that our LSWMKC possesses better local manifold representation and outperforms existing kernel- or graph-based clustering algorithms.
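The standard starting point for MKC, which the paper builds on, is a weighted combination of the $m$ base kernels,
$$K_{w} = \sum_{p=1}^{m} w_{p} K_{p}, \qquad w_{p} \ge 0, \; \sum_{p=1}^{m} w_{p} = 1,$$
with the weights learned jointly with the clustering objective; the sample-weighted variant makes these weights local to each sample rather than global (our paraphrase of the abstract).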
arXiv Detail & Related papers (2022-07-05T05:00:38Z)
- Properties of the After Kernel [11.4219428942199]
The Neural Tangent Kernel (NTK) is the wide-network limit of a kernel defined using neural networks.
We study the "after kernel", which is defined using the same embedding, except after training, for neural networks with standard architectures.
arXiv Detail & Related papers (2021-05-21T21:50:18Z)
- Convolutional Normalization: Improving Deep Convolutional Network Robustness and Training [44.66478612082257]
Normalization techniques have become a basic component in modern convolutional neural networks (ConvNets).
We introduce a simple and efficient "convolutional normalization" method that can fully exploit the convolutional structure in the Fourier domain.
We show that convolutional normalization can reduce the layerwise spectral norm of the weight matrices and hence improve the Lipschitzness of the network.
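As a loose illustration of the Fourier-domain idea (not the paper's exact construction): for a single-channel circular convolution, the operator's singular values are the magnitudes of the kernel's 2-D DFT, so rescaling the kernel by its peak frequency magnitude caps that kernel's spectral norm near one. The multi-channel case requires per-frequency channel matrices, which this sketch ignores; the function name and signature are ours.

```python
import torch

def fourier_rescale(weight: torch.Tensor, fmap_size: int, eps: float = 1e-8) -> torch.Tensor:
    """Rescale each (out, in) kernel so its frequency response has unit peak magnitude.

    weight: (out_ch, in_ch, k, k). For a single-channel circular convolution the
    DFT magnitudes of the kernel are exactly the operator's singular values, so
    this caps the per-kernel spectral norm near 1. Illustration only; the paper's
    convolutional normalization is more involved.
    """
    freq = torch.fft.fft2(weight, s=(fmap_size, fmap_size))   # zero-pad and transform
    peak = freq.abs().amax(dim=(-2, -1), keepdim=True)        # per-kernel peak magnitude
    return weight / (peak + eps)
```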
arXiv Detail & Related papers (2021-03-01T00:33:04Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- Evolving Normalization-Activation Layers [100.82879448303805]
We develop efficient rejection protocols to quickly filter out candidate layers that do not work well.
Our method leads to the discovery of EvoNorms, a set of new normalization-activation layers with novel, and sometimes surprising structures.
Our experiments show that EvoNorms work well on image classification models including ResNets, MobileNets and EfficientNets.
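One of the discovered layers, EvoNorm-S0, is simple enough to state directly. The sketch below is written from the published formula (x times sigmoid(v*x), divided by a per-sample group standard deviation, followed by an affine transform); check the authors' release before relying on it.

```python
import torch
import torch.nn as nn

class EvoNormS0(nn.Module):
    """Sketch of EvoNorm-S0: y = x * sigmoid(v * x) / group_std(x) * gamma + beta.

    Statistics are per sample and per channel group, so the layer is
    batch-independent. Written from the paper's description, not the
    authors' reference code.
    """

    def __init__(self, channels: int, groups: int = 32, eps: float = 1e-5):
        super().__init__()
        assert channels % groups == 0
        self.groups, self.eps = groups, eps
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.v = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):
        n, c, h, w = x.shape
        # Per-sample, per-group variance (no batch statistics).
        var = x.view(n, self.groups, -1).var(dim=-1, unbiased=False)   # (n, groups)
        std = (var + self.eps).sqrt()
        std = std.repeat_interleave(c // self.groups, dim=1).view(n, c, 1, 1)
        return x * torch.sigmoid(self.v * x) / std * self.gamma + self.beta
```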
arXiv Detail & Related papers (2020-04-06T19:52:48Z)
- Separating the Effects of Batch Normalization on CNN Training Speed and Stability Using Classical Adaptive Filter Theory [40.55789598448379]
Batch Normalization (BatchNorm) is commonly used in Convolutional Neural Networks (CNNs) to improve training speed and stability.
This paper uses concepts from the traditional adaptive filter domain to provide insight into the dynamics and inner workings of BatchNorm.
arXiv Detail & Related papers (2020-02-25T05:25:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.