Evolving Normalization-Activation Layers
- URL: http://arxiv.org/abs/2004.02967v5
- Date: Fri, 17 Jul 2020 04:42:59 GMT
- Title: Evolving Normalization-Activation Layers
- Authors: Hanxiao Liu, Andrew Brock, Karen Simonyan, Quoc V. Le
- Abstract summary: We develop efficient rejection protocols to quickly filter out candidate layers that do not work well.
Our method leads to the discovery of EvoNorms, a set of new normalization-activation layers with novel, and sometimes surprising structures.
Our experiments show that EvoNorms work well on image classification models including ResNets, MobileNets and EfficientNets.
- Score: 100.82879448303805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Normalization layers and activation functions are fundamental components in
deep networks and typically co-locate with each other. Here we propose to
design them using an automated approach. Instead of designing them separately,
we unify them into a single tensor-to-tensor computation graph, and evolve its
structure starting from basic mathematical functions. Examples of such
mathematical functions are addition, multiplication and statistical moments.
The use of low-level mathematical functions, in contrast to the use of
high-level modules in mainstream NAS, leads to a highly sparse and large search
space which can be challenging for search methods. To address the challenge, we
develop efficient rejection protocols to quickly filter out candidate layers
that do not work well. We also use multi-objective evolution to optimize each
layer's performance across many architectures to prevent overfitting. Our
method leads to the discovery of EvoNorms, a set of new
normalization-activation layers with novel, and sometimes surprising structures
that go beyond existing design patterns. For example, some EvoNorms do not
assume that normalization and activation functions must be applied
sequentially, nor need to center the feature maps, nor require explicit
activation functions. Our experiments show that EvoNorms not only work well on image
classification models including ResNets, MobileNets and EfficientNets but also
transfer well to Mask R-CNN with FPN/SpineNet for instance segmentation and to
BigGAN for image synthesis, outperforming BatchNorm and GroupNorm based layers
in many cases.
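As a concrete illustration of a layer in this search space, one of the discovered layers reported in the paper, EvoNorm-S0, fuses a sigmoid-gated numerator with a GroupNorm-style variance denominator rather than applying an activation after a separate normalization step. The NumPy sketch below is a minimal rendering of that formula under assumed conventions (channels-last tensors, contiguous channel groups, illustrative parameter shapes and initializations); it is not the authors' reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def evonorm_s0(x, gamma, beta, v, groups=32, eps=1e-5):
    """Minimal EvoNorm-S0-style layer (channels-last, contiguous channel groups):
        y = x * sigmoid(v * x) / sqrt(GroupVar(x) + eps) * gamma + beta
    where GroupVar is a GroupNorm-style per-sample, per-group variance.
    x: (N, H, W, C); gamma, beta, v: per-channel parameters of shape (C,)."""
    n, h, w, c = x.shape
    g = min(groups, c)
    assert c % g == 0, "channel count must be divisible by the number of groups"
    # Activation-like numerator and normalization-like denominator are fused:
    num = x * sigmoid(v * x)
    grouped = x.reshape(n, h, w, g, c // g)
    var = grouped.var(axis=(1, 2, 4), keepdims=True)           # (N, 1, 1, G, 1)
    den = np.sqrt(var + eps)
    y = (num.reshape(n, h, w, g, c // g) / den).reshape(n, h, w, c)
    return y * gamma + beta

# Usage on a random feature map with illustrative parameter initializations.
x = np.random.randn(2, 8, 8, 64).astype(np.float32)
ones, zeros = np.ones(64, np.float32), np.zeros(64, np.float32)
print(evonorm_s0(x, gamma=ones, beta=zeros, v=ones).shape)     # (2, 8, 8, 64)
```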
Related papers
- Multilinear Operator Networks [60.7432588386185]
Polynomial Networks is a class of models that does not require activation functions.
We propose MONet, which relies solely on multilinear operators.
arXiv Detail & Related papers (2024-01-31T16:52:19Z)
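The Multilinear Operator Networks entry above states that polynomial networks need no activation functions; MONet's exact architecture is given in the cited paper. Purely as an assumed illustration of the general idea, the NumPy sketch below stacks degree-2 multilinear blocks of the form (xA) ⊙ (xB) + xC, which are nonlinear in x despite containing no activation function; the block design and names are illustrative, not MONet's parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

def multilinear_block(x, a, b, c):
    """Degree-2 multilinear block: elementwise (Hadamard) product of two linear
    maps plus a linear skip term. No activation function is applied anywhere."""
    return (x @ a) * (x @ b) + x @ c

# Two stacked blocks give a degree-4 polynomial of the input features.
d_in, d_hidden, d_out = 16, 32, 10
params = [
    tuple(rng.standard_normal((d_in, d_hidden)) * 0.1 for _ in range(3)),
    tuple(rng.standard_normal((d_hidden, d_out)) * 0.1 for _ in range(3)),
]

x = rng.standard_normal((4, d_in))             # batch of 4 feature vectors
h = multilinear_block(x, *params[0])
y = multilinear_block(h, *params[1])
print(y.shape)                                  # (4, 10)
```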
- Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
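The Structured Sparse Convolution entry above claims that depthwise, groupwise and pointwise convolutions are special cases of structured sparsity in a dense kernel; SSC's own parameterization is given in the cited paper. Purely as a hedged illustration of that relationship (not of SSC itself), the NumPy sketch below builds binary structured-sparsity masks over a dense convolution kernel and compares nonzero-parameter counts; all names and shapes are illustrative.

```python
import numpy as np

def conv_masks(c_out, c_in, k, groups):
    """Binary masks over a dense kernel of shape (c_out, c_in, k, k) that
    recover common efficient layers as structured-sparsity special cases."""
    dense = np.ones((c_out, c_in, k, k))
    # Groupwise: block-diagonal connectivity between channel groups.
    group = np.zeros_like(dense)
    go, gi = c_out // groups, c_in // groups
    for g in range(groups):
        group[g * go:(g + 1) * go, g * gi:(g + 1) * gi] = 1.0
    # Depthwise: one input channel per output channel (groups == channels).
    depth = np.zeros_like(dense)
    for c in range(min(c_out, c_in)):
        depth[c, c] = 1.0
    # Pointwise: only the centre spatial tap survives (a 1x1 convolution).
    point = np.zeros_like(dense)
    point[:, :, k // 2, k // 2] = 1.0
    return {"dense": dense, "groupwise": group, "depthwise": depth, "pointwise": point}

for name, m in conv_masks(c_out=64, c_in=64, k=3, groups=4).items():
    print(f"{name:9s} nonzero weights: {int(m.sum())}")
```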
- EvoPruneDeepTL: An Evolutionary Pruning Model for Transfer Learning based Deep Neural Networks [15.29595828816055]
We propose an evolutionary pruning model for Transfer Learning based Deep Neural Networks.
EvoPruneDeepTL replaces the last fully-connected layers with sparse layers optimized by a genetic algorithm.
Results show the contribution of EvoPruneDeepTL and feature selection to the overall computational efficiency of the network.
arXiv Detail & Related papers (2022-02-08T13:07:55Z)
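The EvoPruneDeepTL entry above describes optimizing sparse fully-connected layers with a genetic algorithm; the actual encoding, operators and fitness are defined in the cited paper. The toy NumPy loop below only sketches the general idea of evolving a binary unit mask for a linear head with truncation selection, uniform crossover and bit-flip mutation, on synthetic data with a hypothetical fitness; none of it is the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: frozen "feature extractor" outputs and a fixed linear head to prune.
features = rng.standard_normal((256, 64))            # 256 samples, 64 features
labels = (features[:, :3].sum(axis=1) > 0).astype(int)
w = rng.standard_normal((64,)) * 0.1                 # fixed head weights (toy)

def fitness(mask):
    """Hypothetical fitness: accuracy of the masked head minus a small
    penalty on the number of active connections (favouring sparsity)."""
    logits = features @ (w * mask)
    acc = ((logits > 0).astype(int) == labels).mean()
    return acc - 0.001 * mask.sum()

def evolve(pop_size=20, generations=30, n_units=64, p_mut=0.05):
    pop = rng.integers(0, 2, size=(pop_size, n_units))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]       # truncation selection
        # Uniform crossover between random parent pairs, then bit-flip mutation.
        idx = rng.integers(0, len(parents), size=(pop_size, 2))
        choose = rng.integers(0, 2, size=(pop_size, n_units)).astype(bool)
        children = np.where(choose, parents[idx[:, 0]], parents[idx[:, 1]])
        flips = rng.random((pop_size, n_units)) < p_mut
        pop = np.where(flips, 1 - children, children)
    return pop[np.argmax([fitness(ind) for ind in pop])]

best_mask = evolve()
print("active units:", int(best_mask.sum()), "fitness:", round(fitness(best_mask), 3))
```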
- Learning strides in convolutional neural networks [34.20666933112202]
This work introduces DiffStride, the first downsampling layer with learnable strides.
Experiments on audio and image classification show the generality and effectiveness of our solution.
arXiv Detail & Related papers (2022-02-03T16:03:36Z)
- Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks [10.278350434623107]
Quantized neural networks typically require smaller memory footprints and lower computation complexity, which is crucial for efficient deployment.
We present an adaptive-mapping quantization method to learn an optimal latent sub-distribution that is inherent within models.
Experiments on image classification and object detection over various modern architectures demonstrate the effectiveness, generalization property, and transferability of the proposed method.
arXiv Detail & Related papers (2021-12-30T17:28:11Z)
- Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping [46.083745557823164]
We identify the main pathologies present in deep networks that prevent them from training fast and generalizing to unseen data.
We show how these can be avoided by carefully controlling the "shape" of the network's kernel function.
arXiv Detail & Related papers (2021-10-05T00:49:36Z)
- Smoother Network Tuning and Interpolation for Continuous-level Image Processing [7.730087303035803]
Filter Transition Network (FTN) is a structurally smoother module for continuous-level learning.
FTN generalizes well across various tasks and networks and causes fewer undesirable side effects.
For stable learning of FTN, we additionally propose a method to initialize non-linear neural network layers with identity mappings.
arXiv Detail & Related papers (2020-10-05T18:29:52Z)
- Binarizing MobileNet via Evolution-based Searching [66.94247681870125]
We propose the use of evolutionary search to facilitate the construction and training scheme when binarizing MobileNet.
Inspired by one-shot architecture search frameworks, we manipulate the idea of group convolution to design efficient 1-Bit Convolutional Neural Networks (CNNs).
Our objective is to come up with a tiny yet efficient binary neural architecture by exploring the best candidates of the group convolution.
arXiv Detail & Related papers (2020-05-13T13:25:51Z)
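The Binarizing MobileNet entry above concerns searching for efficient 1-bit architectures; the evolutionary search itself is described in the cited paper. As a hedged illustration of what 1-bit convolution weights mean, the NumPy sketch below applies a generic XNOR-Net-style binarization (a per-filter scale times the sign pattern, which minimizes the L2 approximation error among scaled sign tensors); it is not the paper's method, and all names are illustrative.

```python
import numpy as np

def binarize_weights(w):
    """Generic 1-bit weight binarization: each filter is replaced by its sign
    pattern times a per-filter scale alpha = mean(|w|), the L2-optimal scalar
    for approximating the real-valued filter with a scaled sign tensor.
    w has shape (c_out, c_in, k, k)."""
    alpha = np.abs(w).mean(axis=(1, 2, 3), keepdims=True)    # per-output-filter scale
    return alpha * np.sign(w)

# Example: approximation error of binarizing a random 3x3 filter bank.
w = np.random.randn(32, 16, 3, 3).astype(np.float32)
w_bin = binarize_weights(w)
rel_err = np.linalg.norm(w - w_bin) / np.linalg.norm(w)
print(f"relative approximation error: {rel_err:.3f}")
```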
This list is automatically generated from the titles and abstracts of the papers in this site.