Evolving Normalization-Activation Layers
- URL: http://arxiv.org/abs/2004.02967v5
- Date: Fri, 17 Jul 2020 04:42:59 GMT
- Title: Evolving Normalization-Activation Layers
- Authors: Hanxiao Liu, Andrew Brock, Karen Simonyan, Quoc V. Le
- Abstract summary: We develop efficient rejection protocols to quickly filter out candidate layers that do not work well.
Our method leads to the discovery of EvoNorms, a set of new normalization-activation layers with novel, and sometimes surprising structures.
Our experiments show that EvoNorms work well on image classification models including ResNets, MobileNets and EfficientNets.
- Score: 100.82879448303805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Normalization layers and activation functions are fundamental components in
deep networks and typically co-locate with each other. Here we propose to
design them using an automated approach. Instead of designing them separately,
we unify them into a single tensor-to-tensor computation graph, and evolve its
structure starting from basic mathematical functions. Examples of such
mathematical functions are addition, multiplication and statistical moments.
The use of low-level mathematical functions, in contrast to the use of
high-level modules in mainstream NAS, leads to a highly sparse and large search
space which can be challenging for search methods. To address the challenge, we
develop efficient rejection protocols to quickly filter out candidate layers
that do not work well. We also use multi-objective evolution to optimize each
layer's performance across many architectures to prevent overfitting. Our
method leads to the discovery of EvoNorms, a set of new
normalization-activation layers with novel, and sometimes surprising structures
that go beyond existing design patterns. For example, some EvoNorms do not
assume that normalization and activation functions must be applied
sequentially, nor need to center the feature maps, nor require explicit
activation functions. Our experiments show that EvoNorms not only work well on image
classification models including ResNets, MobileNets and EfficientNets but also
transfer well to Mask R-CNN with FPN/SpineNet for instance segmentation and to
BigGAN for image synthesis, outperforming BatchNorm and GroupNorm based layers
in many cases.
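As a concrete illustration of a layer in this search space, one of the discovered layers reported in the paper, EvoNorm-S0, fuses a sigmoid-gated numerator with a GroupNorm-style variance denominator rather than applying an activation after a separate normalization step. The NumPy sketch below is a minimal rendering of that formula under assumed conventions (channels-last tensors, contiguous channel groups, illustrative parameter shapes and initializations); it is not the authors' reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def evonorm_s0(x, gamma, beta, v, groups=32, eps=1e-5):
    """Minimal EvoNorm-S0-style layer (channels-last, contiguous channel groups):
        y = x * sigmoid(v * x) / sqrt(GroupVar(x) + eps) * gamma + beta
    where GroupVar is a GroupNorm-style per-sample, per-group variance.
    x: (N, H, W, C); gamma, beta, v: per-channel parameters of shape (C,)."""
    n, h, w, c = x.shape
    g = min(groups, c)
    assert c % g == 0, "channel count must be divisible by the number of groups"
    # Activation-like numerator and normalization-like denominator are fused:
    num = x * sigmoid(v * x)
    grouped = x.reshape(n, h, w, g, c // g)
    var = grouped.var(axis=(1, 2, 4), keepdims=True)           # (N, 1, 1, G, 1)
    den = np.sqrt(var + eps)
    y = (num.reshape(n, h, w, g, c // g) / den).reshape(n, h, w, c)
    return y * gamma + beta

# Usage on a random feature map with illustrative parameter initializations.
x = np.random.randn(2, 8, 8, 64).astype(np.float32)
ones, zeros = np.ones(64, np.float32), np.zeros(64, np.float32)
print(evonorm_s0(x, gamma=ones, beta=zeros, v=ones).shape)     # (2, 8, 8, 64)
```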
Related papers
- Multilinear Operator Networks [60.7432588386185]
Polynomial Networks is a class of models that does not require activation functions.
We propose MONet, which relies solely on multilinear operators.
arXiv Detail & Related papers (2024-01-31T16:52:19Z)
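The Multilinear Operator Networks entry above states that polynomial networks need no activation functions; MONet's exact architecture is given in the cited paper. Purely as an assumed illustration of the general idea, the NumPy sketch below stacks degree-2 multilinear blocks of the form (xA) ⊙ (xB) + xC, which are nonlinear in x despite containing no activation function; the block design and names are illustrative, not MONet's parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

def multilinear_block(x, a, b, c):
    """Degree-2 multilinear block: elementwise (Hadamard) product of two linear
    maps plus a linear skip term. No activation function is applied anywhere."""
    return (x @ a) * (x @ b) + x @ c

# Two stacked blocks give a degree-4 polynomial of the input features.
d_in, d_hidden, d_out = 16, 32, 10
params = [
    tuple(rng.standard_normal((d_in, d_hidden)) * 0.1 for _ in range(3)),
    tuple(rng.standard_normal((d_hidden, d_out)) * 0.1 for _ in range(3)),
]

x = rng.standard_normal((4, d_in))             # batch of 4 feature vectors
h = multilinear_block(x, *params[0])
y = multilinear_block(h, *params[1])
print(y.shape)                                  # (4, 10)
```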
- Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
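The Structured Sparse Convolution entry above claims that depthwise, groupwise and pointwise convolutions are special cases of structured sparsity in a dense kernel; SSC's own parameterization is given in the cited paper. Purely as a hedged illustration of that relationship (not of SSC itself), the NumPy sketch below builds binary structured-sparsity masks over a dense convolution kernel and compares nonzero-parameter counts; all names and shapes are illustrative.

```python
import numpy as np

def conv_masks(c_out, c_in, k, groups):
    """Binary masks over a dense kernel of shape (c_out, c_in, k, k) that
    recover common efficient layers as structured-sparsity special cases."""
    dense = np.ones((c_out, c_in, k, k))
    # Groupwise: block-diagonal connectivity between channel groups.
    group = np.zeros_like(dense)
    go, gi = c_out // groups, c_in // groups
    for g in range(groups):
        group[g * go:(g + 1) * go, g * gi:(g + 1) * gi] = 1.0
    # Depthwise: one input channel per output channel (groups == channels).
    depth = np.zeros_like(dense)
    for c in range(min(c_out, c_in)):
        depth[c, c] = 1.0
    # Pointwise: only the centre spatial tap survives (a 1x1 convolution).
    point = np.zeros_like(dense)
    point[:, :, k // 2, k // 2] = 1.0
    return {"dense": dense, "groupwise": group, "depthwise": depth, "pointwise": point}

for name, m in conv_masks(c_out=64, c_in=64, k=3, groups=4).items():
    print(f"{name:9s} nonzero weights: {int(m.sum())}")
```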
- EvoPruneDeepTL: An Evolutionary Pruning Model for Transfer Learning based Deep Neural Networks [15.29595828816055]
We propose an evolutionary pruning model for Transfer Learning based Deep Neural Networks.
EvoPruneDeepTL replaces the last fully-connected layers with sparse layers optimized by a genetic algorithm.
Results show the contribution of EvoPruneDeepTL and feature selection to the overall computational efficiency of the network.
arXiv Detail & Related papers (2022-02-08T13:07:55Z)
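The EvoPruneDeepTL entry above describes optimizing sparse fully-connected layers with a genetic algorithm; the actual encoding, operators and fitness are defined in the cited paper. The toy NumPy loop below only sketches the general idea of evolving a binary unit mask for a linear head with truncation selection, uniform crossover and bit-flip mutation, on synthetic data with a hypothetical fitness; none of it is the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: frozen "feature extractor" outputs and a fixed linear head to prune.
features = rng.standard_normal((256, 64))            # 256 samples, 64 features
labels = (features[:, :3].sum(axis=1) > 0).astype(int)
w = rng.standard_normal((64,)) * 0.1                 # fixed head weights (toy)

def fitness(mask):
    """Hypothetical fitness: accuracy of the masked head minus a small
    penalty on the number of active connections (favouring sparsity)."""
    logits = features @ (w * mask)
    acc = ((logits > 0).astype(int) == labels).mean()
    return acc - 0.001 * mask.sum()

def evolve(pop_size=20, generations=30, n_units=64, p_mut=0.05):
    pop = rng.integers(0, 2, size=(pop_size, n_units))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]       # truncation selection
        # Uniform crossover between random parent pairs, then bit-flip mutation.
        idx = rng.integers(0, len(parents), size=(pop_size, 2))
        choose = rng.integers(0, 2, size=(pop_size, n_units)).astype(bool)
        children = np.where(choose, parents[idx[:, 0]], parents[idx[:, 1]])
        flips = rng.random((pop_size, n_units)) < p_mut
        pop = np.where(flips, 1 - children, children)
    return pop[np.argmax([fitness(ind) for ind in pop])]

best_mask = evolve()
print("active units:", int(best_mask.sum()), "fitness:", round(fitness(best_mask), 3))
```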
- Learning strides in convolutional neural networks [34.20666933112202]
This work introduces DiffStride, the first downsampling layer with learnable strides.
Experiments on audio and image classification show the generality and effectiveness of our solution.
arXiv Detail & Related papers (2022-02-03T16:03:36Z)
- Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks [10.278350434623107]
Quantized neural networks typically require smaller memory footprints and lower computation complexity, which is crucial for efficient deployment.
We present an adaptive-mapping quantization method to learn an optimal latent sub-distribution that is inherent within models.
Experiments on image classification and object detection over various modern architectures demonstrate the effectiveness, generalization property, and transferability of the proposed method.
arXiv Detail & Related papers (2021-12-30T17:28:11Z)
- Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping [46.083745557823164]
We identify the main pathologies present in deep networks that prevent them from training fast and generalizing to unseen data.
We show how these can be avoided by carefully controlling the "shape" of the network's kernel function.
arXiv Detail & Related papers (2021-10-05T00:49:36Z)
- Smoother Network Tuning and Interpolation for Continuous-level Image Processing [7.730087303035803]
Filter Transition Network (FTN) is a structurally smoother module for continuous-level learning.
FTN generalizes well across various tasks and networks and causes fewer undesirable side effects.
For stable learning of FTN, we additionally propose a method to initialize non-linear neural network layers with identity mappings.
arXiv Detail & Related papers (2020-10-05T18:29:52Z)
- Binarizing MobileNet via Evolution-based Searching [66.94247681870125]
We propose the use of evolutionary search to facilitate the construction and training scheme when binarizing MobileNet.
Inspired by one-shot architecture search frameworks, we manipulate the idea of group convolution to design efficient 1-Bit Convolutional Neural Networks (CNNs).
Our objective is to come up with a tiny yet efficient binary neural architecture by exploring the best candidates of the group convolution.
arXiv Detail & Related papers (2020-05-13T13:25:51Z)
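The Binarizing MobileNet entry above concerns searching for efficient 1-bit architectures; the evolutionary search itself is described in the cited paper. As a hedged illustration of what 1-bit convolution weights mean, the NumPy sketch below applies a generic XNOR-Net-style binarization (a per-filter scale times the sign pattern, which minimizes the L2 approximation error among scaled sign tensors); it is not the paper's method, and all names are illustrative.

```python
import numpy as np

def binarize_weights(w):
    """Generic 1-bit weight binarization: each filter is replaced by its sign
    pattern times a per-filter scale alpha = mean(|w|), the L2-optimal scalar
    for approximating the real-valued filter with a scaled sign tensor.
    w has shape (c_out, c_in, k, k)."""
    alpha = np.abs(w).mean(axis=(1, 2, 3), keepdims=True)    # per-output-filter scale
    return alpha * np.sign(w)

# Example: approximation error of binarizing a random 3x3 filter bank.
w = np.random.randn(32, 16, 3, 3).astype(np.float32)
w_bin = binarize_weights(w)
rel_err = np.linalg.norm(w - w_bin) / np.linalg.norm(w)
print(f"relative approximation error: {rel_err:.3f}")
```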
This list is automatically generated from the titles and abstracts of the papers in this site.