Layer Folding: Neural Network Depth Reduction using Activation
Linearization
- URL: http://arxiv.org/abs/2106.09309v1
- Date: Thu, 17 Jun 2021 08:22:46 GMT
- Title: Layer Folding: Neural Network Depth Reduction using Activation
Linearization
- Authors: Amir Ben Dror, Niv Zehngut, Avraham Raviv, Evgeny Artyomov, Ran Vitek
and Roy Jevnisek
- Abstract summary: Modern devices exhibit a high level of parallelism, but real-time latency is still highly dependent on networks' depth.
We propose a method that learns whether non-linear activations can be removed, allowing consecutive linear layers to be folded into one.
We apply our method to networks pre-trained on CIFAR-10 and CIFAR-100 and find that they can all be transformed into shallower forms that share a similar depth.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the increasing prevalence of deep neural networks, their
applicability in resource-constrained devices is limited due to their
computational load. While modern devices exhibit a high level of parallelism,
real-time latency is still highly dependent on networks' depth. Although recent
works show that below a certain depth, the width of shallower networks must
grow exponentially, we presume that neural networks typically exceed this
minimal depth to accelerate convergence and incrementally increase accuracy.
This motivates us to transform pre-trained deep networks that already exploit
such advantages into shallower forms. We propose a method that learns whether
non-linear activations can be removed, allowing us to fold consecutive linear
layers into one. We apply our method to networks pre-trained on CIFAR-10 and
CIFAR-100 and find that they can all be transformed into shallower forms that
share a similar depth. Finally, we use our method to provide more efficient
alternatives to MobileNetV2 and EfficientNet-Lite architectures on the ImageNet
classification task.
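The folding step itself can be sketched concretely: once the activation between two affine layers has been linearized (i.e. replaced by the identity), their composition is again affine and collapses into a single layer. The following is a minimal numpy illustration of that algebra, not the authors' implementation; all names and shapes are illustrative.

```python
import numpy as np

def fold_linear_layers(W1, b1, W2, b2):
    """Fold two consecutive affine layers y = W2 @ (W1 @ x + b1) + b2
    into a single layer y = W @ x + b. Valid only once the activation
    between them has been linearized (replaced by identity)."""
    W = W2 @ W1
    b = W2 @ b1 + b2
    return W, b

# Two random affine layers with the non-linearity between them removed.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

W, b = fold_linear_layers(W1, b1, W2, b2)

x = rng.normal(size=4)
deep = W2 @ (W1 @ x + b1) + b2   # original two-layer computation
shallow = W @ x + b              # folded single-layer computation
assert np.allclose(deep, shallow)
```

The same identity underlies folding convolutions with batch-norm or with each other, since both are affine maps; only the presence of a non-linearity between layers blocks the collapse.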
Related papers
- NEPENTHE: Entropy-Based Pruning as a Neural Network Depth's Reducer [5.373015313199385]
We propose eNtropy-basEd Pruning as a nEural Network depTH's rEducer (NEPENTHE) to alleviate the computational burden of deep neural networks.
We validate our approach on popular architectures such as MobileNet and Swin-T.
arXiv Detail & Related papers (2024-04-24T09:12:04Z)
- Optimizing Performance of Feedforward and Convolutional Neural Networks through Dynamic Activation Functions [0.46040036610482665]
Deep learning training algorithms have achieved huge success in recent years in many fields, including speech, text, image, and video.
Deeper and deeper networks have been proposed with great success, with ResNet structures having around 152 layers.
Shallow convolutional neural networks (CNNs) are still an active research area, where some phenomena remain unexplained.
arXiv Detail & Related papers (2023-08-10T17:39:51Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution, to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on-the-fly by a small amount proportional to their magnitude.
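The magnitude-proportional shrinkage idea can be sketched as follows. This is an illustrative reading of the abstract only, not the authors' code: the percentile-based importance criterion and the shrink fraction `p` are assumptions.

```python
import numpy as np

def iterative_soft_shrink(w, sparsity=0.5, p=0.1):
    """Shrink the least important weights (by magnitude) by a small
    fraction p of their own value, instead of zeroing them outright."""
    thresh = np.quantile(np.abs(w), sparsity)  # importance cutoff (assumed criterion)
    unimportant = np.abs(w) < thresh
    w = w.copy()
    w[unimportant] *= (1.0 - p)                # soft, magnitude-proportional shrink
    return w

rng = np.random.default_rng(1)
w = rng.normal(size=100)
w_shrunk = iterative_soft_shrink(w)
# Important weights are untouched; unimportant ones are scaled down, not zeroed.
```

Because weights are only scaled rather than hard-pruned, they can recover importance in later iterations, which is the appeal of a soft scheme over one-shot pruning.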
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers [83.74380713308605]
We develop a new type of transformation that is fully compatible with a variant of ReLUs -- Leaky ReLUs.
We show in experiments that our method, which introduces negligible extra computational cost, achieves validation accuracies with deep vanilla networks that are competitive with ResNets.
arXiv Detail & Related papers (2022-03-15T17:49:08Z)
- Channel Planting for Deep Neural Networks using Knowledge Distillation [3.0165431987188245]
We present a novel incremental training algorithm for deep neural networks called planting.
Our planting can search for the optimal network architecture with a smaller number of parameters, improving network performance.
We evaluate the effectiveness of the proposed method on different datasets such as CIFAR-10/100 and STL-10.
arXiv Detail & Related papers (2020-11-04T16:29:59Z)
- HALO: Learning to Prune Neural Networks with Shrinkage [5.283963846188862]
Deep neural networks achieve state-of-the-art performance in a variety of tasks by extracting a rich set of features from unstructured data.
Modern techniques for inducing sparsity and reducing model size are (1) network pruning, (2) training with a sparsity inducing penalty, and (3) training a binary mask jointly with the weights of the network.
We present a novel penalty called Hierarchical Adaptive Lasso which learns to adaptively sparsify weights of a given network via trainable parameters.
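An adaptive-lasso-style penalty of the kind described can be sketched as per-weight trainable scales multiplying an L1 term; the actual hierarchical structure in HALO is more involved, so treat this as a generic illustration with assumed names and parameterization.

```python
import numpy as np

def adaptive_lasso_penalty(w, log_lam):
    """Generic adaptive-lasso penalty: each weight w_i gets its own
    trainable scale lam_i = exp(log_lam_i), so the optimizer can learn
    where to push weights toward zero."""
    lam = np.exp(log_lam)   # exponential parameterization keeps scales positive
    return float(np.sum(lam * np.abs(w)))

rng = np.random.default_rng(2)
w = rng.normal(size=10)
log_lam = np.zeros(10)      # lam_i = 1 initially, i.e. a plain L1 penalty
loss = adaptive_lasso_penalty(w, log_lam)
```

With `log_lam` fixed at zero this reduces to ordinary lasso; making it trainable lets the penalty concentrate sparsity pressure on weights the data deems unimportant.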
arXiv Detail & Related papers (2020-08-24T04:08:48Z)
- Go Wide, Then Narrow: Efficient Training of Deep Thin Networks [62.26044348366186]
We propose an efficient method to train a deep thin network with a theoretic guarantee.
By training with our method, ResNet50 can outperform ResNet101, and BERT Base can be comparable with BERT Large.
arXiv Detail & Related papers (2020-07-01T23:34:35Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.