Layer Folding: Neural Network Depth Reduction using Activation
Linearization
- URL: http://arxiv.org/abs/2106.09309v1
- Date: Thu, 17 Jun 2021 08:22:46 GMT
- Title: Layer Folding: Neural Network Depth Reduction using Activation
Linearization
- Authors: Amir Ben Dror, Niv Zehngut, Avraham Raviv, Evgeny Artyomov, Ran Vitek
and Roy Jevnisek
- Abstract summary: Modern devices exhibit a high level of parallelism, but real-time latency is still highly dependent on networks' depth.
We propose a method that learns whether non-linear activations can be removed, allowing consecutive linear layers to be folded into one.
We apply our method to networks pre-trained on CIFAR-10 and CIFAR-100 and find that they can all be transformed into shallower forms that share a similar depth.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the increasing prevalence of deep neural networks, their
applicability in resource-constrained devices is limited due to their
computational load. While modern devices exhibit a high level of parallelism,
real-time latency is still highly dependent on networks' depth. Although recent
works show that below a certain depth, the width of shallower networks must
grow exponentially, we presume that neural networks typically exceed this
minimal depth to accelerate convergence and incrementally increase accuracy.
This motivates us to transform pre-trained deep networks that already exploit
such advantages into shallower forms. We propose a method that learns whether
non-linear activations can be removed, allowing us to fold consecutive linear
layers into one. We apply our method to networks pre-trained on CIFAR-10 and
CIFAR-100 and find that they can all be transformed into shallower forms that
share a similar depth. Finally, we use our method to provide more efficient
alternatives to MobileNetV2 and EfficientNet-Lite architectures on the ImageNet
classification task.
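The folding step itself can be sketched concretely: once the activation between two affine layers has been linearized (i.e. replaced by the identity), their composition is again affine and collapses into a single layer. The following is a minimal numpy illustration of that algebra, not the authors' implementation; all names and shapes are illustrative.

```python
import numpy as np

def fold_linear_layers(W1, b1, W2, b2):
    """Fold two consecutive affine layers y = W2 @ (W1 @ x + b1) + b2
    into a single layer y = W @ x + b. Valid only once the activation
    between them has been linearized (replaced by identity)."""
    W = W2 @ W1
    b = W2 @ b1 + b2
    return W, b

# Two random affine layers with the non-linearity between them removed.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

W, b = fold_linear_layers(W1, b1, W2, b2)

x = rng.normal(size=4)
deep = W2 @ (W1 @ x + b1) + b2   # original two-layer computation
shallow = W @ x + b              # folded single-layer computation
assert np.allclose(deep, shallow)
```

The same identity underlies folding convolutions with batch-norm or with each other, since both are affine maps; only the presence of a non-linearity between layers blocks the collapse.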
Related papers
- NEPENTHE: Entropy-Based Pruning as a Neural Network Depth's Reducer [5.373015313199385]
We propose eNtropy-basEd Pruning as a nEural Network depTH's rEducer (NEPENTHE) to alleviate the computational burden of deep neural networks.
We validate our approach on popular architectures such as MobileNet and Swin-T.
arXiv Detail & Related papers (2024-04-24T09:12:04Z)
- Optimizing Performance of Feedforward and Convolutional Neural Networks through Dynamic Activation Functions [0.46040036610482665]
Deep learning training algorithms have achieved huge success in recent years in many fields, including speech, text, image, and video.
Deeper and deeper networks have been proposed with great success, with ResNet structures having around 152 layers.
Shallow convolutional neural networks (CNNs) are still an active research area, where some phenomena remain unexplained.
arXiv Detail & Related papers (2023-08-10T17:39:51Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution, to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on-the-fly by a small amount proportional to their magnitude.
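The magnitude-proportional shrinkage idea can be sketched as follows. This is an illustrative reading of the abstract only, not the authors' code: the percentile-based importance criterion and the shrink fraction `p` are assumptions.

```python
import numpy as np

def iterative_soft_shrink(w, sparsity=0.5, p=0.1):
    """Shrink the least important weights (by magnitude) by a small
    fraction p of their own value, instead of zeroing them outright."""
    thresh = np.quantile(np.abs(w), sparsity)  # importance cutoff (assumed criterion)
    unimportant = np.abs(w) < thresh
    w = w.copy()
    w[unimportant] *= (1.0 - p)                # soft, magnitude-proportional shrink
    return w

rng = np.random.default_rng(1)
w = rng.normal(size=100)
w_shrunk = iterative_soft_shrink(w)
# Important weights are untouched; unimportant ones are scaled down, not zeroed.
```

Because weights are only scaled rather than hard-pruned, they can recover importance in later iterations, which is the appeal of a soft scheme over one-shot pruning.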
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers [83.74380713308605]
We develop a new type of transformation that is fully compatible with a variant of ReLUs -- Leaky ReLUs.
We show in experiments that our method, which introduces negligible extra computational cost, achieves validation accuracies with deep vanilla networks that are competitive with ResNets.
arXiv Detail & Related papers (2022-03-15T17:49:08Z)
- Channel Planting for Deep Neural Networks using Knowledge Distillation [3.0165431987188245]
We present a novel incremental training algorithm for deep neural networks called planting.
Our planting can search for the optimal network architecture with a smaller number of parameters, improving network performance.
We evaluate the effectiveness of the proposed method on different datasets such as CIFAR-10/100 and STL-10.
arXiv Detail & Related papers (2020-11-04T16:29:59Z)
- HALO: Learning to Prune Neural Networks with Shrinkage [5.283963846188862]
Deep neural networks achieve state-of-the-art performance in a variety of tasks by extracting a rich set of features from unstructured data.
Modern techniques for inducing sparsity and reducing model size are (1) network pruning, (2) training with a sparsity inducing penalty, and (3) training a binary mask jointly with the weights of the network.
We present a novel penalty called Hierarchical Adaptive Lasso which learns to adaptively sparsify weights of a given network via trainable parameters.
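An adaptive-lasso-style penalty of the kind described can be sketched as per-weight trainable scales multiplying an L1 term; the actual hierarchical structure in HALO is more involved, so treat this as a generic illustration with assumed names and parameterization.

```python
import numpy as np

def adaptive_lasso_penalty(w, log_lam):
    """Generic adaptive-lasso penalty: each weight w_i gets its own
    trainable scale lam_i = exp(log_lam_i), so the optimizer can learn
    where to push weights toward zero."""
    lam = np.exp(log_lam)   # exponential parameterization keeps scales positive
    return float(np.sum(lam * np.abs(w)))

rng = np.random.default_rng(2)
w = rng.normal(size=10)
log_lam = np.zeros(10)      # lam_i = 1 initially, i.e. a plain L1 penalty
loss = adaptive_lasso_penalty(w, log_lam)
```

With `log_lam` fixed at zero this reduces to ordinary lasso; making it trainable lets the penalty concentrate sparsity pressure on weights the data deems unimportant.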
arXiv Detail & Related papers (2020-08-24T04:08:48Z)
- Go Wide, Then Narrow: Efficient Training of Deep Thin Networks [62.26044348366186]
We propose an efficient method to train a deep thin network with a theoretic guarantee.
By training with our method, ResNet50 can outperform ResNet101, and BERT Base can be comparable with BERT Large.
arXiv Detail & Related papers (2020-07-01T23:34:35Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.