Deep Learning without Shortcuts: Shaping the Kernel with Tailored
Rectifiers
- URL: http://arxiv.org/abs/2203.08120v1
- Date: Tue, 15 Mar 2022 17:49:08 GMT
- Title: Deep Learning without Shortcuts: Shaping the Kernel with Tailored
Rectifiers
- Authors: Guodong Zhang, Aleksandar Botev, James Martens
- Abstract summary: We develop a new type of transformation that is fully compatible with a variant of ReLUs -- Leaky ReLUs.
We show in experiments that our method, which introduces negligible extra computational cost, achieves validation accuracies with deep vanilla networks that are competitive with ResNets.
- Score: 83.74380713308605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training very deep neural networks is still an extremely challenging task.
The common solution is to use shortcut connections and normalization layers,
which are both crucial ingredients in the popular ResNet architecture. However,
there is strong evidence to suggest that ResNets behave more like ensembles of
shallower networks than truly deep ones. Recently, it was shown that deep
vanilla networks (i.e. networks without normalization layers or shortcut
connections) can be trained as fast as ResNets by applying certain
transformations to their activation functions. However, this method (called
Deep Kernel Shaping) isn't fully compatible with ReLUs, and produces networks
that overfit significantly more than ResNets on ImageNet. In this work, we
rectify this situation by developing a new type of transformation that is fully
compatible with a variant of ReLUs -- Leaky ReLUs. We show in experiments that
our method, which introduces negligible extra computational cost, achieves
validation accuracies with deep vanilla networks that are competitive with
ResNets (of the same width/depth), and significantly higher than those obtained
with the Edge of Chaos (EOC) method. And unlike with EOC, the validation
accuracies we obtain do not get worse with depth.
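As a rough illustration of the kind of activation transformation the abstract describes, the sketch below rescales a Leaky ReLU with negative slope alpha so that it preserves the second moment of a unit-Gaussian input. This is only a hedged reading of the idea: the paper's actual contribution includes a principled rule for choosing alpha (the "tailoring"), which is not reproduced here.

```python
import numpy as np

def tailored_leaky_relu(x, alpha=0.2):
    """Leaky ReLU with negative slope `alpha`, rescaled so that
    E[f(z)^2] = 1 when z ~ N(0, 1).  The factor follows from
    E[max(z, alpha*z)^2] = (1 + alpha^2) / 2 for standard normal z.
    (Illustrative sketch only; the paper additionally prescribes how
    alpha should be chosen, which is not reproduced here.)"""
    scale = np.sqrt(2.0 / (1.0 + alpha ** 2))
    return scale * np.maximum(x, alpha * x)

# Sanity check: the second moment of the output stays close to 1, so
# repeated application through depth neither blows up nor shrinks signals.
z = np.random.randn(1_000_000)
print(np.mean(tailored_leaky_relu(z) ** 2))  # ~1.0
```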
Related papers
- Fixing the NTK: From Neural Network Linearizations to Exact Convex
Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
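For context on the "gated ReLU network" mentioned above, the sketch below uses the common definition from the convex-reformulation literature: each unit's on/off pattern comes from a fixed gate vector rather than from its own trainable weights, so the model is linear in those weights once the gates are fixed. The random gates here are purely illustrative; the paper analyzes particular mask choices.

```python
import numpy as np

def gated_relu_features(X, gates):
    """Feature map of a two-layer gated ReLU network: for each fixed gate
    vector g, a unit computes 1[X @ g >= 0] * X, so the model is linear in
    its trainable weights given the gates.  (Sketch under the usual
    'gated ReLU' definition, with random gates for illustration.)"""
    masks = (X @ gates >= 0).astype(X.dtype)  # (n, m) fixed gating patterns
    # Stack masked copies of X: Phi(X) has shape (n, m * d).
    return np.concatenate([masks[:, j:j + 1] * X for j in range(gates.shape[1])], axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
G = rng.normal(size=(5, 8))          # 8 fixed gate vectors
Phi = gated_relu_features(X, G)
print(Phi.shape)                      # (100, 40); predictions are Phi @ w
```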
RDRN: Recursively Defined Residual Network for Image Super-Resolution [58.64907136562178]
Deep convolutional neural networks (CNNs) have obtained remarkable performance in single image super-resolution.
We propose a novel network architecture which utilizes attention blocks efficiently.
arXiv Detail & Related papers (2022-11-17T11:06:29Z)
Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping [46.083745557823164]
We identify the main pathologies present in deep networks that prevent them from training fast and generalizing to unseen data.
We show how these can be avoided by carefully controlling the "shape" of the network's kernel function.
arXiv Detail & Related papers (2021-10-05T00:49:36Z)
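To illustrate the "shape of the network's kernel function" that Deep Kernel Shaping controls, the sketch below iterates the standard mean-field correlation map of a wide, He-initialized vanilla ReLU network: with depth, the correlation between any two inputs drifts toward 1, i.e. the kernel degenerates. This is a generic illustration of the pathology, not the paper's own machinery.

```python
import numpy as np

def relu_correlation_map(c):
    """One-layer update of the input correlation for a wide, He-initialized
    vanilla ReLU network (the normalized arccosine kernel).  A standard
    mean-field result, used here only to show what 'kernel shape' means."""
    theta = np.arccos(np.clip(c, -1.0, 1.0))
    return (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi

c0 = 0.1                                  # correlation of two distinct inputs
for depth in (1, 10, 50, 100):
    c = c0
    for _ in range(depth):
        c = relu_correlation_map(c)
    print(depth, round(float(c), 4))      # drifts toward 1: inputs become indistinguishable
```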
Layer Folding: Neural Network Depth Reduction using Activation Linearization [0.0]
Modern devices exhibit a high level of parallelism, but real-time latency is still highly dependent on networks' depth.
We propose a method that learns whether non-linear activations can be removed, allowing consecutive linear layers to be folded into one.
We apply our method to networks pre-trained on CIFAR-10 and CIFAR-100 and find that they can all be transformed into shallower forms that share a similar depth.
arXiv Detail & Related papers (2021-06-17T08:22:46Z)
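The folding step itself is simple linear algebra: once the activation between two affine layers has been linearized or removed, the layers can be merged exactly. The sketch below shows that merge; how the method learns which activations can be removed is not shown.

```python
import numpy as np

def fold_linear_layers(W1, b1, W2, b2):
    """Fold two consecutive affine layers into one, valid once the activation
    between them is linear/removed:
        W2 @ (W1 @ x + b1) + b2  ==  (W2 @ W1) @ x + (W2 @ b1 + b2)"""
    return W2 @ W1, W2 @ b1 + b2

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
W2, b2 = rng.normal(size=(4, 16)), rng.normal(size=4)
W, b = fold_linear_layers(W1, b1, W2, b2)

x = rng.normal(size=8)
assert np.allclose(W @ x + b, W2 @ (W1 @ x + b1) + b2)  # same output, one layer fewer
```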
ResNet or DenseNet? Introducing Dense Shortcuts to ResNet [80.35001540483789]
This paper presents a unified perspective of dense summation to analyze ResNet and DenseNet.
We propose dense weighted normalized shortcuts as a solution to the dilemma between ResNet and DenseNet.
Our proposed DSNet achieves significantly better results than ResNet, and performance comparable to DenseNet while requiring fewer resources.
arXiv Detail & Related papers (2020-10-23T16:00:15Z)
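A very rough sketch of the "dense weighted normalized shortcut" idea as stated in the abstract: the input to a block is a learned, weighted sum of all previous block outputs, kept at a stable scale by normalization. The weighting and normalization used here are hypothetical placeholders, not DSNet's exact scheme.

```python
import numpy as np

def dense_weighted_shortcut(prev_outputs, weights, eps=1e-5):
    """Mix all previous block outputs with learnable weights, then normalize
    the result to roughly unit RMS.  (Illustrative only; DSNet's exact
    weighting and normalization are not reproduced here.)"""
    mixed = sum(w * y for w, y in zip(weights, prev_outputs))
    return mixed / (np.sqrt(np.mean(mixed ** 2)) + eps)

rng = np.random.default_rng(0)
outputs = [rng.normal(size=(32,)) for _ in range(3)]  # outputs of 3 earlier blocks
w = rng.normal(size=3)                                # mixing weights (hypothetical)
shortcut_input = dense_weighted_shortcut(outputs, w)
print(shortcut_input.shape, float(np.mean(shortcut_input ** 2)))  # (32,), ~1.0
```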
OverNet: Lightweight Multi-Scale Super-Resolution with Overscaling Network [3.6683231417848283]
We introduce OverNet, a deep but lightweight convolutional network to solve SISR at arbitrary scale factors with a single model.
We show that our network outperforms previous state-of-the-art results on standard benchmarks while using fewer parameters.
arXiv Detail & Related papers (2020-08-05T22:10:29Z)
Go Wide, Then Narrow: Efficient Training of Deep Thin Networks [62.26044348366186]
We propose an efficient method to train a deep thin network with a theoretical guarantee.
By training with our method, ResNet50 can outperform ResNet101, and BERT Base can be comparable with BERT Large.
arXiv Detail & Related papers (2020-07-01T23:34:35Z)
Improved Residual Networks for Image and Video Recognition [98.10703825716142]
Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture.
We show consistent improvements in accuracy and learning convergence over the baseline.
Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues.
arXiv Detail & Related papers (2020-04-10T11:09:50Z)
Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection [35.121856435677564]
We propose a simple greedy selection approach for finding good subnetworks in deep neural networks.
Applying the greedy selection strategy to sufficiently large pre-trained networks is guaranteed to find small subnetworks with lower loss than networks directly trained with gradient descent.
arXiv Detail & Related papers (2020-03-03T21:03:11Z)
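The selection procedure referenced above can be sketched generically as greedy forward selection: start from an empty subnetwork and repeatedly add whichever candidate unit lowers the loss the most. The toy scoring function below is hypothetical; the paper's guarantees concern applying such a procedure to sufficiently large pre-trained networks.

```python
import numpy as np

def greedy_forward_selection(n_candidates, loss_fn, k):
    """Greedily grow a subnetwork: at each step add the candidate unit whose
    inclusion gives the lowest loss, until k units are selected.
    (Generic sketch, not the paper's exact algorithm.)"""
    selected, remaining = [], list(range(n_candidates))
    for _ in range(k):
        best = min(remaining, key=lambda j: loss_fn(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: pick 2 of 5 "units" whose (hypothetical) combined score is best.
scores = np.array([0.9, 0.1, 0.5, 0.05, 0.7])
loss = lambda idx: float(1.0 - scores[idx].sum())   # lower is better
print(greedy_forward_selection(5, loss, k=2))       # [0, 4]
```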
Knapsack Pruning with Inner Distillation [11.04321604965426]
We propose a novel pruning method that optimizes the final accuracy of the pruned network.
We prune the network channels while maintaining the high-level structure of the network.
Our method leads to state-of-the-art pruning results on ImageNet, CIFAR-10 and CIFAR-100 using ResNet backbones.
arXiv Detail & Related papers (2020-02-19T16:04:48Z)
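As the title suggests, pruning here is framed as a knapsack problem over channels: keep the channels that maximize total importance under a compute budget. The sketch below uses a simple greedy importance-per-cost heuristic with made-up numbers; the paper's actual solver and its inner-distillation component are not reproduced.

```python
def knapsack_prune(importance, cost, budget):
    """Keep channels that (approximately) maximize total importance subject to
    a compute budget, taking the best importance-per-cost channels first.
    (Generic knapsack-style sketch with hypothetical scores.)"""
    order = sorted(range(len(importance)),
                   key=lambda i: importance[i] / cost[i], reverse=True)
    kept, used = [], 0.0
    for i in order:
        if used + cost[i] <= budget:
            kept.append(i)
            used += cost[i]
    return sorted(kept)

importance = [0.9, 0.2, 0.6, 0.1, 0.8]   # per-channel importance (hypothetical)
cost = [3.0, 1.0, 2.0, 1.0, 2.0]         # per-channel FLOPs cost (hypothetical)
print(knapsack_prune(importance, cost, budget=5.0))  # [0, 4]
```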
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.