Non-deep Networks
- URL: http://arxiv.org/abs/2110.07641v1
- Date: Thu, 14 Oct 2021 18:03:56 GMT
- Title: Non-deep Networks
- Authors: Ankit Goyal, Alexey Bochkovskiy, Jia Deng, Vladlen Koltun
- Abstract summary: We show that it is possible to build high-performing "non-deep" neural networks.
By utilizing parallel substructures, we show that a network with a depth of just 12 can achieve top-1 accuracy over 80%.
We provide a proof of concept for how non-deep networks could be used to build low-latency recognition systems.
- Score: 122.77755088736865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth is the hallmark of deep neural networks. But more depth means more
sequential computation and higher latency. This begs the question -- is it
possible to build high-performing "non-deep" neural networks? We show that it
is. To do so, we use parallel subnetworks instead of stacking one layer after
another. This helps effectively reduce depth while maintaining high
performance. By utilizing parallel substructures, we show, for the first time,
that a network with a depth of just 12 can achieve top-1 accuracy over 80% on
ImageNet, 96% on CIFAR10, and 81% on CIFAR100. We also show that a network with
a low-depth (12) backbone can achieve an AP of 48% on MS-COCO. We analyze the
scaling rules for our design and show how to increase performance without
changing the network's depth. Finally, we provide a proof of concept for how
non-deep networks could be used to build low-latency recognition systems. Code
is available at https://github.com/imankgoyal/NonDeepNetworks.
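The depth argument above can be sketched in a few lines. This is a hypothetical illustration (not the authors' code, which lives at the repository linked above): sequential blocks add their depths, while parallel branches contribute only the depth of the deepest branch, so capacity can grow in width without growing the sequential critical path.

```python
# Modeling effective depth under sequential vs. parallel composition.

def seq_depth(depths):
    """Depth of blocks applied one after another: depths accumulate."""
    return sum(depths)

def par_depth(depths):
    """Depth of branches run side by side: only the deepest counts."""
    return max(depths)

# A conventional deep stack of 48 layers:
deep = seq_depth([48])

# A ParNet-style layout: a 1-layer stem, three parallel 10-layer
# streams, then a 1-layer fusion head (branch counts are illustrative).
shallow = seq_depth([1, par_depth([10, 10, 10]), 1])

print(deep, shallow)  # 48 vs 12
```

Under this accounting, widening the network (more parallel streams, or wider streams) raises capacity without touching the depth of 12 that bounds sequential latency.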
Related papers
- ResNet: Enabling Deep Convolutional Neural Networks through Residual Learning [4.949171031381768]
ResNet enables the training of networks with hundreds of layers by allowing gradients to flow directly through shortcut connections. In our implementation on the CIFAR-10 dataset, ResNet-18 achieves 89.9% accuracy compared to 84.1% for a traditional deep CNN of similar depth.
arXiv Detail & Related papers (2025-10-28T03:36:15Z) - Layer Folding: Neural Network Depth Reduction using Activation Linearization [0.0]
Modern devices exhibit a high level of parallelism, but real-time latency is still highly dependent on networks' depth.
We propose a method that learns whether non-linear activations can be removed, allowing consecutive linear layers to be folded into one.
We apply our method to networks pre-trained on CIFAR-10 and CIFAR-100 and find that they can all be transformed into shallower forms that share a similar depth.
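The folding step behind this summary is standard linear algebra: once the activation between two fully connected layers is linear (identity), the pair collapses into a single affine layer with W = W2·W1 and b = W2·b1 + b2. A dependency-free sketch with illustrative shapes (the helper names are ours, not the paper's):

```python
# Plain-Python matrix helpers so the example needs no libraries.

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vadd(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def fold(W1, b1, W2, b2):
    """Fold y = W2 (W1 x + b1) + b2 into one affine layer y = W x + b."""
    W = matmul(W2, W1)
    b = vadd(matvec(W2, b1), b2)
    return W, b

W1, b1 = [[1.0, 2.0], [0.0, 1.0]], [0.5, -0.5]
W2, b2 = [[2.0, 0.0], [1.0, 1.0]], [0.0, 1.0]
x = [3.0, 4.0]

two_layer = vadd(matvec(W2, vadd(matvec(W1, x), b1)), b2)
W, b = fold(W1, b1, W2, b2)
one_layer = vadd(matvec(W, x), b)
print(two_layer == one_layer)  # True: one layer replaces two
```

Each successful fold removes one sequential layer, which is why the method reduces depth (and hence latency) without changing the function the network computes at the folded positions.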
arXiv Detail & Related papers (2021-06-17T08:22:46Z) - Channel Planting for Deep Neural Networks using Knowledge Distillation [3.0165431987188245]
We present a novel incremental training algorithm for deep neural networks called planting.
Our planting can search for the optimal network architecture with a smaller number of parameters, improving network performance.
We evaluate the effectiveness of the proposed method on different datasets such as CIFAR-10/100 and STL-10.
arXiv Detail & Related papers (2020-11-04T16:29:59Z) - Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets [65.28292822614418]
The giant formula for simultaneously enlarging the resolution, depth, and width provides a Rubik's cube for neural networks.
This paper aims to explore the twisting rules for obtaining deep neural networks with minimum model sizes and computational costs.
arXiv Detail & Related papers (2020-10-28T08:49:45Z) - Hierarchical Neural Architecture Search for Deep Stereo Matching [131.94481111956853]
We propose the first end-to-end hierarchical NAS framework for deep stereo matching.
Our framework incorporates task-specific human knowledge into the neural architecture search framework.
It ranks first in accuracy on the KITTI Stereo 2012 and 2015 and Middlebury benchmarks, as well as first on the SceneFlow dataset.
arXiv Detail & Related papers (2020-10-26T11:57:37Z) - Go Wide, Then Narrow: Efficient Training of Deep Thin Networks [62.26044348366186]
We propose an efficient method to train a deep thin network with a theoretic guarantee.
By training with our method, ResNet50 can outperform ResNet101, and BERT Base can be comparable with BERT Large.
arXiv Detail & Related papers (2020-07-01T23:34:35Z) - ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
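"Iterative mask discovery" can be illustrated with the simplest variant, magnitude-based iterative pruning (the function names and numbers here are ours, not ESPN's): each round zeroes a fraction of the smallest surviving weights, so sparsity compounds toward an extreme target over several passes rather than in one shot.

```python
def prune_round(weights, mask, frac):
    """Zero out `frac` of the currently surviving weights by magnitude."""
    alive = sorted((abs(w), i) for i, (w, m) in enumerate(zip(weights, mask)) if m)
    for _, i in alive[: int(len(alive) * frac)]:
        mask[i] = 0
    return mask

weights = [0.9, -0.1, 0.4, -0.7, 0.05, 0.3, -0.8, 0.2]
mask = [1] * len(weights)
for _ in range(3):                      # three rounds at 50% each
    mask = prune_round(weights, mask, 0.5)

survivors = [w for w, m in zip(weights, mask) if m]
print(survivors)  # [0.9] -- only the largest-magnitude weight remains
```

Single-shot methods compute the mask once; Lottery-Ticket approaches retrain between rounds. A hybrid, as described above, interleaves mask refinement with training to reach extreme sparsity levels.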
arXiv Detail & Related papers (2020-06-28T23:09:27Z) - Improved Residual Networks for Image and Video Recognition [98.10703825716142]
Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture.
We show consistent improvements in accuracy and learning convergence over the baseline.
Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues.
arXiv Detail & Related papers (2020-04-10T11:09:50Z) - Knapsack Pruning with Inner Distillation [11.04321604965426]
We propose a novel pruning method that optimizes the final accuracy of the pruned network.
We prune the network channels while maintaining the high-level structure of the network.
Our method leads to state-of-the-art pruning results on ImageNet, CIFAR-10 and CIFAR-100 using ResNet backbones.
arXiv Detail & Related papers (2020-02-19T16:04:48Z) - Fast Neural Network Adaptation via Parameter Remapping and Architecture Search [35.61441231491448]
Deep neural networks achieve remarkable performance in many computer vision tasks.
Most state-of-the-art (SOTA) semantic segmentation and object detection approaches reuse neural network architectures designed for image classification as the backbone.
One major challenge though, is that ImageNet pre-training of the search space representation incurs huge computational cost.
In this paper, we propose a Fast Neural Network Adaptation (FNA) method, which can adapt both the architecture and parameters of a seed network.
arXiv Detail & Related papers (2020-01-08T13:45:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.