TResNet: High Performance GPU-Dedicated Architecture
- URL: http://arxiv.org/abs/2003.13630v3
- Date: Thu, 27 Aug 2020 05:36:43 GMT
- Title: TResNet: High Performance GPU-Dedicated Architecture
- Authors: Tal Ridnik, Hussam Lawen, Asaf Noy, Emanuel Ben Baruch, Gilad Sharir,
Itamar Friedman
- Abstract summary: Many deep learning models, developed in recent years, reach higher ImageNet accuracy than ResNet50, with a lower or comparable FLOPs count.
In this work, we introduce a series of architecture modifications that aim to boost neural networks' accuracy, while retaining their GPU training and inference efficiency.
We introduce a new family of GPU-dedicated models, called TResNet, which achieve better accuracy and efficiency than previous ConvNets.
- Score: 6.654949459658242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many deep learning models, developed in recent years, reach higher ImageNet
accuracy than ResNet50, with a lower or comparable FLOPs count. While FLOPs are
often seen as a proxy for network efficiency, when measuring actual GPU
training and inference throughput, vanilla ResNet50 is usually significantly
faster than its recent competitors, offering a better throughput-accuracy
trade-off.
In this work, we introduce a series of architecture modifications that aim to
boost neural networks' accuracy, while retaining their GPU training and
inference efficiency. We first demonstrate and discuss the bottlenecks induced
by FLOPs-optimizations. We then suggest alternative designs that better utilize
GPU structure and assets. Finally, we introduce a new family of GPU-dedicated
models, called TResNet, which achieve better accuracy and efficiency than
previous ConvNets.
Using a TResNet model, with similar GPU throughput to ResNet50, we reach 80.8%
top-1 accuracy on ImageNet. Our TResNet models also transfer well and achieve
state-of-the-art accuracy on competitive single-label classification datasets
such as Stanford Cars (96.0%), CIFAR-10 (99.0%), CIFAR-100 (91.5%) and
Oxford-Flowers (99.1%). They also perform well on multi-label classification
and object detection tasks. Implementation is available at:
https://github.com/mrT23/TResNet.
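As a quick illustration of the abstract's central point, that FLOPs are a poor proxy for actual GPU speed, the snippet below is a minimal throughput-benchmark sketch. It assumes PyTorch, torchvision's resnet50 as a stand-in baseline, and a CUDA-capable GPU; a TResNet model from the linked repository could be dropped in for a side-by-side comparison at matched batch size and resolution.

```python
# Minimal GPU inference-throughput sketch (not the paper's exact benchmarking protocol).
# Assumes PyTorch + torchvision and a CUDA device; swap in a TResNet model from the
# linked repository to compare against ResNet50 under the same settings.
import time
import torch
from torchvision.models import resnet50

def measure_throughput(model, batch_size=64, image_size=224, warmup=10, iters=50):
    device = torch.device("cuda")
    model = model.eval().to(device)
    x = torch.randn(batch_size, 3, image_size, image_size, device=device)
    with torch.no_grad():
        for _ in range(warmup):           # warm-up passes so cuDNN autotuning settles
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()          # wait for all kernels before stopping the clock
    return batch_size * iters / (time.time() - start)  # images per second

if __name__ == "__main__":
    print(f"ResNet50: {measure_throughput(resnet50()):.0f} img/s")
```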
Related papers
- DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation [8.240211805240023]
We revisit the design of atrous convolutions in modern convolutional neural networks (CNNs).
We propose DSNet, a Dual-Branch CNN architecture which incorporates atrous convolutions in the shallow layers of the model.
Our models achieve a new state-of-the-art trade-off between accuracy and speed on ADE20K, Cityscapes and BDD datasets.
arXiv Detail & Related papers (2024-06-06T02:51:57Z)
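To make the atrous convolutions in the DSNet entry above concrete: a dilated 3x3 convolution enlarges the receptive field without reducing spatial resolution. The toy PyTorch sketch below uses an illustrative dilation rate of 2, not DSNet's actual configuration, and simply checks that the atrous convolution keeps the same output size as a dense one.

```python
# Toy illustration of an atrous (dilated) 3x3 convolution: with padding equal to the
# dilation rate, spatial resolution is preserved while the receptive field grows.
# Dilation rate 2 is an illustrative assumption, not DSNet's actual configuration.
import torch
import torch.nn as nn

atrous = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)
dense = nn.Conv2d(64, 64, kernel_size=3, padding=1, dilation=1)

x = torch.randn(1, 64, 56, 56)
print(atrous(x).shape, dense(x).shape)  # both torch.Size([1, 64, 56, 56])
```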
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
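The SDTA summary in the EdgeNeXt entry above hinges on splitting a feature map into channel groups. The toy PyTorch sketch below shows only that splitting step, with one depth-wise convolution per group; the transpose attention and the real group configuration of the SDTA encoder are omitted, and the sizes here are illustrative assumptions.

```python
# Toy illustration of splitting a feature map into channel groups; the real SDTA
# encoder also applies transposed (channel-wise) attention, which is omitted here.
# Group count and tensor sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class ChannelGroupSplit(nn.Module):
    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        per_group = channels // groups
        # One depth-wise 3x3 convolution per channel group.
        self.convs = nn.ModuleList(
            nn.Conv2d(per_group, per_group, 3, padding=1, groups=per_group)
            for _ in range(groups)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = torch.chunk(x, self.groups, dim=1)        # split along channels
        outs = [conv(c) for conv, c in zip(self.convs, chunks)]
        return torch.cat(outs, dim=1)                       # re-assemble channels

x = torch.randn(2, 64, 56, 56)
print(ChannelGroupSplit(64)(x).shape)  # torch.Size([2, 64, 56, 56])
```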
- Greedy Network Enlarging [53.319011626986004]
We propose a greedy network enlarging method based on the reallocation of computations.
By modifying the computations of different stages step by step, the enlarged network is equipped with an optimal allocation and utilization of MACs.
Applying our method to GhostNet, we achieve state-of-the-art 80.9% and 84.3% ImageNet top-1 accuracies.
arXiv Detail & Related papers (2021-07-31T08:36:30Z)
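As a rough sketch of the greedy reallocation idea in the entry above (not the paper's actual algorithm), the loop below repeatedly tries enlarging one stage at a time and keeps whichever single enlargement improves accuracy most while staying under a MACs budget. `count_macs`, `evaluate_accuracy`, and the per-stage width step are hypothetical stand-ins for a real evaluation pipeline and architecture-mutation code.

```python
# A rough sketch of greedy computation reallocation across stages, in the spirit of
# the summary above; NOT the paper's exact algorithm. The helper callables are
# hypothetical stand-ins.
from typing import Callable, List

def greedy_enlarge(
    stage_widths: List[int],
    macs_budget: float,
    count_macs: Callable[[List[int]], float],
    evaluate_accuracy: Callable[[List[int]], float],
    step: int = 16,
) -> List[int]:
    widths = list(stage_widths)
    while True:
        best_acc, best_idx = evaluate_accuracy(widths), None
        for i in range(len(widths)):                 # try enlarging each stage in turn
            candidate = widths.copy()
            candidate[i] += step
            if count_macs(candidate) > macs_budget:  # respect the MACs budget
                continue
            acc = evaluate_accuracy(candidate)
            if acc > best_acc:
                best_acc, best_idx = acc, i
        if best_idx is None:                         # no affordable improvement left
            return widths
        widths[best_idx] += step                     # keep the best single enlargement
```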
- Post-training deep neural network pruning via layer-wise calibration [70.65691136625514]
We propose a data-free extension of the approach for computer vision models based on automatically-generated synthetic fractal images.
When using real data, we are able to get a ResNet50 model on ImageNet with 65% sparsity rate in 8-bit precision in a post-training setting.
arXiv Detail & Related papers (2021-04-30T14:20:51Z)
- EfficientNetV2: Smaller Models and Faster Training [91.77432224225221]
This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models.
We use a combination of training-aware neural architecture search and scaling to jointly optimize training speed and parameter efficiency.
Our experiments show that EfficientNetV2 models train much faster than state-of-the-art models while being up to 6.8x smaller.
arXiv Detail & Related papers (2021-04-01T07:08:36Z)
- High-Performance Large-Scale Image Recognition Without Normalization [34.58818094675353]
Batch normalization is a key component of most image classification models, but it has many undesirable properties.
We develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets.
Our models attain significantly better performance than their batch-normalized counterparts when finetuning on ImageNet after large-scale pre-training.
arXiv Detail & Related papers (2021-02-11T18:23:20Z)
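The adaptive gradient clipping mentioned in the entry above rescales a gradient whenever its norm grows too large relative to the norm of the parameter it updates. Below is a simplified per-tensor PyTorch sketch; the paper applies the criterion unit-wise, and the `clip_factor` and `eps` values here are illustrative.

```python
# Simplified sketch of adaptive gradient clipping (AGC): a gradient is rescaled when
# its norm is large relative to the norm of the parameter it updates. This per-tensor
# version is a simplification; the paper applies the criterion unit-wise.
import torch

def adaptive_grad_clip_(parameters, clip_factor: float = 0.01, eps: float = 1e-3):
    for p in parameters:
        if p.grad is None:
            continue
        w_norm = p.detach().norm().clamp_min(eps)  # guard against near-zero weights
        g_norm = p.grad.detach().norm()
        max_norm = clip_factor * w_norm
        if g_norm > max_norm:                      # clip only when the ratio is too large
            p.grad.mul_(max_norm / (g_norm + 1e-6))
```

In a training loop, such a function would be called between `loss.backward()` and `optimizer.step()`.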
- RepVGG: Making VGG-style ConvNets Great Again [116.0327370719692]
We present a simple but powerful convolutional neural network architecture with a VGG-like inference-time body composed of nothing but a stack of 3x3 convolutions and ReLU.
RepVGG reaches over 80% top-1 accuracy which, to the best of our knowledge, is the first time for a plain model.
arXiv Detail & Related papers (2021-01-11T04:46:11Z)
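The "nothing but 3x3 convolutions and ReLU" inference body in the RepVGG entry above comes from re-parameterizing training-time branches into a single convolution. The PyTorch sketch below shows only the basic conv + BatchNorm folding step of that arithmetic; folding the parallel 1x1 and identity branches into the 3x3 kernel, as RepVGG does, is omitted.

```python
# Sketch of the basic re-parameterization arithmetic behind RepVGG-style inference
# bodies: folding a BatchNorm into the preceding 3x3 convolution so that only a conv
# (plus ReLU) remains at inference time. Merging the parallel 1x1 and identity
# branches into the same 3x3 kernel is omitted here.
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    std = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight / std                                    # per-channel BN scale
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    conv_bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.copy_(bn.bias + (conv_bias - bn.running_mean) * scale)
    return fused

# Quick check: the fused conv matches conv -> BN in eval mode.
conv, bn = nn.Conv2d(8, 16, 3, padding=1, bias=False), nn.BatchNorm2d(16)
bn.eval()
x = torch.randn(1, 8, 32, 32)
print(torch.allclose(fuse_conv_bn(conv, bn)(x), bn(conv(x)), atol=1e-5))  # True
```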
- Neural Architecture Design for GPU-Efficient Networks [27.07089149328155]
We propose a general principle for designing GPU-efficient networks based on extensive empirical studies.
Based on the proposed framework, we design a family of GPU-Efficient Networks, or GENets for short.
While achieving $\geq 81.3\%$ top-1 accuracy on ImageNet, GENet is up to $6.4$ times faster than EfficientNet on GPU.
arXiv Detail & Related papers (2020-06-24T22:42:18Z)
- Improved Residual Networks for Image and Video Recognition [98.10703825716142]
Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture.
We show consistent improvements in accuracy and learning convergence over the baseline.
Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues.
arXiv Detail & Related papers (2020-04-10T11:09:50Z)