High-Performance Large-Scale Image Recognition Without Normalization
- URL: http://arxiv.org/abs/2102.06171v1
- Date: Thu, 11 Feb 2021 18:23:20 GMT
- Title: High-Performance Large-Scale Image Recognition Without Normalization
- Authors: Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan
- Abstract summary: Batch normalization is a key component of most image classification models, but it has many undesirable properties.
We develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets.
Our models attain significantly better performance than their batch-normalized counterparts when finetuning on ImageNet after large-scale pre-training.
- Score: 34.58818094675353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Batch normalization is a key component of most image classification models,
but it has many undesirable properties stemming from its dependence on the
batch size and interactions between examples. Although recent work has
succeeded in training deep ResNets without normalization layers, these models
do not match the test accuracies of the best batch-normalized networks, and are
often unstable for large learning rates or strong data augmentations. In this
work, we develop an adaptive gradient clipping technique which overcomes these
instabilities, and design a significantly improved class of Normalizer-Free
ResNets. Our smaller models match the test accuracy of an EfficientNet-B7 on
ImageNet while being up to 8.7x faster to train, and our largest models attain
a new state-of-the-art top-1 accuracy of 86.5%. In addition, Normalizer-Free
models attain significantly better performance than their batch-normalized
counterparts when finetuning on ImageNet after large-scale pre-training on a
dataset of 300 million labeled images, with our best models obtaining an
accuracy of 89.2%. Our code is available at https://github.com/deepmind/deepmind-research/tree/master/nfnets
Related papers
- Improving Zero-shot Generalization and Robustness of Multi-modal Models [70.14692320804178]
Multi-modal image-text models such as CLIP and LiT have demonstrated impressive performance on image classification benchmarks.
We investigate the reasons for this performance gap and find that many of the failure cases are caused by ambiguity in the text prompts.
We propose a simple and efficient way to improve accuracy on such uncertain images by making use of the WordNet hierarchy.
arXiv Detail & Related papers (2022-12-04T07:26:24Z) - CoV-TI-Net: Transferred Initialization with Modified End Layer for COVID-19 Diagnosis [5.546855806629448]
Transfer learning is a relatively new learning method that has been employed in many sectors to achieve good performance with fewer computations.
In this research, pre-trained PyTorch models (VGG19_bn and WideResNet-101) are applied to the MNIST dataset.
The proposed model is developed and verified in a Kaggle notebook and reaches an accuracy of 99.77% without requiring excessive computation time.
arXiv Detail & Related papers (2022-09-20T08:52:52Z) - Core Risk Minimization using Salient ImageNet [53.616101711801484]
We introduce the Salient ImageNet dataset with more than 1 million soft masks localizing core and spurious features for all 1000 ImageNet classes.
Using this dataset, we first evaluate the reliance of several ImageNet pretrained models (42 total) on spurious features.
Next, we introduce a new learning paradigm called Core Risk Minimization (CoRM) whose objective ensures that the model predicts a class using its core features.
arXiv Detail & Related papers (2022-03-28T01:53:34Z) - Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z) - EfficientNetV2: Smaller Models and Faster Training [91.77432224225221]
This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models.
We use a combination of training-aware neural architecture search and scaling, to jointly optimize training speed and parameter efficiency.
Our experiments show that EfficientNetV2 models train much faster than state-of-the-art models while being up to 6.8x smaller.
arXiv Detail & Related papers (2021-04-01T07:08:36Z) - Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification [46.885260723836865]
Deep convolutional neural networks (CNNs) generally achieve better accuracy when fed high-resolution images.
Inspired by the fact that not all regions in an image are task-relevant, we propose a novel framework that performs efficient image classification.
Our framework is general and flexible as it is compatible with most of the state-of-the-art light-weighted CNNs.
arXiv Detail & Related papers (2020-10-11T17:55:06Z) - Rethinking CNN Models for Audio Classification [20.182928938110923]
ImageNet-pretrained standard deep CNN models can be used as strong baseline networks for audio classification.
We systematically study how much of the pretrained weights is useful for learning spectrograms.
We show that, for a given standard model, using pretrained weights is better than using randomly initialized weights.
arXiv Detail & Related papers (2020-07-22T01:31:44Z) - Do Adversarially Robust ImageNet Models Transfer Better? [102.09335596483695]
Adversarially robust models often perform better than their standard-trained counterparts when used for transfer learning.
Our results are consistent with (and in fact, add to) recent hypotheses stating that robustness leads to improved feature representations.
arXiv Detail & Related papers (2020-07-16T17:42:40Z) - TResNet: High Performance GPU-Dedicated Architecture [6.654949459658242]
Many deep learning models developed in recent years reach higher ImageNet accuracy than ResNet50, with fewer or comparable FLOPs.
In this work, we introduce a series of architecture modifications that aim to boost neural networks' accuracy, while retaining their GPU training and inference efficiency.
We introduce a new family of GPU-dedicated models, called TResNet, which achieve better accuracy and efficiency than previous ConvNets.
arXiv Detail & Related papers (2020-03-30T17:04:47Z) - Fixing the train-test resolution discrepancy: FixEfficientNet [98.64315617109344]
This paper provides an analysis of the performance of the EfficientNet image classifiers with several recent training procedures.
The resulting network, called FixEfficientNet, significantly outperforms the initial architecture with the same number of parameters.
arXiv Detail & Related papers (2020-03-18T14:22:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.