Revisiting ResNets: Improved Training and Scaling Strategies
- URL: http://arxiv.org/abs/2103.07579v1
- Date: Sat, 13 Mar 2021 00:18:19 GMT
- Title: Revisiting ResNets: Improved Training and Scaling Strategies
- Authors: Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind
Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph
- Abstract summary: Training and scaling strategies may matter more than architectural changes, and the resulting ResNets match recent state-of-the-art models.
We show that the best performing scaling strategy depends on the training regime.
We design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.
- Score: 54.0162571976267
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Novel computer vision architectures monopolize the spotlight, but the impact
of the model architecture is often conflated with simultaneous changes to
training methodology and scaling strategies. Our work revisits the canonical
ResNet (He et al., 2015) and studies these three aspects in an effort to
disentangle them. Perhaps surprisingly, we find that training and scaling
strategies may matter more than architectural changes, and further, that the
resulting ResNets match recent state-of-the-art models. We show that the best
performing scaling strategy depends on the training regime and offer two new
scaling strategies: (1) scale model depth in regimes where overfitting can
occur (width scaling is preferable otherwise); (2) increase image resolution
more slowly than previously recommended (Tan & Le, 2019). Using improved
training and scaling strategies, we design a family of ResNet architectures,
ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while
achieving similar accuracies on ImageNet. In a large-scale semi-supervised
learning setup, ResNet-RS achieves 86.2% top-1 ImageNet accuracy, while being
4.7x faster than EfficientNet NoisyStudent. The training techniques improve
transfer performance on a suite of downstream tasks (rivaling state-of-the-art
self-supervised algorithms) and extend to video classification on Kinetics-400.
We recommend practitioners use these simple revised ResNets as baselines for
future research.
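The two scaling rules from the abstract can be sketched as a small helper. This is a minimal illustrative sketch, not the paper's actual recipe: the function name, the base configuration values, and the square-root resolution exponent are all assumptions (the abstract only says resolution should grow "more slowly" than previously recommended).

```python
def scale_resnet(base_depth: int, base_width: int, base_resolution: int,
                 scale_factor: float, overfitting_regime: bool) -> dict:
    """Return a scaled (depth, width, resolution) configuration following
    the two ResNet-RS scaling rules described in the abstract."""
    if overfitting_regime:
        # Rule (1a): in regimes where overfitting can occur
        # (e.g. long training on limited data), scale model depth.
        depth = round(base_depth * scale_factor)
        width = base_width
    else:
        # Rule (1b): otherwise, width scaling is preferable.
        depth = base_depth
        width = round(base_width * scale_factor)
    # Rule (2): increase image resolution more slowly than the model
    # scale (the sqrt exponent here is an illustrative choice).
    resolution = round(base_resolution * scale_factor ** 0.5)
    return {"depth": depth, "width": width, "resolution": resolution}

# Example: scale a ResNet-50-like base by 2x in an overfitting regime.
cfg = scale_resnet(50, 64, 224, 2.0, overfitting_regime=True)
```

Under these assumptions, depth doubles to 100 layers while width stays fixed, and resolution grows only by a factor of sqrt(2) rather than 2.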
Related papers
- DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs [30.412909498409192]
This paper revives Densely Connected Convolutional Networks (DenseNets)
We believe DenseNets' potential was overlooked because the training methods were left untouched and traditional design elements did not fully reveal their capabilities.
We provide empirical analyses that uncover the merits of concatenation over additive shortcuts, steering a renewed preference towards DenseNet-style designs.
arXiv Detail & Related papers (2024-03-28T17:12:39Z)
- ScaleNet: An Unsupervised Representation Learning Method for Limited Information [0.0]
A simple and efficient unsupervised representation learning method named ScaleNet is proposed.
Specific image features, such as Harris corner information, play a critical role in the efficiency of the rotation-prediction task.
The transferred parameters from a ScaleNet model with limited data improve the ImageNet Classification task by about 6% compared to the RotNet model.
arXiv Detail & Related papers (2023-10-03T19:13:43Z)
- ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders [104.05133094625137]
We propose a fully convolutional masked autoencoder framework and a new Global Response Normalization layer.
This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets.
arXiv Detail & Related papers (2023-01-02T18:59:31Z)
- Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? [35.98841834512082]
ReLICv2 is the first representation learning method to consistently outperform the supervised baseline in a like-for-like comparison.
We show that despite using ResNet encoders, ReLICv2 is comparable to state-of-the-art self-supervised vision transformers.
arXiv Detail & Related papers (2022-01-13T18:23:30Z)
- DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation [99.88539409432916]
We study the unsupervised domain adaptation (UDA) process.
We propose a novel UDA method, DAFormer, based on the benchmark results.
DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA->Cityscapes and 5.4 mIoU for Synthia->Cityscapes.
arXiv Detail & Related papers (2021-11-29T19:00:46Z)
- Revisiting 3D ResNets for Video Recognition [18.91688307058961]
This note studies effective training and scaling strategies for video recognition models.
We propose a simple scaling strategy for 3D ResNets, in combination with improved training strategies and minor architectural changes.
arXiv Detail & Related papers (2021-09-03T18:27:52Z)
- Top-KAST: Top-K Always Sparse Training [50.05611544535801]
We propose Top-KAST, a method that preserves constant sparsity throughout training.
We show that it performs comparably to or better than previous works when training models on the established ImageNet benchmark.
In addition to our ImageNet results, we also demonstrate our approach in the domain of language modeling.
arXiv Detail & Related papers (2021-06-07T11:13:05Z)
- EfficientNetV2: Smaller Models and Faster Training [91.77432224225221]
This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models.
We use a combination of training-aware neural architecture search and scaling, to jointly optimize training speed and parameter efficiency.
Our experiments show that EfficientNetV2 models train much faster than state-of-the-art models while being up to 6.8x smaller.
arXiv Detail & Related papers (2021-04-01T07:08:36Z)
- Learning to Resize Images for Computer Vision Tasks [15.381549764216134]
We show that the typical linear resizer can be replaced with learned resizers that can substantially improve performance.
Our learned image resizer is jointly trained with a baseline vision model.
We show that the proposed resizer can also be useful for fine-tuning the classification baselines for other vision tasks.
arXiv Detail & Related papers (2021-03-17T23:43:44Z)
- Improved Residual Networks for Image and Video Recognition [98.10703825716142]
Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture.
We show consistent improvements in accuracy and learning convergence over the baseline.
Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues.
arXiv Detail & Related papers (2020-04-10T11:09:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.