Compounding the Performance Improvements of Assembled Techniques in a
Convolutional Neural Network
- URL: http://arxiv.org/abs/2001.06268v2
- Date: Fri, 13 Mar 2020 10:27:45 GMT
- Title: Compounding the Performance Improvements of Assembled Techniques in a
Convolutional Neural Network
- Authors: Jungkyu Lee, Taeryun Won, Tae Kwan Lee, Hyemin Lee, Geonmo Gu, Kiho
Hong
- Abstract summary: We show how to improve the accuracy and robustness of basic CNN models.
Our proposed assembled ResNet-50 shows improvements in top-1 accuracy from 76.3% to 82.78%, mCE from 76.0% to 48.9% and mFR from 57.7% to 32.3%.
Our approach achieved 1st place in the iFood Competition Fine-Grained Visual Recognition at CVPR 2019.
- Score: 6.938261599173859
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies in image classification have demonstrated a variety of
techniques for improving the performance of Convolutional Neural Networks
(CNNs). However, attempts to combine existing techniques to create a practical
model are still uncommon. In this study, we carry out extensive experiments to
validate that carefully assembling these techniques and applying them to basic
CNN models (e.g. ResNet and MobileNet) can improve the accuracy and robustness
of the models while minimizing the loss of throughput. Our proposed assembled
ResNet-50 shows improvements in top-1 accuracy from 76.3% to 82.78%, mCE from
76.0% to 48.9% and mFR from 57.7% to 32.3% on the ILSVRC2012 validation set.
With these improvements, inference throughput only decreases from 536 to 312.
To verify the performance improvement in transfer learning, we evaluated
fine-grained classification and image retrieval tasks on several public datasets
and found that the improved backbone significantly boosted transfer learning
performance. Our approach achieved 1st place in
the iFood Competition Fine-Grained Visual Recognition at CVPR 2019, and the
source code and trained models are available at
https://github.com/clovaai/assembled-cnn
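To make the idea of assembling techniques concrete, here is a minimal training-step sketch that combines two of the regularization methods commonly used in such recipes, mixup and label smoothing, on a plain torchvision ResNet-50. It is an illustrative simplification under our own assumptions, not the authors' full assembly, which also includes architectural tweaks and further regularizers described in the paper.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Illustrative sketch only: mixup + label smoothing on a stock ResNet-50.
# The full "assembled" recipe in the paper combines several more techniques.
model = models.resnet50(num_classes=1000)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

def train_step(images, labels, alpha=0.2, smoothing=0.1):
    # mixup: blend random pairs of examples and their targets
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1.0 - lam) * images[perm]
    logits = model(mixed)
    # label smoothing is built into cross_entropy in recent PyTorch versions
    loss = (lam * F.cross_entropy(logits, labels, label_smoothing=smoothing)
            + (1.0 - lam) * F.cross_entropy(logits, labels[perm],
                                            label_smoothing=smoothing))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```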
Related papers
- Image edge enhancement for effective image classification [7.470763273994321]
We propose an edge enhancement-based method to improve both the accuracy and the training speed of neural networks.
Our approach involves extracting high frequency features, such as edges, from images within the available dataset and fusing them with the original images.
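As a rough illustration of the fusion idea above, the sketch below extracts high-frequency content with a fixed Laplacian kernel and blends it back into the input; the paper's actual filters, weighting, and fusion rule may differ.

```python
import torch
import torch.nn.functional as F

# Fixed 3x3 Laplacian kernel used as a simple high-frequency (edge) extractor.
_LAPLACIAN = torch.tensor([[0., 1., 0.],
                           [1., -4., 1.],
                           [0., 1., 0.]]).reshape(1, 1, 3, 3)

def fuse_edges(images: torch.Tensor, weight: float = 0.5) -> torch.Tensor:
    """images: (B, C, H, W) in [0, 1]; returns an edge-enhanced batch."""
    c = images.shape[1]
    kernel = _LAPLACIAN.to(images.dtype).repeat(c, 1, 1, 1)
    edges = F.conv2d(images, kernel, padding=1, groups=c)  # per-channel edges
    return (images + weight * edges).clamp(0.0, 1.0)
```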
arXiv Detail & Related papers (2024-01-13T10:01:34Z)
- ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders [104.05133094625137]
We propose a fully convolutional masked autoencoder framework and a new Global Response Normalization layer.
This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets.
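Since the Global Response Normalization (GRN) layer is the main architectural addition mentioned here, a minimal sketch of such a layer for channels-last feature maps is given below; treat the exact parameterization as an approximation of what ConvNeXt V2 uses.

```python
import torch
import torch.nn as nn

class GRN(nn.Module):
    """Global Response Normalization sketch for (B, H, W, C) inputs."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # global aggregation: L2 norm of each channel over the spatial dims
        gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)      # (B, 1, 1, C)
        # divisive normalization across channels
        nx = gx / (gx.mean(dim=-1, keepdim=True) + self.eps)
        # learnable calibration plus a residual connection
        return self.gamma * (x * nx) + self.beta + x
```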
arXiv Detail & Related papers (2023-01-02T18:59:31Z)
- Establishing a stronger baseline for lightweight contrastive models [10.63129923292905]
Recent research has reported a performance degradation in self-supervised contrastive learning for specially designed efficient networks.
A common practice is to introduce a pretrained contrastive teacher model and train the lightweight networks with distillation signals generated by the teacher.
In this work, we aim to establish a stronger baseline for lightweight contrastive models without using a pretrained teacher model.
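For context, lightweight contrastive models of this kind are typically trained with a SimCLR-style objective; the sketch below shows a standard NT-Xent loss as generic background, not this paper's specific baseline or distillation setup.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """Standard SimCLR-style contrastive loss between two augmented views.
    z1, z2: (B, D) projections of the same images under different augmentations."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)          # (2B, D)
    sim = z @ z.t() / tau                                       # scaled cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))                  # ignore self-pairs
    # positives: view i is matched with view i + n (and vice versa)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```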
arXiv Detail & Related papers (2022-12-14T11:20:24Z)
- Network Augmentation for Tiny Deep Learning [73.57192520534585]
We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks.
We demonstrate the effectiveness of NetAug on image classification and object detection.
arXiv Detail & Related papers (2021-10-17T18:48:41Z)
- VOLO: Vision Outlooker for Visual Recognition [148.12522298731807]
Vision transformers (ViTs) have shown the great potential of self-attention-based models in ImageNet classification.
We introduce a novel outlook attention and present a simple and general architecture, termed Vision Outlooker (VOLO).
Unlike self-attention, which focuses on global dependency modeling at a coarse level, the outlook attention efficiently encodes finer-level features and contexts into tokens.
Experiments show that our VOLO achieves 87.1% top-1 accuracy on ImageNet-1K classification, which is the first model exceeding 87% accuracy on this competitive benchmark.
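To make outlook attention more concrete, here is a minimal single-head sketch assembled from the description above; the actual VOLO layer is multi-headed and includes stride and pooling details that are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OutlookAttention(nn.Module):
    """Single-head outlook attention sketch over channels-last maps (B, H, W, C)."""
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.k = kernel_size
        self.scale = dim ** -0.5
        self.v = nn.Linear(dim, dim)
        # attention weights are generated directly from each token
        self.attn = nn.Linear(dim, kernel_size ** 4)
        self.proj = nn.Linear(dim, dim)
        self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, H, W, C = x.shape
        v = self.v(x).permute(0, 3, 1, 2)                        # (B, C, H, W)
        # gather the KxK neighbourhood of every spatial location
        v = self.unfold(v).reshape(B, C, self.k * self.k, H * W)
        v = v.permute(0, 3, 2, 1)                                # (B, HW, K*K, C)
        # per-location attention over the local window, predicted by a linear layer
        a = self.attn(x).reshape(B, H * W, self.k * self.k, self.k * self.k)
        a = (a * self.scale).softmax(dim=-1)
        out = (a @ v).permute(0, 3, 2, 1).reshape(B, C * self.k * self.k, H * W)
        # fold the weighted neighbourhoods back onto the spatial grid
        out = F.fold(out, (H, W), self.k, padding=self.k // 2)
        return self.proj(out.permute(0, 2, 3, 1))                # (B, H, W, C)
```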
arXiv Detail & Related papers (2021-06-24T15:46:54Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
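A heavily simplified sketch of a scheduled grow-and-prune loop follows; unlike the actual GaP method, which grows and prunes partitions of layers in turn, this toy version (with a hypothetical train_one_epoch callback) switches the whole model between dense and sparse phases on a fixed schedule.

```python
import torch

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Binary mask keeping the largest-magnitude (1 - sparsity) fraction of weights."""
    k = max(1, int(weight.numel() * (1.0 - sparsity)))
    threshold = weight.abs().flatten().topk(k).values.min()
    return (weight.abs() >= threshold).float()

def scheduled_grow_and_prune(model, train_one_epoch, epochs=90, cycle=10, sparsity=0.8):
    masks = {}
    for epoch in range(epochs):
        grow_phase = (epoch // cycle) % 2 == 0
        if grow_phase:
            masks = {}                       # "grow": train the model densely
        elif not masks:                      # entering a "prune" phase
            masks = {name: magnitude_mask(p.data, sparsity)
                     for name, p in model.named_parameters() if p.dim() > 1}
        train_one_epoch(model)               # hypothetical user-supplied training loop
        with torch.no_grad():                # keep pruned weights at zero
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])
```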
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones [40.33419553042038]
We propose to improve existing baseline networks via knowledge distillation from off-the-shelf, powerful pre-trained models.
Our solution performs distillation by only driving the prediction of the student model to be consistent with that of the teacher model.
We empirically find that such simple distillation settings are extremely effective; for example, the top-1 accuracy of MobileNetV3-large and ResNet50-D on the ImageNet-1k validation set can be significantly improved.
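Under that reading, the distillation objective is just a soft-label consistency loss between student and teacher predictions; a minimal sketch (the temperature is our own illustrative knob, not necessarily part of the paper's setting):

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits: torch.Tensor,
                 teacher_logits: torch.Tensor,
                 temperature: float = 1.0) -> torch.Tensor:
    """Drive the student's predicted distribution towards the teacher's.
    No ground-truth labels are used in this loss."""
    t = temperature
    teacher_prob = F.softmax(teacher_logits.detach() / t, dim=1)
    student_logp = F.log_softmax(student_logits / t, dim=1)
    return F.kl_div(student_logp, teacher_prob, reduction='batchmean') * (t * t)
```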
arXiv Detail & Related papers (2021-03-10T09:32:44Z)
- An Efficient Quantitative Approach for Optimizing Convolutional Neural Networks [16.072287925319806]
We propose 3D-Receptive Field (3DRF) to estimate the quality of a CNN architecture and guide the design search process.
Our models can achieve up to 5.47% accuracy improvement and up to 65.38% fewer parameters, compared with state-of-the-art CNN structures like MobileNet and ResNet.
arXiv Detail & Related papers (2020-09-11T05:14:34Z)
- RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions [73.45995446500312]
We analyze the generalization properties of streaming and non-streaming recurrent neural network transducer (RNN-T) based end-to-end models.
We propose two solutions: combining multiple regularization techniques during training, and using dynamic overlapping inference.
arXiv Detail & Related papers (2020-05-07T06:24:47Z)
- Fixing the train-test resolution discrepancy: FixEfficientNet [98.64315617109344]
This paper provides an analysis of the performance of the EfficientNet image classifiers with several recent training procedures.
The resulting network, called FixEfficientNet, significantly outperforms the initial architecture with the same number of parameters.
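The FixRes-style recipe behind this result (train at one resolution, then briefly fine-tune the last layers at the larger test resolution so that train-time and test-time object scales match) can be sketched as follows; the resolutions, the frozen-layer choice, and the ResNet-50 stand-in are illustrative assumptions rather than the paper's exact settings.

```python
import torch
import torchvision.transforms as T
from torchvision import models

train_res, test_res = 224, 320          # hypothetical train/test resolutions

train_tf = T.Compose([T.RandomResizedCrop(train_res),
                      T.RandomHorizontalFlip(),
                      T.ToTensor()])
finetune_tf = T.Compose([T.Resize(int(test_res * 1.15)),
                         T.CenterCrop(test_res),
                         T.ToTensor()])

model = models.resnet50(num_classes=1000)

# Phase 1: ordinary training at train_res with train_tf (loop elided).
# Phase 2: freeze the backbone, keep only batch-norm layers and the classifier
# trainable, and fine-tune for a few epochs on images preprocessed at test_res.
for p in model.parameters():
    p.requires_grad = False
for m in model.modules():
    if isinstance(m, torch.nn.BatchNorm2d):
        m.requires_grad_(True)
model.fc.requires_grad_(True)
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                            lr=1e-3, momentum=0.9)
```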
arXiv Detail & Related papers (2020-03-18T14:22:58Z)