Compounding the Performance Improvements of Assembled Techniques in a
Convolutional Neural Network
- URL: http://arxiv.org/abs/2001.06268v2
- Date: Fri, 13 Mar 2020 10:27:45 GMT
- Title: Compounding the Performance Improvements of Assembled Techniques in a
Convolutional Neural Network
- Authors: Jungkyu Lee, Taeryun Won, Tae Kwan Lee, Hyemin Lee, Geonmo Gu, Kiho
Hong
- Abstract summary: We show how to improve the accuracy and robustness of basic CNN models.
Our proposed assembled ResNet-50 shows improvements in top-1 accuracy from 76.3% to 82.78%, mCE from 76.0% to 48.9% and mFR from 57.7% to 32.3%.
Our approach achieved 1st place in the iFood Competition Fine-Grained Visual Recognition at CVPR 2019.
- Score: 6.938261599173859
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies in image classification have demonstrated a variety of
techniques for improving the performance of Convolutional Neural Networks
(CNNs). However, attempts to combine existing techniques to create a practical
model are still uncommon. In this study, we carry out extensive experiments to
validate that carefully assembling these techniques and applying them to basic
CNN models (e.g. ResNet and MobileNet) can improve the accuracy and robustness
of the models while minimizing the loss of throughput. Our proposed assembled
ResNet-50 shows improvements in top-1 accuracy from 76.3% to 82.78%, mCE from
76.0% to 48.9% and mFR from 57.7% to 32.3% on the ILSVRC2012 validation set.
With these improvements, inference throughput only decreases from 536 to 312.
To verify the performance improvement in transfer learning, we evaluated
fine-grained classification and image retrieval tasks on several public datasets
and found that the improved backbone significantly boosted transfer learning
performance. Our approach achieved 1st place in
the iFood Competition Fine-Grained Visual Recognition at CVPR 2019, and the
source code and trained models are available at
https://github.com/clovaai/assembled-cnn
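To make the idea of assembling techniques concrete, here is a minimal training-step sketch that combines two of the regularization methods commonly used in such recipes, mixup and label smoothing, on a plain torchvision ResNet-50. It is an illustrative simplification under our own assumptions, not the authors' full assembly, which also includes architectural tweaks and further regularizers described in the paper.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Illustrative sketch only: mixup + label smoothing on a stock ResNet-50.
# The full "assembled" recipe in the paper combines several more techniques.
model = models.resnet50(num_classes=1000)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

def train_step(images, labels, alpha=0.2, smoothing=0.1):
    # mixup: blend random pairs of examples and their targets
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1.0 - lam) * images[perm]
    logits = model(mixed)
    # label smoothing is built into cross_entropy in recent PyTorch versions
    loss = (lam * F.cross_entropy(logits, labels, label_smoothing=smoothing)
            + (1.0 - lam) * F.cross_entropy(logits, labels[perm],
                                            label_smoothing=smoothing))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```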
Related papers
- Image edge enhancement for effective image classification [7.470763273994321]
We propose an edge enhancement-based method to improve both the accuracy and the training speed of neural networks.
Our approach involves extracting high frequency features, such as edges, from images within the available dataset and fusing them with the original images.
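As a rough illustration of the fusion idea above, the sketch below extracts high-frequency content with a fixed Laplacian kernel and blends it back into the input; the paper's actual filters, weighting, and fusion rule may differ.

```python
import torch
import torch.nn.functional as F

# Fixed 3x3 Laplacian kernel used as a simple high-frequency (edge) extractor.
_LAPLACIAN = torch.tensor([[0., 1., 0.],
                           [1., -4., 1.],
                           [0., 1., 0.]]).reshape(1, 1, 3, 3)

def fuse_edges(images: torch.Tensor, weight: float = 0.5) -> torch.Tensor:
    """images: (B, C, H, W) in [0, 1]; returns an edge-enhanced batch."""
    c = images.shape[1]
    kernel = _LAPLACIAN.to(images.dtype).repeat(c, 1, 1, 1)
    edges = F.conv2d(images, kernel, padding=1, groups=c)  # per-channel edges
    return (images + weight * edges).clamp(0.0, 1.0)
```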
arXiv Detail & Related papers (2024-01-13T10:01:34Z)
- ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders [104.05133094625137]
We propose a fully convolutional masked autoencoder framework and a new Global Response Normalization layer.
This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets.
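Since the Global Response Normalization (GRN) layer is the main architectural addition mentioned here, a minimal sketch of such a layer for channels-last feature maps is given below; treat the exact parameterization as an approximation of what ConvNeXt V2 uses.

```python
import torch
import torch.nn as nn

class GRN(nn.Module):
    """Global Response Normalization sketch for (B, H, W, C) inputs."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # global aggregation: L2 norm of each channel over the spatial dims
        gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)      # (B, 1, 1, C)
        # divisive normalization across channels
        nx = gx / (gx.mean(dim=-1, keepdim=True) + self.eps)
        # learnable calibration plus a residual connection
        return self.gamma * (x * nx) + self.beta + x
```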
arXiv Detail & Related papers (2023-01-02T18:59:31Z)
- Establishing a stronger baseline for lightweight contrastive models [10.63129923292905]
Recent research has reported a performance degradation in self-supervised contrastive learning for specially designed efficient networks.
A common practice is to introduce a pretrained contrastive teacher model and train the lightweight networks with distillation signals generated by the teacher.
In this work, we aim to establish a stronger baseline for lightweight contrastive models without using a pretrained teacher model.
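For context, lightweight contrastive models of this kind are typically trained with a SimCLR-style objective; the sketch below shows a standard NT-Xent loss as generic background, not this paper's specific baseline or distillation setup.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """Standard SimCLR-style contrastive loss between two augmented views.
    z1, z2: (B, D) projections of the same images under different augmentations."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)          # (2B, D)
    sim = z @ z.t() / tau                                       # scaled cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))                  # ignore self-pairs
    # positives: view i is matched with view i + n (and vice versa)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```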
arXiv Detail & Related papers (2022-12-14T11:20:24Z)
- Network Augmentation for Tiny Deep Learning [73.57192520534585]
We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks.
We demonstrate the effectiveness of NetAug on image classification and object detection.
arXiv Detail & Related papers (2021-10-17T18:48:41Z)
- VOLO: Vision Outlooker for Visual Recognition [148.12522298731807]
Vision transformers (ViTs) have shown the great potential of self-attention-based models in ImageNet classification.
We introduce a novel outlook attention and present a simple and general architecture, termed Vision Outlooker (VOLO).
Unlike self-attention, which focuses on global dependency modeling at a coarse level, the outlook attention efficiently encodes finer-level features and contexts into tokens.
Experiments show that our VOLO achieves 87.1% top-1 accuracy on ImageNet-1K classification, which is the first model exceeding 87% accuracy on this competitive benchmark.
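To make outlook attention more concrete, here is a minimal single-head sketch assembled from the description above; the actual VOLO layer is multi-headed and includes stride and pooling details that are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OutlookAttention(nn.Module):
    """Single-head outlook attention sketch over channels-last maps (B, H, W, C)."""
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.k = kernel_size
        self.scale = dim ** -0.5
        self.v = nn.Linear(dim, dim)
        # attention weights are generated directly from each token
        self.attn = nn.Linear(dim, kernel_size ** 4)
        self.proj = nn.Linear(dim, dim)
        self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, H, W, C = x.shape
        v = self.v(x).permute(0, 3, 1, 2)                        # (B, C, H, W)
        # gather the KxK neighbourhood of every spatial location
        v = self.unfold(v).reshape(B, C, self.k * self.k, H * W)
        v = v.permute(0, 3, 2, 1)                                # (B, HW, K*K, C)
        # per-location attention over the local window, predicted by a linear layer
        a = self.attn(x).reshape(B, H * W, self.k * self.k, self.k * self.k)
        a = (a * self.scale).softmax(dim=-1)
        out = (a @ v).permute(0, 3, 2, 1).reshape(B, C * self.k * self.k, H * W)
        # fold the weighted neighbourhoods back onto the spatial grid
        out = F.fold(out, (H, W), self.k, padding=self.k // 2)
        return self.proj(out.permute(0, 2, 3, 1))                # (B, H, W, C)
```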
arXiv Detail & Related papers (2021-06-24T15:46:54Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
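A heavily simplified sketch of a scheduled grow-and-prune loop follows; unlike the actual GaP method, which grows and prunes partitions of layers in turn, this toy version (with a hypothetical train_one_epoch callback) switches the whole model between dense and sparse phases on a fixed schedule.

```python
import torch

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Binary mask keeping the largest-magnitude (1 - sparsity) fraction of weights."""
    k = max(1, int(weight.numel() * (1.0 - sparsity)))
    threshold = weight.abs().flatten().topk(k).values.min()
    return (weight.abs() >= threshold).float()

def scheduled_grow_and_prune(model, train_one_epoch, epochs=90, cycle=10, sparsity=0.8):
    masks = {}
    for epoch in range(epochs):
        grow_phase = (epoch // cycle) % 2 == 0
        if grow_phase:
            masks = {}                       # "grow": train the model densely
        elif not masks:                      # entering a "prune" phase
            masks = {name: magnitude_mask(p.data, sparsity)
                     for name, p in model.named_parameters() if p.dim() > 1}
        train_one_epoch(model)               # hypothetical user-supplied training loop
        with torch.no_grad():                # keep pruned weights at zero
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])
```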
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones [40.33419553042038]
We propose to improve existing baseline networks via knowledge distillation from off-the-shelf, powerful pre-trained models.
Our solution performs distillation by only driving the prediction of the student model to be consistent with that of the teacher model.
We empirically find that such simple distillation settings are extremely effective; for example, the top-1 accuracy of MobileNetV3-large and ResNet50-D on the ImageNet-1k validation set can be significantly improved.
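Under that reading, the distillation objective is just a soft-label consistency loss between student and teacher predictions; a minimal sketch (the temperature is our own illustrative knob, not necessarily part of the paper's setting):

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits: torch.Tensor,
                 teacher_logits: torch.Tensor,
                 temperature: float = 1.0) -> torch.Tensor:
    """Drive the student's predicted distribution towards the teacher's.
    No ground-truth labels are used in this loss."""
    t = temperature
    teacher_prob = F.softmax(teacher_logits.detach() / t, dim=1)
    student_logp = F.log_softmax(student_logits / t, dim=1)
    return F.kl_div(student_logp, teacher_prob, reduction='batchmean') * (t * t)
```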
arXiv Detail & Related papers (2021-03-10T09:32:44Z)
- An Efficient Quantitative Approach for Optimizing Convolutional Neural Networks [16.072287925319806]
We propose 3D-Receptive Field (3DRF) to estimate the quality of a CNN architecture and guide the design search process.
Our models can achieve up to 5.47% accuracy improvement and up to 65.38% fewer parameters, compared with state-of-the-art CNN structures like MobileNet and ResNet.
arXiv Detail & Related papers (2020-09-11T05:14:34Z)
- RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions [73.45995446500312]
We analyze the generalization properties of streaming and non-streaming recurrent neural network transducer (RNN-T) based end-to-end models.
We propose two solutions: combining multiple regularization techniques during training, and using dynamic overlapping inference.
arXiv Detail & Related papers (2020-05-07T06:24:47Z)
- Fixing the train-test resolution discrepancy: FixEfficientNet [98.64315617109344]
This paper provides an analysis of the performance of the EfficientNet image classifiers with several recent training procedures.
The resulting network, called FixEfficientNet, significantly outperforms the initial architecture with the same number of parameters.
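The FixRes-style recipe behind this result (train at one resolution, then briefly fine-tune the last layers at the larger test resolution so that train-time and test-time object scales match) can be sketched as follows; the resolutions, the frozen-layer choice, and the ResNet-50 stand-in are illustrative assumptions rather than the paper's exact settings.

```python
import torch
import torchvision.transforms as T
from torchvision import models

train_res, test_res = 224, 320          # hypothetical train/test resolutions

train_tf = T.Compose([T.RandomResizedCrop(train_res),
                      T.RandomHorizontalFlip(),
                      T.ToTensor()])
finetune_tf = T.Compose([T.Resize(int(test_res * 1.15)),
                         T.CenterCrop(test_res),
                         T.ToTensor()])

model = models.resnet50(num_classes=1000)

# Phase 1: ordinary training at train_res with train_tf (loop elided).
# Phase 2: freeze the backbone, keep only batch-norm layers and the classifier
# trainable, and fine-tune for a few epochs on images preprocessed at test_res.
for p in model.parameters():
    p.requires_grad = False
for m in model.modules():
    if isinstance(m, torch.nn.BatchNorm2d):
        m.requires_grad_(True)
model.fc.requires_grad_(True)
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                            lr=1e-3, momentum=0.9)
```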
arXiv Detail & Related papers (2020-03-18T14:22:58Z)