Resolution Switchable Networks for Runtime Efficient Image Recognition
- URL: http://arxiv.org/abs/2007.09558v3
- Date: Mon, 9 Nov 2020 07:18:01 GMT
- Title: Resolution Switchable Networks for Runtime Efficient Image Recognition
- Authors: Yikai Wang, Fuchun Sun, Duo Li, Anbang Yao
- Abstract summary: We propose a general method to train a single convolutional neural network which is capable of switching image resolutions at inference.
Networks trained with the proposed method are named Resolution Switchable Networks (RS-Nets).
- Score: 46.09537029831355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a general method to train a single convolutional neural network
which is capable of switching image resolutions at inference. Thus the running
speed can be selected to meet various computational resource limits. Networks
trained with the proposed method are named Resolution Switchable Networks
(RS-Nets). The basic training framework shares network parameters for handling
images which differ in resolution, yet keeps separate batch normalization
layers. Though it is parameter-efficient in design, it leads to inconsistent
accuracy variations at different resolutions, for which we provide a detailed
analysis from the aspect of the train-test recognition discrepancy. A
multi-resolution ensemble distillation is further designed, where a teacher is
learnt on the fly as a weighted ensemble over resolutions. Thanks to the
ensemble and knowledge distillation, RS-Nets enjoy accuracy improvements at a
wide range of resolutions compared with individually trained models. Extensive
experiments on the ImageNet dataset are provided, and we additionally consider
quantization problems. Code and models are available at
https://github.com/yikaiw/RS-Nets.
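As a rough illustration of the training scheme described in the abstract (convolutional weights shared across resolutions, one set of batch normalization layers per resolution, and an on-the-fly weighted-ensemble teacher for distillation), the PyTorch sketch below shows how the pieces could fit together. The toy backbone, module names, and loss weighting are assumptions made for illustration; the authors' actual implementation is in the repository linked above.

```python
# Minimal sketch of the RS-Nets training idea: shared conv weights,
# per-resolution BatchNorm, and a weighted-ensemble teacher for distillation.
# The tiny backbone and all names are illustrative assumptions, not the released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableBackbone(nn.Module):
    def __init__(self, resolutions=(224, 192, 160, 128, 96), num_classes=1000):
        super().__init__()
        self.resolutions = resolutions
        # Convolutional and classifier weights are shared across all resolutions.
        self.conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.head = nn.Linear(64, num_classes)
        # One BatchNorm per resolution (private statistics and affine parameters).
        self.bns = nn.ModuleList([nn.BatchNorm2d(64) for _ in resolutions])
        # Learnable weights for the on-the-fly ensemble teacher.
        self.ensemble_logits = nn.Parameter(torch.zeros(len(resolutions)))

    def forward_at(self, x, res_idx):
        x = F.interpolate(x, size=self.resolutions[res_idx], mode="bilinear",
                          align_corners=False)
        x = F.relu(self.bns[res_idx](self.conv(x)))
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)
        return self.head(x)

def training_step(model, images, labels, tau=1.0):
    logits = [model.forward_at(images, i) for i in range(len(model.resolutions))]
    ce = sum(F.cross_entropy(l, labels) for l in logits)
    # Teacher = weighted ensemble of the per-resolution predictions.
    w = torch.softmax(model.ensemble_logits, dim=0)
    teacher = sum(wi * F.softmax(l / tau, dim=1) for wi, l in zip(w, logits)).detach()
    kd = sum(F.kl_div(F.log_softmax(l / tau, dim=1), teacher, reduction="batchmean")
             for l in logits)
    return ce + kd
```

At inference, a single resolution index would be chosen to match the available compute budget, which is the runtime switching behavior the abstract describes.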
Related papers
- ResFormer: Scaling ViTs with Multi-Resolution Training [100.01406895070693]
We introduce ResFormer, a framework that improves performance across a wide spectrum of mostly unseen testing resolutions.
In particular, ResFormer operates on replicated images of different resolutions and enforces a scale consistency loss to engage interactive information across different scales.
Moreover, we demonstrate that ResFormer is flexible and can be easily extended to semantic segmentation, object detection, and video action recognition.
arXiv Detail & Related papers (2022-12-01T18:57:20Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper pursues the holistic goal of maintaining spatially precise, high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- Any-resolution Training for High-resolution Image Synthesis [55.19874755679901]
Generative models operate at fixed resolution, even though natural images come in a variety of sizes.
We argue that every pixel matters and create datasets with variable-size images, collected at their native resolutions.
We introduce continuous-scale training, a process that samples patches at random scales to train a new generator with variable output resolutions.
arXiv Detail & Related papers (2022-04-14T17:59:31Z)
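As a hedged illustration of the continuous-scale training summarized in the entry above, the sketch below samples a fixed-size patch at a random scale from a native-resolution image; the function name and arguments are hypothetical and not taken from the paper's code.

```python
# Hedged sketch of random-scale patch sampling for continuous-scale training.
# The sampler and its arguments are illustrative assumptions, not the paper's code.
import random
import torch.nn.functional as F

def sample_patch(image, patch_size=256, min_scale=0.25, max_scale=1.0):
    """Crop a random region of a native-resolution image (C, H, W) and resize it
    to a fixed patch size, returning the patch together with its sampled scale."""
    _, h, w = image.shape
    scale = random.uniform(min_scale, max_scale)
    crop = int(patch_size / scale)            # smaller scale => larger, coarser crop
    crop = min(crop, h, w)
    top = random.randint(0, h - crop)
    left = random.randint(0, w - crop)
    region = image[:, top:top + crop, left:left + crop]
    patch = F.interpolate(region.unsqueeze(0), size=patch_size,
                          mode="bilinear", align_corners=False).squeeze(0)
    return patch, scale
```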
- Dynamic Resolution Network [40.64164953983429]
The redundancy in the input resolution of modern CNNs has not been fully investigated.
We propose a novel dynamic-resolution network (DRNet) in which the resolution is determined dynamically based on each input sample.
Compared with the original ResNet-50 on ImageNet, DRNet achieves similar accuracy with about a 34% reduction in computation, and gains a 1.4% accuracy improvement with a 10% reduction.
arXiv Detail & Related papers (2021-06-05T13:48:33Z)
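The dynamic-resolution idea summarized in the entry above can be pictured as a lightweight predictor that routes each image to a resolution before classification. The sketch below is an illustration built on assumptions (the predictor architecture and candidate resolutions are invented for the example), not the DRNet implementation.

```python
# Hedged sketch of per-sample dynamic resolution selection.
# The tiny predictor and candidate resolutions are assumptions, not DRNet itself.
import torch.nn as nn
import torch.nn.functional as F

class ResolutionPredictor(nn.Module):
    """Lightweight network that scores candidate resolutions for each image."""
    def __init__(self, candidates=(224, 168, 112)):
        super().__init__()
        self.candidates = candidates
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, len(candidates)),
        )

    def forward(self, x):
        # Score resolutions on a cheap thumbnail of the input.
        thumb = F.interpolate(x, size=64, mode="bilinear", align_corners=False)
        return self.features(thumb)

def classify_dynamically(predictor, classifier, image):
    """Pick a resolution for a single image (1, C, H, W), then classify at it."""
    choice = predictor(image).argmax(dim=1).item()
    res = predictor.candidates[choice]
    resized = F.interpolate(image, size=res, mode="bilinear", align_corners=False)
    return classifier(resized)
```

Training such a router end to end would additionally require a differentiable selection (for example a Gumbel-softmax relaxation) and a computation-cost term, which this sketch omits.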
- Differentiable Patch Selection for Image Recognition [37.11810982945019]
We propose a differentiable Top-K operator to select the most relevant parts of the input to process high resolution images.
We show results for traffic sign recognition, inter-patch relationship reasoning, and fine-grained recognition without using object/part bounding box annotations.
arXiv Detail & Related papers (2021-04-07T11:15:51Z)
- Deep Iterative Residual Convolutional Network for Single Image Super-Resolution [31.934084942626257]
We propose a deep Iterative Super-Resolution Residual Convolutional Network (ISRResCNet).
It exploits the powerful image regularization and large-scale optimization techniques by training the deep network in an iterative manner with a residual learning approach.
Our method, with only a few trainable parameters, improves results for different scaling factors in comparison with state-of-the-art methods.
arXiv Detail & Related papers (2020-09-07T12:54:14Z)
- Cascade Convolutional Neural Network for Image Super-Resolution [15.650515790147189]
We propose a cascaded convolutional neural network for image super-resolution (CSRCNN).
Images of different scales can be trained simultaneously, and the learned network can make full use of the information residing in different scales of images.
arXiv Detail & Related papers (2020-08-24T11:34:03Z)
- Learning to Learn Parameterized Classification Networks for Scalable Input Images [76.44375136492827]
Convolutional Neural Networks (CNNs) do not exhibit predictable recognition behavior with respect to changes in input resolution.
We employ meta learners to generate convolutional weights of main networks for various input scales.
We further utilize knowledge distillation on the fly over model predictions based on different input resolutions.
arXiv Detail & Related papers (2020-07-13T04:27:25Z)
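The meta-learning idea summarized in the entry above, generating convolutional weights conditioned on the input scale, is illustrated by the hedged sketch below; the scale encoding, layer shapes, and class name are assumptions rather than the paper's architecture.

```python
# Hedged sketch of a hypernetwork that emits conv weights conditioned on input scale.
# Shapes, the scale encoding, and module names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleConditionedConv(nn.Module):
    """A conv layer whose kernel is produced by a small meta learner from the scale."""
    def __init__(self, in_ch=3, out_ch=64, k=3):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        self.meta = nn.Sequential(
            nn.Linear(1, 128), nn.ReLU(),
            nn.Linear(128, out_ch * in_ch * k * k),
        )

    def forward(self, x, scale):
        # Encode the input scale (e.g., resolution / 224) as a single scalar.
        s = torch.tensor([[scale]], dtype=x.dtype, device=x.device)
        weight = self.meta(s).view(self.out_ch, self.in_ch, self.k, self.k)
        return F.conv2d(x, weight, padding=self.k // 2)
```

The on-the-fly knowledge distillation mentioned in the entry would then compare predictions across input resolutions, much like the ensemble-teacher sketch given after the main abstract.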
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
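Both "Learning Enriched Features" entries describe keeping a spatially precise high-resolution stream while aggregating contextual information from lower-resolution streams. The block below is a simplified, hedged sketch of that fusion pattern; the channel counts and structure are illustrative assumptions, not the authors' released model.

```python
# Simplified sketch of multi-scale feature fusion with a preserved high-res stream.
# Channel counts and structure are illustrative assumptions, not the papers' models.
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusionBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # One lightweight conv branch per scale (full, 1/2, 1/4 resolution).
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(3)]
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        outs = []
        for i, branch in enumerate(self.branches):
            scaled = F.interpolate(x, size=(h // 2 ** i, w // 2 ** i),
                                   mode="bilinear", align_corners=False) if i else x
            y = F.relu(branch(scaled))
            # Bring every branch back to full resolution before fusing.
            outs.append(F.interpolate(y, size=(h, w), mode="bilinear",
                                      align_corners=False) if i else y)
        # Residual fusion keeps the spatially precise high-resolution details.
        return x + sum(outs)
```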