Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in
Image Classification
- URL: http://arxiv.org/abs/2010.05300v1
- Date: Sun, 11 Oct 2020 17:55:06 GMT
- Title: Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in
Image Classification
- Authors: Yulin Wang, Kangchen Lv, Rui Huang, Shiji Song, Le Yang, Gao Huang
- Abstract summary: The accuracy of deep convolutional neural networks (CNNs) generally improves when they are fueled with high-resolution images.
Inspired by the fact that not all regions in an image are task-relevant, we propose a novel framework that performs efficient image classification.
Our framework is general and flexible, as it is compatible with most state-of-the-art lightweight CNNs.
- Score: 46.885260723836865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The accuracy of deep convolutional neural networks (CNNs) generally improves
when fueled with high-resolution images. However, this often comes at a high
computational cost and a large memory footprint. Inspired by the fact that not all
regions in an image are task-relevant, we propose a novel framework that
performs efficient image classification by processing a sequence of relatively
small inputs, which are strategically selected from the original image with
reinforcement learning. Such a dynamic decision process naturally facilitates
adaptive inference at test time, i.e., it can be terminated once the model is
sufficiently confident about its prediction and thus avoids further redundant
computation. Notably, our framework is general and flexible, as it is compatible
with most state-of-the-art lightweight CNNs (such as MobileNets,
EfficientNets and RegNets), which can be conveniently deployed as the backbone
feature extractor. Experiments on ImageNet show that our method consistently
improves the computational efficiency of a wide variety of deep models. For
example, it further reduces the average latency of the highly efficient
MobileNet-V3 on an iPhone XS Max by 20% without sacrificing accuracy. Code and
pre-trained models are available at
https://github.com/blackfeather-wang/GFNet-Pytorch.
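To make the adaptive decision process concrete, here is a minimal PyTorch sketch of confidence-based early termination over a sequence of small inputs. The names `backbone`, `classifier`, and `crop_policy`, the patch size, and the confidence threshold are hypothetical stand-ins; the released GFNet code additionally trains the region-selection policy with reinforcement learning, so this is an illustration of the inference loop, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def glance_and_focus_inference(image, backbone, classifier, crop_policy,
                               patch_size=96, max_steps=5, threshold=0.95):
    """Classify `image` from a sequence of small crops, exiting early.

    Minimal sketch of confidence-based adaptive inference; `backbone`,
    `classifier`, and `crop_policy` are hypothetical callables.
    Assumes a batch of one image with shape (1, C, H, W).
    """
    # Step 1 ("glance"): a cheap, low-resolution view of the full image.
    glimpse = F.interpolate(image, size=(patch_size, patch_size),
                            mode='bilinear', align_corners=False)
    state = None
    for step in range(1, max_steps + 1):
        features = backbone(glimpse)            # lightweight CNN features
        logits, state = classifier(features, state)
        probs = F.softmax(logits, dim=1)
        confidence, prediction = probs.max(dim=1)
        # Terminate once the model is sufficiently confident, skipping
        # the remaining (redundant) computation on easy images.
        if confidence.item() >= threshold:
            break
        # Steps 2..T ("focus"): the policy proposes the next
        # task-relevant region as top-left crop coordinates.
        y, x = crop_policy(state)
        glimpse = image[:, :, y:y + patch_size, x:x + patch_size]
    return prediction, step
```

Because the number of executed steps varies per image, easy images exit after the cheap low-resolution glance while hard ones use the full budget; this per-image variation is where the reported average-latency savings come from.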
Related papers
- Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration [100.54419875604721]
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.
We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks.
Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
arXiv Detail & Related papers (2024-04-02T17:58:49Z)
- Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
arXiv Detail & Related papers (2023-07-20T16:00:19Z)
- T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown strong performance in natural language processing.
In this paper, we design a novel attention mechanism whose cost is linear in the resolution, derived from a Taylor expansion; based on this attention, we design a network called $T$-former for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
arXiv Detail & Related papers (2023-05-12T04:10:42Z)
- Deep Dynamic Scene Deblurring from Optical Flow [53.625999196063574]
Deblurring can provide visually more pleasant pictures and make photography more convenient.
However, non-uniform blur is difficult to model mathematically.
We develop a convolutional neural network (CNN) that restores sharp images from the deblurred features.
arXiv Detail & Related papers (2023-01-18T06:37:21Z)
- Glance and Focus Networks for Dynamic Visual Recognition [36.26856080976052]
We formulate the image recognition problem as a sequential coarse-to-fine feature learning process, mimicking the human visual system.
The proposed Glance and Focus Network (GFNet) first extracts a quick global representation of the input image at a low resolution scale, and then strategically attends to a series of salient (small) regions to learn finer features.
It reduces the average latency of the highly efficient MobileNet-V3 on an iPhone XS Max by a factor of 1.3 without sacrificing accuracy.
arXiv Detail & Related papers (2022-01-09T14:00:56Z)
- Content-Aware Convolutional Neural Networks [98.97634685964819]
Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers.
We propose a Content-aware Convolution (CAC) that automatically detects smooth windows and replaces the original large kernel with a 1x1 convolutional kernel within them.
arXiv Detail & Related papers (2021-06-30T03:54:35Z)
- Dynamic Resolution Network [40.64164953983429]
The redundancy on the input resolution of modern CNNs has not been fully investigated.
We propose a novel dynamic-resolution network (DRNet) in which the resolution is determined dynamically based on each input sample.
Compared to the original ResNet-50 on ImageNet, DRNet achieves similar performance with roughly a 34% reduction in computation, and gains a 1.4% increase in accuracy with a 10% reduction (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2021-06-05T13:48:33Z)
- FastSal: a Computationally Efficient Network for Visual Saliency Prediction [7.742198347952173]
We show that MobileNetV2 makes an excellent backbone for a visual saliency model and can be effective even without a complex decoder.
We also show that knowledge transfer from a more computationally expensive model like DeepGaze II can be achieved via pseudo-labelling an unlabelled dataset.
arXiv Detail & Related papers (2020-08-25T16:32:33Z)
- Lightweight Modules for Efficient Deep Learning based Image Restoration [20.701733377216932]
We propose several lightweight low-level modules which can be used to create a computationally low cost variant of a given baseline model.
Our results show that the proposed networks consistently produce reconstructions visually similar to those of the full-capacity baselines.
arXiv Detail & Related papers (2020-07-11T19:35:00Z)
- Impact of ImageNet Model Selection on Domain Adaptation [26.016647703500883]
We investigate how different ImageNet models affect transfer accuracy on domain adaptation problems.
A higher-accuracy ImageNet model produces better features and leads to higher accuracy on domain adaptation problems.
We also examine the architecture of each neural network to find the best layer for feature extraction.
arXiv Detail & Related papers (2020-02-06T23:58:23Z)
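As a companion to the Dynamic Resolution Network entry above, the sketch below illustrates per-sample resolution selection. It is illustrative only: `resolution_predictor`, `backbone`, and the candidate resolutions are assumptions, not the paper's actual components or training procedure.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def dynamic_resolution_forward(image, resolution_predictor, backbone,
                               candidates=(112, 160, 224)):
    """Run `backbone` at a per-sample resolution chosen by a tiny predictor.

    Illustrative sketch of dynamic resolution: a cheap predictor picks a
    resolution per image, so easy samples are processed at lower cost.
    Names and the candidate set are hypothetical; assumes batch size 1.
    """
    # Score the candidate resolutions from a cheap thumbnail of the input.
    thumbnail = F.interpolate(image, size=(64, 64), mode='bilinear',
                              align_corners=False)
    scores = resolution_predictor(thumbnail)    # shape: (1, len(candidates))
    resolution = candidates[scores.argmax(dim=1).item()]
    # Resize to the chosen resolution and run the backbone as usual:
    # easy samples take the cheap path, hard samples the full resolution.
    resized = F.interpolate(image, size=(resolution, resolution),
                            mode='bilinear', align_corners=False)
    return backbone(resized)
```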
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.