Fast Nearest Convolution for Real-Time Efficient Image Super-Resolution
- URL: http://arxiv.org/abs/2208.11609v1
- Date: Wed, 24 Aug 2022 15:23:51 GMT
- Title: Fast Nearest Convolution for Real-Time Efficient Image Super-Resolution
- Authors: Ziwei Luo, Youwei Li, Lei Yu, Qi Wu, Zhihong Wen, Haoqiang Fan,
Shuaicheng Liu
- Abstract summary: We propose a simple plain convolution network with a fast nearest convolution module (NCNet), which is NPU-friendly and can perform reliable super-resolution in real time.
Our model can be easily deployed on mobile devices with 8-bit quantization and is fully compatible with all major mobile AI accelerators.
Our NCNet is trained and validated on the DIV2K 3x dataset, and a comparison with other efficient SR methods demonstrates that NCNet achieves high-fidelity SR results with less inference time.
- Score: 36.72750683939934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning-based single image super-resolution (SISR) approaches have
drawn much attention and achieved remarkable success on modern advanced GPUs.
However, most state-of-the-art methods require a huge number of parameters,
memories, and computational resources, which usually show inferior inference
times when applying them to current mobile device CPUs/NPUs. In this paper, we
propose a simple plain convolution network with a fast nearest convolution
module (NCNet), which is NPU-friendly and can perform reliable
super-resolution in real time. The proposed nearest convolution has the same
performance as the nearest upsampling but is much faster and more suitable for
Android NNAPI. Our model can be easily deployed on mobile devices with 8-bit
quantization and is fully compatible with all major mobile AI accelerators.
Moreover, we conduct comprehensive experiments on different tensor operations
on a mobile device to illustrate the efficiency of our network architecture.
Our NCNet is trained and validated on the DIV2K 3x dataset, and a comparison
with other efficient SR methods demonstrates that NCNet achieves high-fidelity
SR results with less inference time. Our codes and pretrained
models are publicly available at \url{https://github.com/Algolzw/NCNet}.
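The core idea, as we read the abstract, is that nearest-neighbor upsampling can be expressed as a fixed-weight 1x1 convolution (each channel replicated s² times) followed by a depth-to-space rearrangement, a pair of operations that mobile NPUs execute efficiently. A minimal NumPy sketch of that equivalence (function name and shapes are our own, not from the paper's released code):

```python
import numpy as np

def nearest_conv_upsample(x, s):
    """Nearest-neighbor x`s` upsampling via channel replication + depth-to-space.

    The channel replication stands in for NCNet's fixed 1x1 "nearest
    convolution"; in the deployed model it is a real conv op so the NPU
    can run it. x has shape (H, W, C).
    """
    h, w, c = x.shape
    y = np.repeat(x, s * s, axis=-1)      # fixed 1x1 conv: C -> C*s*s copies
    y = y.reshape(h, w, c, s, s)          # split the copies into an s x s block
    y = y.transpose(0, 3, 1, 4, 2)        # interleave blocks into space
    return y.reshape(h * s, w * s, c)

# Sanity check against plain nearest-neighbor upsampling.
x = np.arange(12, dtype=np.float32).reshape(2, 2, 3)
assert np.array_equal(nearest_conv_upsample(x, 3),
                      np.repeat(np.repeat(x, 3, axis=0), 3, axis=1))
```

Because the replication weights are constant, this branch quantizes exactly under 8-bit schemes, which is consistent with the abstract's NNAPI claim.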
Related papers
- Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators [0.0]
Deep Neural Networks (DNNs) are being developed, trained, and deployed at a scale that strains both advanced and resource-limited devices.
Our solution is to implement weight block sparsity, a structured form of sparsity that is hardware-friendly.
We will present performance estimates using accurate and complete code generation for AIE2 configuration sets (AMD Versal FPGAs) with Resnet50, Inception V3, and VGG16.
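As a rough illustration of what hardware-friendly structured sparsity means, the sketch below zeroes whole fixed-size blocks of a weight matrix by block L1 norm (the block shape and selection rule here are illustrative assumptions, not the paper's exact scheme):

```python
import numpy as np

def block_sparsify(W, bh=4, bw=4, keep_ratio=0.5):
    """Keep only the highest-L1-norm (bh x bw) blocks of W, zeroing the rest.

    Whole-block zeros let hardware skip entire tiles, which is what makes
    this kind of sparsity accelerator-friendly, unlike elementwise sparsity.
    """
    H, Wd = W.shape
    blocks = W.reshape(H // bh, bh, Wd // bw, bw)
    scores = np.abs(blocks).sum(axis=(1, 3))       # one score per block
    k = max(1, int(round(scores.size * keep_ratio)))
    thresh = np.sort(scores.ravel())[::-1][k - 1]  # k-th largest block score
    mask = (scores >= thresh)[:, None, :, None]
    return (blocks * mask).reshape(H, Wd)
```

The surviving blocks keep their original values; only whole tiles are dropped, so the sparsity pattern maps directly onto tiled matrix-multiply hardware.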
arXiv Detail & Related papers (2024-07-12T17:37:49Z)
- Hybrid Pixel-Unshuffled Network for Lightweight Image Super-Resolution [64.54162195322246]
Convolutional neural networks (CNNs) have achieved great success on image super-resolution (SR).
Most deep CNN-based SR models require massive computation to obtain high performance.
We propose a novel Hybrid Pixel-Unshuffled Network (HPUN) by introducing an efficient and effective downsampling module into the SR task.
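Pixel-unshuffle (space-to-depth) trades spatial resolution for channels, so subsequent convolutions run on smaller feature maps. A minimal sketch of the primitive itself (HPUN's actual module builds more around it):

```python
import numpy as np

def pixel_unshuffle(x, s):
    """Fold each s x s spatial block of x (H, W, C) into the channel axis,
    returning (H/s, W/s, C*s*s). The inverse of pixel shuffle."""
    h, w, c = x.shape
    y = x.reshape(h // s, s, w // s, s, c)
    y = y.transpose(0, 2, 1, 3, 4)        # group each s x s block together
    return y.reshape(h // s, w // s, s * s * c)
```

No information is lost: the same values are merely rearranged, which is why it is an attractive downsampling choice compared with strided convolution or pooling.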
arXiv Detail & Related papers (2022-03-16T20:10:41Z)
- SwiftSRGAN -- Rethinking Super-Resolution for Efficient and Real-time Inference [0.0]
We present an architecture that is faster and smaller in terms of its memory footprint.
Real-time super-resolution enables streaming high-resolution media content even under poor bandwidth conditions.
arXiv Detail & Related papers (2021-11-29T04:20:15Z)
- GhostSR: Learning Ghost Features for Efficient Image Super-Resolution [49.393251361038025]
Single image super-resolution (SISR) systems based on convolutional neural networks (CNNs) achieve impressive performance but require huge computational costs.
We propose to use a shift operation to generate the redundant features (i.e., ghost features) of SISR models.
We show that both the non-compact and lightweight SISR models embedded in our proposed module can achieve comparable performance to that of their baselines.
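The shift idea can be illustrated in a few lines: instead of spending convolutions to produce extra ("ghost") feature maps, copies of existing maps are spatially displaced. Here the offsets are fixed for simplicity; in our reading of GhostSR they are learned per channel:

```python
import numpy as np

def ghost_by_shift(feat, dy=1, dx=1):
    """Append shifted copies of the feature maps as 'ghost' features.

    np.roll stands in for a cheap spatial shift; no multiply-adds are
    spent generating the extra maps. feat has shape (H, W, C).
    """
    ghosts = np.roll(feat, shift=(dy, dx), axis=(0, 1))
    return np.concatenate([feat, ghosts], axis=-1)
```

The channel count doubles at essentially zero arithmetic cost, which is the source of the efficiency gain the summary describes.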
arXiv Detail & Related papers (2021-01-21T10:09:47Z)
- FastSal: a Computationally Efficient Network for Visual Saliency Prediction [7.742198347952173]
We show that MobileNetV2 makes an excellent backbone for a visual saliency model and can be effective even without a complex decoder.
We also show that knowledge transfer from a more computationally expensive model like DeepGaze II can be achieved via pseudo-labelling an unlabelled dataset.
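The pseudo-labelling step reduces to running the expensive teacher once over unlabelled images and training the small student on its outputs. A generic sketch of that pattern (the model names are placeholders, not FastSal's actual API):

```python
import numpy as np

def pseudo_label(teacher, unlabelled):
    """Label each unlabelled input with the teacher's prediction,
    yielding (input, pseudo_label) pairs for student training."""
    return [(x, teacher(x)) for x in unlabelled]

# Toy check with a stand-in "teacher" model.
teacher = lambda img: img.mean()           # placeholder for e.g. DeepGaze II
data = [np.ones((2, 2)), np.zeros((2, 2))]
pairs = pseudo_label(teacher, data)
```

The teacher's cost is paid once, offline; the student never needs the teacher at inference time.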
arXiv Detail & Related papers (2020-08-25T16:32:33Z)
- RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices [57.877112704841366]
This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs.
For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.
arXiv Detail & Related papers (2020-07-20T02:05:32Z)
- Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace conventional ReLU with Bounded ReLU, having found that the accuracy decline is due to activation quantization.
Our integer networks achieve performance equivalent to the corresponding floating-point networks, but have only 1/4 the memory cost and run 2x faster on modern GPUs.
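Bounding the activation range is what makes uniform 8-bit activation quantization well-behaved: with a fixed upper bound, every value maps into 256 levels with a bounded rounding error. A sketch of both pieces (the bound of 6 mirrors the common ReLU6 convention and is our assumption, not necessarily the paper's choice):

```python
import numpy as np

def bounded_relu(x, bound=6.0):
    """ReLU clipped to [0, bound], so activations have a known, fixed range."""
    return np.clip(x, 0.0, bound)

def quantize_u8(x, bound=6.0):
    """Uniform 8-bit quantization of activations already in [0, bound].

    Returns the uint8 codes and the scale needed to dequantize them.
    """
    scale = bound / 255.0
    return np.round(x / scale).astype(np.uint8), scale
```

With an unbounded ReLU the scale would have to chase outliers, spreading the 256 levels too thin; clipping first caps the quantization error at half a step.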
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
- Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits [99.59941892183454]
We propose Einsum Networks (EiNets), a novel implementation design for PCs.
At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum-operation.
We show that the implementation of Expectation-Maximization (EM) can be simplified for PCs, by leveraging automatic differentiation.
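The "monolithic einsum" point can be seen in miniature: a layer that mixes products of two child tensors under per-node weights collapses into a single np.einsum call instead of nested loops. The shapes and contraction below are illustrative, not the EiNet definition:

```python
import numpy as np

rng = np.random.default_rng(0)
B, N, K = 4, 3, 5                      # batch, nodes, mixture components
left = rng.random((B, N, K))           # values from child 1
right = rng.random((B, N, K))          # values from child 2
W = rng.random((N, K, K))              # per-node mixing weights

# One fused contraction: sum_ij left[b,n,i] * right[b,n,j] * W[n,i,j]
out = np.einsum('bni,bnj,nij->bn', left, right, W)

# Same result via explicit loops, for comparison.
ref = np.zeros((B, N))
for b in range(B):
    for n in range(N):
        ref[b, n] = left[b, n] @ W[n] @ right[b, n]
assert np.allclose(out, ref)
```

Fusing the loop nest into one vectorized operation is also what makes the layer differentiable end to end with standard autodiff, which connects to the EM simplification the summary mentions.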
arXiv Detail & Related papers (2020-04-13T23:09:15Z)
- FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution [14.226301825772174]
We introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP).
It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information.
We achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single Nvidia Titan X (Maxwell) GPU card.
arXiv Detail & Related papers (2020-03-09T03:53:57Z)
- Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs [6.035819238203187]
We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance.
We also find examples where performance-aware pruning achieves the intended results, with performance speedups of 3x with cuDNN and above 10x with Arm Compute Library and TVM.
arXiv Detail & Related papers (2020-02-20T12:07:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.