Efficient Transformer for Single Image Super-Resolution
- URL: http://arxiv.org/abs/2108.11084v1
- Date: Wed, 25 Aug 2021 07:05:30 GMT
- Title: Efficient Transformer for Single Image Super-Resolution
- Authors: Zhisheng Lu, Hong Liu, Juncheng Li, and Linlin Zhang
- Abstract summary: We propose a novel Efficient Super-Resolution Transformer (ESRT) for fast and accurate image super-resolution.
ESRT is a hybrid Transformer in which a CNN-based SR network is placed at the front to extract deep features.
The proposed ET occupies only 4191M of GPU memory while delivering better performance.
- Score: 13.234199307504602
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The single image super-resolution task has witnessed great strides
with the development of deep learning. However, most existing studies focus on
building more complex neural networks with massive numbers of layers, which
brings heavy computational cost and memory consumption. Recently, as the
Transformer has yielded brilliant results on NLP tasks, more and more
researchers have started to explore its application to computer vision tasks.
However, the heavy computational cost and high GPU memory occupation of the
vision Transformer mean the network cannot be made very deep. To address this
problem, we propose a
novel Efficient Super-Resolution Transformer (ESRT) for fast and accurate image
super-resolution. ESRT is a hybrid Transformer in which a CNN-based SR network
is placed at the front to extract deep features. Specifically, ESRT consists of
two backbones: a lightweight CNN backbone (LCB) and a lightweight Transformer
backbone (LTB). LCB is a lightweight SR network that extracts deep SR features
at a low computational cost by dynamically adjusting the size of the feature
map. LTB is built from an efficient Transformer (ET) with a small GPU memory
occupation, which benefits from the novel efficient multi-head attention
(EMHA). In EMHA, a feature split module (FSM) splits the long input sequence
into sub-segments, and the attention operation is applied within each
sub-segment (a minimal sketch of this idea follows the abstract). This module
significantly decreases the GPU memory occupation. Extensive experiments show
that our ESRT
achieves competitive results. Compared with the original Transformer, which
occupies 16057M of GPU memory, the proposed ET occupies only 4191M of GPU
memory while achieving better performance.
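To make the memory argument concrete, here is a minimal PyTorch sketch of segment-wise multi-head self-attention in the spirit of EMHA's feature split module; the class name, default segment count, and tensor layout are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of segment-wise multi-head self-attention in the spirit of
# EMHA's feature split module (FSM). Class name, segment count, and tensor
# layout are illustrative assumptions, not the paper's exact implementation.
import torch
import torch.nn as nn

class SegmentedSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, num_segments: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.num_segments = num_segments
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, dim); this sketch assumes n divisible by num_segments.
        b, n, c = x.shape
        h, s = self.num_heads, self.num_segments
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # Split the length-n sequence into s sub-segments and fold the
        # segments into the batch dimensions, so each attention map is
        # (n/s, n/s) per segment instead of (n, n).
        def split(t):
            return t.view(b, s, n // s, h, c // h).permute(0, 1, 3, 2, 4)

        q, k, v = split(q), split(k), split(v)         # (b, s, h, n/s, c/h)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (b, s, h, n/s, n/s)
        out = attn.softmax(dim=-1) @ v                 # (b, s, h, n/s, c/h)
        out = out.permute(0, 1, 3, 2, 4).reshape(b, n, c)
        return self.proj(out)
```

Splitting into s segments replaces one n-by-n attention map per head with s maps of size (n/s)-by-(n/s), cutting attention-map memory roughly by a factor of s; this is the kind of mechanism behind the reported drop from 16057M to 4191M.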
Related papers
- HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution [70.52256118833583]
We present a strategy to convert transformer-based SR networks to hierarchical transformers (HiT-SR).
Specifically, we first replace the commonly used fixed small windows with expanding hierarchical windows to aggregate features at different scales.
Considering the intensive computation required for large windows, we further design a spatial-channel correlation method whose complexity is linear in the window size.
arXiv Detail & Related papers (2024-07-08T12:42:10Z)
- LIPT: Latency-aware Image Processing Transformer [17.802838753201385]
We present a latency-aware image processing transformer, termed LIPT.
We devise the low-latency proportion LIPT block that substitutes memory-intensive operators with the combination of self-attention and convolutions to achieve practical speedup.
arXiv Detail & Related papers (2024-04-09T07:25:30Z)
- MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory [76.02294791513552]
We propose a hardware-algorithm co-optimization method called MCUFormer to deploy vision transformers on microcontrollers with extremely limited memory.
Experimental results demonstrate that our MCUFormer achieves 73.62% top-1 accuracy on ImageNet for image classification with 320KB memory.
arXiv Detail & Related papers (2023-10-25T18:00:26Z)
- Lightweight Structure-aware Transformer Network for VHR Remote Sensing Image Change Detection [15.391216316828354]
This Letter proposes a Lightweight Structure-aware Transformer (LSAT) network for remote sensing (RS) image change detection (CD).
First, a Cross-dimension Interactive Self-attention (CISA) module with linear complexity is designed to replace the vanilla self-attention in visual Transformer.
Second, a Structure-aware Enhancement Module (SAEM) is designed to enhance difference features and edge detail information.
arXiv Detail & Related papers (2023-06-03T03:21:18Z)
- Reciprocal Attention Mixing Transformer for Lightweight Image Restoration [6.3159191692241095]
We propose a lightweight image restoration network, the Reciprocal Attention Mixing Transformer (RAMiT).
It employs bi-dimensional (spatial and channel) self-attentions in parallel with different numbers of heads.
It achieves state-of-the-art performance on multiple lightweight IR tasks, including super-resolution, color denoising, grayscale denoising, low-light enhancement, and deraining.
arXiv Detail & Related papers (2023-05-19T06:55:04Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and shrinks unimportant weights on-the-fly by a small amount proportional to their magnitude (a minimal sketch of this soft-shrinkage step appears after this list).
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- ShuffleMixer: An Efficient ConvNet for Image Super-Resolution [88.86376017828773]
We propose ShuffleMixer for lightweight image super-resolution, which explores large convolutions and channel split-and-shuffle operations (a minimal sketch of the split-and-shuffle step also appears after this list).
Specifically, we develop a large depth-wise convolution and two projection layers based on channel splitting and shuffling as the basic component to mix features efficiently.
Experimental results demonstrate that the proposed ShuffleMixer is about 6x smaller than the state-of-the-art methods in terms of model parameters and FLOPs.
arXiv Detail & Related papers (2022-05-30T15:26:52Z)
- Lightweight Bimodal Network for Single-Image Super-Resolution via Symmetric CNN and Recursive Transformer [27.51790638626891]
Single-image super-resolution (SISR) has achieved significant breakthroughs with the development of deep learning, but at a heavy computational and memory cost.
To solve this issue, we propose a Lightweight Bimodal Network (LBNet) for SISR.
Specifically, an effective Symmetric CNN is designed for local feature extraction and coarse image reconstruction.
arXiv Detail & Related papers (2022-04-28T04:43:22Z)
- Self-Calibrated Efficient Transformer for Lightweight Super-Resolution [21.63691922827879]
We present a lightweight Self-Calibrated Efficient Transformer (SCET) network to reduce the heavy computational cost of existing SR networks.
The architecture of SCET mainly consists of the self-calibrated module and efficient transformer block.
We provide comprehensive results on different settings of the overall network.
arXiv Detail & Related papers (2022-04-19T14:20:32Z)
- Hybrid Pixel-Unshuffled Network for Lightweight Image Super-Resolution [64.54162195322246]
Convolutional neural networks (CNNs) have achieved great success in image super-resolution (SR).
However, most deep CNN-based SR models require massive computation to obtain high performance.
We propose a novel Hybrid Pixel-Unshuffled Network (HPUN) by introducing an efficient and effective downsampling module into the SR task.
arXiv Detail & Related papers (2022-03-16T20:10:41Z)
- HRFormer: High-Resolution Transformer for Dense Prediction [99.6060997466614]
We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks.
We take advantage of the multi-resolution parallel design introduced in high-resolution convolutional networks (HRNet).
We demonstrate the effectiveness of the High-Resolution Transformer on both human pose estimation and semantic segmentation tasks.
arXiv Detail & Related papers (2021-10-18T15:37:58Z)
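As a companion to the ISS-P entry above, here is a minimal sketch of the iterative soft-shrinkage idea under stated assumptions: rather than hard-zeroing pruned weights, the smallest-magnitude weights are scaled down slightly each iteration so the sparse structure can keep evolving. The function name, threshold rule, and shrink factor are hypothetical.

```python
# Minimal sketch of iterative soft shrinkage in the spirit of ISS-P: instead
# of hard-zeroing pruned weights, shrink the unimportant (smallest-magnitude)
# weights by a small amount each iteration. Threshold rule and shrink factor
# are illustrative assumptions, not the paper's exact schedule.
import torch

@torch.no_grad()
def soft_shrink_(weight: torch.Tensor, sparsity: float = 0.5,
                 shrink: float = 0.99) -> None:
    # Magnitude threshold below which weights count as unimportant.
    k = int(weight.numel() * sparsity)
    if k == 0:
        return
    threshold = weight.abs().flatten().kthvalue(k).values
    # Scale unimportant weights by a factor slightly below 1, i.e. subtract
    # an amount proportional to their magnitude, rather than zeroing them.
    mask = weight.abs() <= threshold
    weight[mask] *= shrink
```

Calling soft_shrink_ on each layer's weight after every optimizer step would gradually drive the selected weights toward zero while still letting them recover if they grow important again.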
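And for the ShuffleMixer entry, a minimal sketch of one channel split-and-shuffle mixing step, assuming a 7x7 depth-wise kernel and a two-branch split; the paper's block also includes point-wise projection layers, so treat this as illustrative only.

```python
# Minimal sketch of a channel split-and-shuffle mixing step as described for
# ShuffleMixer: split channels, filter one half with a large depth-wise
# convolution, then shuffle channels so both branches exchange information.
# Kernel size and layer layout are illustrative assumptions.
import torch
import torch.nn as nn

class SplitShuffleMixer(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        assert channels % 2 == 0
        half = channels // 2
        # Large depth-wise convolution mixes spatial information cheaply:
        # each channel is filtered independently (groups == channels).
        self.dwconv = nn.Conv2d(half, half, kernel_size,
                                padding=kernel_size // 2, groups=half)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        a, b = x.chunk(2, dim=1)  # channel split into two branches
        a = self.dwconv(a)        # spatial mixing on one branch only
        x = torch.cat([a, b], dim=1)
        # Channel shuffle: interleave the two branches so the next block
        # sees features from both.
        n, c, h, w = x.shape
        return x.view(n, 2, c // 2, h, w).transpose(1, 2).reshape(n, c, h, w)
```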