Efficient Long-Range Attention Network for Image Super-resolution
- URL: http://arxiv.org/abs/2203.06697v1
- Date: Sun, 13 Mar 2022 16:17:48 GMT
- Title: Efficient Long-Range Attention Network for Image Super-resolution
- Authors: Xindong Zhang, Hui Zeng, Shi Guo, Lei Zhang
- Abstract summary: We propose an efficient long-range attention network (ELAN) for image super-resolution (SR).
We first employ shift convolution (shift-conv) to effectively extract the image's local structural information while maintaining the same level of complexity as 1x1 convolution.
A highly efficient long-range attention block (ELAB) is then built by simply cascading two shift-convs with a GMSA module.
- Score: 25.51377161557467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, transformer-based methods have demonstrated impressive results in
various vision tasks, including image super-resolution (SR), by exploiting the
self-attention (SA) for feature extraction. However, the computation of SA in
most existing transformer-based models is very expensive, while some employed
operations may be redundant for the SR task. This limits the range of SA
computation and consequently the SR performance. In this work, we propose an
efficient long-range attention network (ELAN) for image SR. Specifically, we
first employ shift convolution (shift-conv) to effectively extract the image's
local structural information while maintaining the same level of complexity as
1x1 convolution, then propose a group-wise multi-scale self-attention (GMSA)
module, which calculates SA on non-overlapped groups of features using
different window sizes to exploit the long-range image dependency. A highly
efficient long-range attention block (ELAB) is then built by simply cascading
two shift-convs with a GMSA module, which is further accelerated by using a
shared attention mechanism. Without bells and whistles, our ELAN follows a
fairly simple design by sequentially cascading the ELABs. Extensive experiments
demonstrate that ELAN obtains even better results than transformer-based
SR models, but with significantly less complexity. The source code can be found
at https://github.com/xindongzhang/ELAN.
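For readers who want the mechanics, the two building blocks named in the abstract are concrete enough to sketch. Below is a minimal, illustrative PyTorch sketch, not the authors' released code (see the GitHub link above for that); the five-way channel grouping, the circular roll, the window sizes, the Q = K = V simplification, and the residual placement are all assumptions made for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShiftConv(nn.Module):
    """A 1x1 convolution preceded by a zero-FLOP spatial shift of channel groups."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        g = x.shape[1] // 5  # four shifted groups plus one identity group (assumed split)
        x = torch.cat([
            torch.roll(x[:, 0 * g:1 * g], 1, dims=3),   # shift right
            torch.roll(x[:, 1 * g:2 * g], -1, dims=3),  # shift left
            torch.roll(x[:, 2 * g:3 * g], 1, dims=2),   # shift down
            torch.roll(x[:, 3 * g:4 * g], -1, dims=2),  # shift up
            x[:, 4 * g:],                               # unshifted remainder
        ], dim=1)
        return self.conv(x)  # shifts are free, so cost equals a plain 1x1 conv

class GMSA(nn.Module):
    """Group-wise multi-scale self-attention: each channel group attends
    within non-overlapping windows of a different size."""
    def __init__(self, channels, window_sizes=(4, 8, 16)):
        super().__init__()
        assert channels % len(window_sizes) == 0
        self.window_sizes = window_sizes
        self.split = channels // len(window_sizes)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    @staticmethod
    def window_attention(x, ws):
        b, c, h, w = x.shape  # h and w assumed divisible by ws
        t = x.unfold(2, ws, ws).unfold(3, ws, ws)       # b, c, nh, nw, ws, ws
        t = t.permute(0, 2, 3, 4, 5, 1).reshape(-1, ws * ws, c)
        attn = F.softmax(t @ t.transpose(1, 2) / c ** 0.5, dim=-1)
        t = attn @ t  # simplified: Q = K = V, no learned projections
        nh, nw = h // ws, w // ws
        t = t.reshape(b, nh, nw, ws, ws, c).permute(0, 5, 1, 3, 2, 4)
        return t.reshape(b, c, h, w)

    def forward(self, x):
        groups = torch.split(x, self.split, dim=1)
        out = [self.window_attention(g, ws)
               for g, ws in zip(groups, self.window_sizes)]
        return self.proj(torch.cat(out, dim=1))

class ELAB(nn.Module):
    """ELAB-style composition: two shift-convs around a GMSA module."""
    def __init__(self, channels):
        super().__init__()
        self.sc1 = ShiftConv(channels, channels)
        self.sc2 = ShiftConv(channels, channels)
        self.attn = GMSA(channels)

    def forward(self, x):
        x = x + self.sc2(F.relu(self.sc1(x)))  # local feature branch
        return x + self.attn(x)                # long-range attention branch
```

The point the abstract makes is visible here: the spatial shifts cost no FLOPs, so `ShiftConv` matches the complexity of a 1x1 convolution, and each attention matrix is only `ws^2 x ws^2` regardless of image size, which is what lets the window sizes grow. The shared attention mechanism (reusing attention maps across adjacent ELABs) is omitted for brevity.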
Related papers
- Task-Aware Dynamic Transformer for Efficient Arbitrary-Scale Image Super-Resolution [8.78015409192613]
Arbitrary-scale super-resolution (ASSR) aims to learn a single model for image super-resolution at arbitrary magnification factors.
Existing ASSR networks typically comprise an off-the-shelf scale-agnostic feature extractor and an arbitrary-scale upsampler.
We propose a Task-Aware Dynamic Transformer (TADT) as an input-adaptive feature extractor for efficient image ASSR.
arXiv Detail & Related papers (2024-08-16T13:35:52Z)
- Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z)
- Lightweight Structure-aware Transformer Network for VHR Remote Sensing Image Change Detection [15.391216316828354]
This letter proposes a Lightweight Structure-aware Transformer (LSAT) network for remote sensing image change detection.
First, a Cross-dimension Interactive Self-attention (CISA) module with linear complexity is designed to replace the vanilla self-attention in the visual Transformer.
Second, a Structure-aware Enhancement Module (SAEM) is designed to enhance difference features and edge detail information.
arXiv Detail & Related papers (2023-06-03T03:21:18Z)
- Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism on top of a vision transformer (ViT)-like block.
The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
arXiv Detail & Related papers (2023-02-27T14:19:31Z)
- ShuffleMixer: An Efficient ConvNet for Image Super-Resolution [88.86376017828773]
We propose ShuffleMixer, a lightweight image super-resolution network that explores large-kernel convolution and channel split-shuffle operations.
Specifically, we develop a large depth-wise convolution and two projection layers based on channel splitting and shuffling as the basic component to mix features efficiently (see the sketch after this entry).
Experimental results demonstrate that the proposed ShuffleMixer is about 6x smaller than the state-of-the-art methods in terms of model parameters and FLOPs.
arXiv Detail & Related papers (2022-05-30T15:26:52Z)
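The split-and-shuffle mixing named in the ShuffleMixer entry above is easy to illustrate. A minimal sketch, assuming PyTorch; the channel counts, kernel size, and interleaving shuffle are illustrative choices, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class SplitShuffleMix(nn.Module):
    """Mix half of the channels with a large depth-wise convolution flanked
    by two 1x1 projections, then shuffle so the next block sees the other half."""
    def __init__(self, channels, kernel_size=7):
        super().__init__()
        half = channels // 2
        self.mix = nn.Sequential(
            nn.Conv2d(half, half, 1),                          # projection in
            nn.Conv2d(half, half, kernel_size,
                      padding=kernel_size // 2, groups=half),  # large depth-wise conv
            nn.Conv2d(half, half, 1),                          # projection out
        )

    def forward(self, x):
        a, b = x.chunk(2, dim=1)                  # channel split
        x = torch.cat([self.mix(a), b], dim=1)
        n, c, h, w = x.shape                      # channel shuffle: interleave halves
        return x.view(n, 2, c // 2, h, w).transpose(1, 2).reshape(n, c, h, w)
```

- Self-Calibrated Efficient Transformer for Lightweight Super-Resolution [21.63691922827879]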
We present a lightweight Self-Calibrated Efficient Transformer (SCET) network for efficient super-resolution.
The architecture of SCET mainly consists of a self-calibrated module and an efficient transformer block.
We provide comprehensive results on different settings of the overall network.
arXiv Detail & Related papers (2022-04-19T14:20:32Z)
- Scale-Aware Dynamic Network for Continuous-Scale Super-Resolution [16.67263192454279]
We propose a scale-aware dynamic network (SADN) for continuous-scale SR.
First, we propose a scale-aware dynamic convolutional (SAD-Conv) layer for the feature learning of multiple SR tasks with various scales.
Second, we devise a continuous-scale upsampling module (CSUM) with the multi-bilinear local implicit function (MBLIF) for any-scale upsampling.
arXiv Detail & Related papers (2021-10-29T09:57:48Z)
- XCiT: Cross-Covariance Image Transformers [73.33400159139708]
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images (see the sketch after this entry).
arXiv Detail & Related papers (2021-06-17T17:33:35Z)
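The "transposed" attention described in the XCiT entry above is compact enough to sketch. A minimal single-head version assuming PyTorch; the L2 normalization of queries and keys follows the XCA description, while the fixed `tau` stands in for the paper's learnable temperature and multiple heads are omitted:

```python
import torch
import torch.nn.functional as F

def cross_covariance_attention(q, k, v, tau=1.0):
    """q, k, v: (batch, tokens, channels). The attention map is channels x
    channels, so the cost grows linearly with the number of tokens."""
    q = F.normalize(q, dim=1)  # unit-norm each channel across tokens
    k = F.normalize(k, dim=1)
    attn = F.softmax(k.transpose(1, 2) @ q / tau, dim=-1)  # (batch, c, c)
    return v @ attn                                        # (batch, tokens, channels)
```

- Scalable Visual Transformers with Hierarchical Pooling [61.05787583247392]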
We propose a Hierarchical Visual Transformer (HVT) that progressively pools visual tokens to shrink the sequence length (see the sketch after this entry).
The savings allow scaling depth, width, resolution, and patch size without introducing extra computational complexity.
Our HVT outperforms the competitive baselines on ImageNet and CIFAR-100 datasets.
arXiv Detail & Related papers (2021-03-19T03:55:58Z)
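The hierarchical pooling named in the HVT entry above amounts to downsampling the token sequence between transformer stages, which shrinks the quadratic attention cost stage by stage. A minimal sketch assuming PyTorch; the pooling type and stride are illustrative choices:

```python
import torch
import torch.nn.functional as F

def pool_tokens(x, stride=2):
    """x: (batch, tokens, channels) -> (batch, tokens // stride, channels).
    Downsamples the token sequence so later attention layers are cheaper."""
    return F.max_pool1d(x.transpose(1, 2), kernel_size=stride,
                        stride=stride).transpose(1, 2)
```

- Lightweight Single-Image Super-Resolution Network with Attentive Auxiliary Feature Learning [73.75457731689858]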
We develop a computationally efficient yet accurate network based on the proposed attentive auxiliary features (A$^2$F) for SISR.
Experimental results on large-scale datasets demonstrate the effectiveness of the proposed model against state-of-the-art (SOTA) SR methods.
arXiv Detail & Related papers (2020-11-13T06:01:46Z)