DLGSANet: Lightweight Dynamic Local and Global Self-Attention Networks
for Image Super-Resolution
- URL: http://arxiv.org/abs/2301.02031v1
- Date: Thu, 5 Jan 2023 12:06:47 GMT
- Title: DLGSANet: Lightweight Dynamic Local and Global Self-Attention Networks
for Image Super-Resolution
- Authors: Xiang Li, Jinshan Pan, Jinhui Tang, and Jiangxin Dong
- Abstract summary: We propose an effective lightweight dynamic local and global self-attention network (DLGSANet) to solve image super-resolution.
Motivated by the network designs of Transformers, we develop a simple yet effective multi-head dynamic local self-attention (MHDLSA) module to extract local features efficiently.
Since not all of the query-key similarities are useful for reconstruction, we develop a sparse global self-attention (SparseGSA) module to select the most useful similarity values.
- Score: 83.47467223117361
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an effective lightweight dynamic local and global self-attention
network (DLGSANet) to solve image super-resolution. Our method explores the
properties of Transformers while having low computational costs. Motivated by
the network designs of Transformers, we develop a simple yet effective
multi-head dynamic local self-attention (MHDLSA) module to extract local
features efficiently. In addition, we note that existing Transformers usually
explore all similarities of the tokens between the queries and keys for the
feature aggregation. However, not all the tokens from the queries are relevant
to those in the keys, so using all the similarities does not effectively
facilitate high-resolution image reconstruction. To overcome this problem, we develop
a sparse global self-attention (SparseGSA) module to select the most useful
similarity values so that the most useful global features can be better
utilized for high-resolution image reconstruction. We develop a hybrid
dynamic-Transformer block (HDTB) that integrates the MHDLSA and SparseGSA for
both local and global feature exploration. To ease the network training, we
formulate the HDTBs into a residual hybrid dynamic-Transformer group (RHDTG).
By embedding the RHDTGs into an end-to-end trainable network, we show that our
proposed method has fewer network parameters and lower computational costs
while achieving accuracy competitive with state-of-the-art methods. More
information is available at
https://neonleexiang.github.io/DLGSANet/
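Below is a minimal PyTorch sketch of the SparseGSA idea for readers who want a concrete picture. The abstract does not state how the useful similarities are selected, so the ReLU-based sparsification here (zeroing out non-positive query-key scores instead of applying softmax) is an assumption, and the class name, dimensions, and head count are illustrative only.

```python
import torch
import torch.nn as nn

class SparseGSASketch(nn.Module):
    """Hypothetical sketch of sparse global self-attention.

    Sparsity assumption: replace softmax with ReLU so that
    non-positive query-key similarities are dropped entirely.
    """

    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, N, C) flattened image tokens
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, C/heads)
        attn = torch.relu(q @ k.transpose(-2, -1) * self.scale)  # sparse scores
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

Note that dropping softmax also drops its normalization; the actual module may normalize the retained similarities differently.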
Related papers
- ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions.
Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks.
We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z)
- Learning A Sparse Transformer Network for Effective Image Deraining [42.01684644627124]
We propose an effective deraining network, the Sparse Transformer (DRSformer).
We develop a learnable top-k selection operator to adaptively retain the most crucial attention scores from the keys for each query for better feature aggregation.
We equip our model with a mixture-of-experts feature compensator to form a cooperative refinement deraining scheme.
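The top-k selection described above lends itself to a short sketch. The PyTorch function below keeps only the k largest scores per query before softmax; the learnable aspect of the paper's operator is omitted, and `topk=8` is an arbitrary illustrative value.

```python
import torch

def topk_sparse_attention(q, k, v, topk=8):
    """Keep only the top-k query-key scores per query; mask the rest.

    q, k, v: (batch, heads, N, d). A simplified, non-learnable stand-in
    for DRSformer's learnable top-k selection operator.
    """
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (B, H, N, N)
    _, top_idx = scores.topk(topk, dim=-1)                 # best keys per query
    mask = torch.zeros_like(scores).scatter_(-1, top_idx, 1.0)
    scores = scores.masked_fill(mask == 0, float("-inf"))  # drop the rest
    return torch.softmax(scores, dim=-1) @ v
```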
arXiv Detail & Related papers (2023-03-21T15:41:57Z)
- Recursive Generalization Transformer for Image Super-Resolution [108.67898547357127]
We propose the Recursive Generalization Transformer (RGT) for image SR, which can capture global spatial information and is suitable for high-resolution images.
We combine the proposed recursive-generalization self-attention (RG-SA) with local self-attention to enhance the exploitation of the global context.
Our RGT outperforms recent state-of-the-art methods quantitatively and qualitatively.
arXiv Detail & Related papers (2023-03-11T10:44:44Z)
- Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block.
The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
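To make the modulation idea concrete, here is a simplified, hypothetical PyTorch sketch: a gate is estimated from multi-scale depthwise-convolved context and multiplied with the input features. The actual SAFM design (channel splitting, number of scales, activation) may differ; all names and sizes here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAFMSketch(nn.Module):
    """Hypothetical spatially-adaptive feature modulation sketch."""

    def __init__(self, dim=36, levels=3):
        super().__init__()
        self.dw = nn.ModuleList(
            [nn.Conv2d(dim, dim, 3, padding=1, groups=dim) for _ in range(levels)]
        )
        self.fuse = nn.Conv2d(dim * levels, dim, 1)

    def forward(self, x):  # x: (B, C, H, W)
        h, w = x.shape[-2:]
        ctx = []
        for i, conv in enumerate(self.dw):
            # level 0 keeps full resolution; deeper levels pool the input
            s = x if i == 0 else F.adaptive_max_pool2d(x, (h // 2 ** i, w // 2 ** i))
            s = conv(s)
            if i > 0:  # restore the original resolution
                s = F.interpolate(s, size=(h, w), mode="nearest")
            ctx.append(s)
        gate = F.gelu(self.fuse(torch.cat(ctx, dim=1)))  # spatial gate
        return x * gate  # spatially-adaptive modulation
```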
arXiv Detail & Related papers (2023-02-27T14:19:31Z)
- Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information [15.32353270625554]
Cross-modal remote sensing text-image retrieval (RSCTIR) has recently become a research hotspot due to its ability to enable fast and flexible information extraction from remote sensing (RS) images.
We first propose a novel RSCTIR framework based on global and local information (GaLR), and design a multi-level information dynamic fusion (MIDF) module to effectively integrate features of different levels.
Experiments on public datasets demonstrate the state-of-the-art performance of GaLR on the RSCTIR task.
arXiv Detail & Related papers (2022-04-21T03:18:09Z)
- Self-Calibrated Efficient Transformer for Lightweight Super-Resolution [21.63691922827879]
We present a lightweight Self-Calibrated Efficient Transformer (SCET) network for efficient super-resolution.
The architecture of SCET mainly consists of the self-calibrated module and efficient transformer block.
We provide comprehensive results on different settings of the overall network.
arXiv Detail & Related papers (2022-04-19T14:20:32Z)
- Global Filter Networks for Image Classification [90.81352483076323]
We present the Global Filter Network (GFNet), a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
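GFNet's core operation is simple enough to sketch directly: token mixing is an element-wise product with a learnable filter in the 2D Fourier domain, which is what yields the log-linear complexity. The layer below follows that published idea; the sizes are illustrative, and `w` must equal `h // 2 + 1` to match the one-sided rfft2 output.

```python
import torch
import torch.nn as nn

class GlobalFilterSketch(nn.Module):
    """Learnable frequency-domain filter: FFT -> multiply -> inverse FFT."""

    def __init__(self, dim=64, h=14, w=8):  # w = h // 2 + 1 for rfft2
        super().__init__()
        # real/imag parts stored in the trailing dimension of size 2
        self.weight = nn.Parameter(torch.randn(h, w, dim, 2) * 0.02)

    def forward(self, x):  # x: (B, H, W, C) feature map
        X = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")
        X = X * torch.view_as_complex(self.weight)  # global token mixing
        return torch.fft.irfft2(X, s=x.shape[1:3], dim=(1, 2), norm="ortho")
```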
arXiv Detail & Related papers (2021-07-01T17:58:16Z)
- Conformer: Local Features Coupling Global Representations for Visual Recognition [72.9550481476101]
We propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning.
Experiments show that Conformer, under comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet.
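As a toy illustration of this dual-branch idea, the sketch below runs a convolutional branch and a self-attention branch in parallel and injects each branch's features into the other. Conformer's actual feature coupling units (1x1 convolutions with up/down-sampling and normalization) are simplified away, so treat every name and shape here as an assumption.

```python
import torch
import torch.nn as nn

class CouplingSketch(nn.Module):
    """Toy conv/attention dual branch with mutual feature exchange."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, 3, padding=1)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.t2c = nn.Linear(dim, dim)  # token branch -> conv branch
        self.c2t = nn.Linear(dim, dim)  # conv branch -> token branch

    def forward(self, feat):  # feat: (B, C, H, W)
        B, C, H, W = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)     # (B, HW, C)
        local = self.conv(feat)                      # local features
        glob, _ = self.attn(tokens, tokens, tokens)  # global features
        # couple: inject each branch's features into the other
        local = local + self.t2c(glob).transpose(1, 2).reshape(B, C, H, W)
        glob = glob + self.c2t(local.flatten(2).transpose(1, 2))
        return local, glob
```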
arXiv Detail & Related papers (2021-05-09T10:00:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.