Image Super-resolution Reconstruction Network based on Enhanced Swin Transformer via Alternating Aggregation of Local-Global Features
- URL: http://arxiv.org/abs/2401.00241v4
- Date: Sat, 6 Apr 2024 03:24:26 GMT
- Title: Image Super-resolution Reconstruction Network based on Enhanced Swin Transformer via Alternating Aggregation of Local-Global Features
- Authors: Yuming Huang, Yingpin Chen, Changhui Wu, Hanrong Xie, Binhui Song, Hui Wang,
- Abstract summary: This paper proposes enhanced Swin Transformer modules via alternating aggregation of local-global features.
The experimental results show that the proposed network outperforms the other state-of-the-art super-resolution networks.
- Score: 6.274236785783168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Swin Transformer image super-resolution reconstruction network only relies on the long-range relationship of window attention and shifted window attention to explore features. This mechanism has two limitations. On the one hand, it only focuses on global features while ignoring local features. On the other hand, it is only concerned with spatial feature interactions while ignoring channel features and channel interactions, thus limiting its non-linear mapping ability. To address the above limitations, this paper proposes enhanced Swin Transformer modules via alternating aggregation of local-global features. In the local feature aggregation stage, we introduce a shift convolution to realize the interaction between local spatial information and channel information. Then, a block sparse global perception module is introduced in the global feature aggregation stage. In this module, we reorganize the spatial information first, then send the recombination information into a dense layer to implement the global perception. After that, a multi-scale self-attention module and a low-parameter residual channel attention module are introduced to realize information aggregation at different scales. Finally, the proposed network is validated on five publicly available datasets. The experimental results show that the proposed network outperforms the other state-of-the-art super-resolution networks.
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN)
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - Relating CNN-Transformer Fusion Network for Change Detection [23.025190360146635]
RCTNet introduces an early fusion backbone to exploit both spatial and temporal features.
Experiments demonstrate RCTNet's clear superiority over traditional RS image CD methods.
arXiv Detail & Related papers (2024-07-03T14:58:40Z) - Salient Object Detection in Optical Remote Sensing Images Driven by
Transformer [69.22039680783124]
We propose a novel Global Extraction Local Exploration Network (GeleNet) for Optical Remote Sensing Images (ORSI-SOD)
Specifically, GeleNet first adopts a transformer backbone to generate four-level feature embeddings with global long-range dependencies.
Extensive experiments on three public datasets demonstrate that the proposed GeleNet outperforms relevant state-of-the-art methods.
arXiv Detail & Related papers (2023-09-15T07:14:43Z) - MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in
Optical Remote Sensing Images [7.764449276074902]
We propose a hybrid network based on multi-scale CNN-transformer structure, termed MCTNet.
We show that our MCTNet achieves better detection performance than existing state-of-the-art CD methods.
arXiv Detail & Related papers (2022-10-14T07:54:28Z) - Augmenting Convolutional networks with attention-based aggregation [55.97184767391253]
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning.
We plug this learned aggregation layer with a simplistic patch-based convolutional network parametrized by 2 parameters (width and depth)
It yields surprisingly competitive trade-offs between accuracy and complexity, in particular in terms of memory consumption.
arXiv Detail & Related papers (2021-12-27T14:05:41Z) - LCTR: On Awakening the Local Continuity of Transformer for Weakly
Supervised Object Localization [38.376238216214524]
Weakly supervised object localization (WSOL) aims to learn object localizer solely by using image-level labels.
We propose a novel framework built upon the transformer, termed LCTR, which targets at enhancing the local perception capability of global features.
arXiv Detail & Related papers (2021-12-10T01:48:40Z) - Conformer: Local Features Coupling Global Representations for Visual
Recognition [72.9550481476101]
We propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning.
Experiments show that Conformer, under the comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet.
arXiv Detail & Related papers (2021-05-09T10:00:03Z) - GAttANet: Global attention agreement for convolutional neural networks [0.0]
Transformer attention architectures, similar to those developed for natural language processing, have recently proved efficient also in vision.
Here, we report experiments with a simple such attention system that can improve the performance of standard convolutional networks.
We demonstrate the usefulness of this brain-inspired Global Attention Agreement network for various convolutional backbones.
arXiv Detail & Related papers (2021-04-12T15:45:10Z) - Volumetric Transformer Networks [88.85542905676712]
We introduce a learnable module, the volumetric transformer network (VTN)
VTN predicts channel-wise warping fields so as to reconfigure intermediate CNN features spatially and channel-wisely.
Our experiments show that VTN consistently boosts the features' representation power and consequently the networks' accuracy on fine-grained image recognition and instance-level image retrieval.
arXiv Detail & Related papers (2020-07-18T14:00:12Z) - Global Context-Aware Progressive Aggregation Network for Salient Object
Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.