Related papers: Image Super-Resolution Reconstruction Network based on Enhanced Swin Transformer via Alternating Aggregation of Local-Global Features

Image Super-Resolution Reconstruction Network based on Enhanced Swin Transformer via Alternating Aggregation of Local-Global Features

URL: http://arxiv.org/abs/2401.00241v5
Date: Thu, 18 Sep 2025 06:05:49 GMT
Title: Image Super-Resolution Reconstruction Network based on Enhanced Swin Transformer via Alternating Aggregation of Local-Global Features
Authors: Yuming Huang, Yingpin Chen, Changhui Wu, Binhui Song, Hui Wang,
Abstract summary: An enhanced Swin Transformer network (ESTN) alternately aggregates local and global features.<n>ESTN achieves higher average PSNR, surpassing SRCNN, ELAN-light, SwinIR-light, and SMFANER+ models.
Score: 4.716053949549313
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Swin Transformer image super-resolution (SR) reconstruction network primarily depends on the long-range relationship of the window and shifted window attention to explore features. However, this approach focuses only on global features, ignoring local ones, and considers only spatial interactions, disregarding channel and spatial-channel feature interactions, limiting its nonlinear mapping capability. Therefore, this study proposes an enhanced Swin Transformer network (ESTN) that alternately aggregates local and global features. During local feature aggregation, shift convolution facilitates the interaction between local spatial and channel information. During global feature aggregation, a block sparse global perception module is introduced, wherein spatial information is reorganized and the recombined features are then processed by a dense layer to achieve global perception. Additionally, multiscale self-attention and low-parameter residual channel attention modules are introduced to aggregate information across different scales. Finally, the effectiveness of ESTN on five public datasets and a local attribution map (LAM) are analyzed. Experimental results demonstrate that the proposed ESTN achieves higher average PSNR, surpassing SRCNN, ELAN-light, SwinIR-light, and SMFANER+ models by 2.17dB, 0.13dB, 0.12dB, and 0.1dB, respectively, with LAM further confirming its larger receptive field. ESTN delivers improved quality of SR images. The source code can be found at https://github.com/huangyuming2021/ESTN.

Related papers

UAGLNet: Uncertainty-Aggregated Global-Local Fusion Network with Cooperative CNN-Transformer for Building Extraction [83.48950950780554]
Building extraction from remote sensing images is a challenging task due to the complex structure variations of buildings.<n>Existing methods employ convolutional or self-attention blocks to capture the multi-scale features in the segmentation models.<n>We present an Uncertainty-Aggregated Global-Local Fusion Network (UAGLNet) to exploit high-quality global-local visual semantics.
arXiv Detail & Related papers (2025-12-15T02:59:16Z)
GCRPNet: Graph-Enhanced Contextual and Regional Perception Network for Salient Object Detection in Optical Remote Sensing Images [68.33481681452675]
We propose a graph-enhanced contextual and regional perception network (GCRPNet)<n>It builds upon the Mamba architecture to simultaneously capture long-range dependencies and enhance regional feature representation.<n>It performs adaptive patch scanning on feature maps processed via multi-scale convolutions, thereby capturing rich local region information.
arXiv Detail & Related papers (2025-08-14T11:31:43Z)
A Novel Shape Guided Transformer Network for Instance Segmentation in Remote Sensing Images [4.14360329494344]
We propose a novel Shape Guided Transformer Network (SGTN) to accurately extract objects at the instance level.<n>Inspired by the global contextual modeling capacity of the self-attention mechanism, we propose an effective transformer encoder termed LSwin.<n>Our SGTN achieves the highest average precision (AP) scores on two single-class public datasets.
arXiv Detail & Related papers (2024-12-31T09:25:41Z)
United Domain Cognition Network for Salient Object Detection in Optical Remote Sensing Images [21.76732661032257]
We propose a novel United Domain Cognition Network (UDCNet) to jointly explore the global-local information in the frequency and spatial domains. Experimental results demonstrate the superiority of the proposed UDCNet over 24 state-of-the-art models.
arXiv Detail & Related papers (2024-11-11T04:12:27Z)
PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN) PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
Relating CNN-Transformer Fusion Network for Change Detection [23.025190360146635]
RCTNet introduces an early fusion backbone to exploit both spatial and temporal features. Experiments demonstrate RCTNet's clear superiority over traditional RS image CD methods.
arXiv Detail & Related papers (2024-07-03T14:58:40Z)
Salient Object Detection in Optical Remote Sensing Images Driven by Transformer [69.22039680783124]
We propose a novel Global Extraction Local Exploration Network (GeleNet) for Optical Remote Sensing Images (ORSI-SOD) Specifically, GeleNet first adopts a transformer backbone to generate four-level feature embeddings with global long-range dependencies. Extensive experiments on three public datasets demonstrate that the proposed GeleNet outperforms relevant state-of-the-art methods.
arXiv Detail & Related papers (2023-09-15T07:14:43Z)
MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in Optical Remote Sensing Images [7.764449276074902]
We propose a hybrid network based on multi-scale CNN-transformer structure, termed MCTNet. We show that our MCTNet achieves better detection performance than existing state-of-the-art CD methods.
arXiv Detail & Related papers (2022-10-14T07:54:28Z)
Transformer-Guided Convolutional Neural Network for Cross-View Geolocalization [20.435023745201878]
We propose a novel Transformer-guided convolutional neural network (TransGCNN) architecture. Our TransGCNN consists of a CNN backbone extracting feature map from an input image and a Transformer head modeling global context. Experiments on popular benchmark datasets demonstrate that our model achieves top-1 accuracy of 94.12% and 84.92% on CVUSA and CVACT_val, respectively.
arXiv Detail & Related papers (2022-04-21T08:46:41Z)
Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information [15.32353270625554]
Cross-modal remote sensing text-image retrieval (RSCTIR) has recently become an urgent research hotspot due to its ability of enabling fast and flexible information extraction on remote sensing (RS) images. We first propose a novel RSCTIR framework based on global and local information (GaLR), and design a multi-level information dynamic fusion (MIDF) module to efficaciously integrate features of different levels. Experiments on public datasets strongly demonstrate the state-of-the-art performance of GaLR methods on the RSCTIR task.
arXiv Detail & Related papers (2022-04-21T03:18:09Z)
Augmenting Convolutional networks with attention-based aggregation [55.97184767391253]
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning. We plug this learned aggregation layer with a simplistic patch-based convolutional network parametrized by 2 parameters (width and depth) It yields surprisingly competitive trade-offs between accuracy and complexity, in particular in terms of memory consumption.
arXiv Detail & Related papers (2021-12-27T14:05:41Z)
LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization [38.376238216214524]
Weakly supervised object localization (WSOL) aims to learn object localizer solely by using image-level labels. We propose a novel framework built upon the transformer, termed LCTR, which targets at enhancing the local perception capability of global features.
arXiv Detail & Related papers (2021-12-10T01:48:40Z)
Over-and-Under Complete Convolutional RNN for MRI Reconstruction [57.95363471940937]
Recent deep learning-based methods for MR image reconstruction usually leverage a generic auto-encoder architecture. We propose an Over-and-Under Complete Convolu?tional Recurrent Neural Network (OUCR), which consists of an overcomplete and an undercomplete Convolutional Recurrent Neural Network(CRNN) The proposed method achieves significant improvements over the compressed sensing and popular deep learning-based methods with less number of trainable parameters.
arXiv Detail & Related papers (2021-06-16T15:56:34Z)
Conformer: Local Features Coupling Global Representations for Visual Recognition [72.9550481476101]
We propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning. Experiments show that Conformer, under the comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet.
arXiv Detail & Related papers (2021-05-09T10:00:03Z)
GAttANet: Global attention agreement for convolutional neural networks [0.0]
Transformer attention architectures, similar to those developed for natural language processing, have recently proved efficient also in vision. Here, we report experiments with a simple such attention system that can improve the performance of standard convolutional networks. We demonstrate the usefulness of this brain-inspired Global Attention Agreement network for various convolutional backbones.
arXiv Detail & Related papers (2021-04-12T15:45:10Z)
Volumetric Transformer Networks [88.85542905676712]
We introduce a learnable module, the volumetric transformer network (VTN) VTN predicts channel-wise warping fields so as to reconfigure intermediate CNN features spatially and channel-wisely. Our experiments show that VTN consistently boosts the features' representation power and consequently the networks' accuracy on fine-grained image recognition and instance-level image retrieval.
arXiv Detail & Related papers (2020-07-18T14:00:12Z)
Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features. We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
Dense Residual Network: Enhancing Global Dense Feature Flow for Character Recognition [75.4027660840568]
This paper explores how to enhance the local and global dense feature flow by exploiting hierarchical features fully from all the convolution layers. Technically, we propose an efficient and effective CNN framework, i.e., Fast Dense Residual Network (FDRN) for text recognition.
arXiv Detail & Related papers (2020-01-23T06:55:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.