Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution
- URL: http://arxiv.org/abs/2407.16232v2
- Date: Tue, 27 Aug 2024 07:31:37 GMT
- Title: Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution
- Authors: Dinh Phu Tran, Dao Duy Hung, Daeyoung Kim
- Abstract summary: Window-based attention methods have shown great potential for computer vision tasks, particularly in Single Image Super-Resolution (SISR).
We propose a new Channel-Partitioned Attention Transformer (CPAT) to better capture long-range dependencies by sequentially expanding windows along the height and width of feature maps.
In addition, we propose a novel Spatial-Frequency Interaction Module (SFIM), which incorporates information from the spatial and frequency domains to provide more comprehensive information from feature maps.
- Score: 1.8506868409351092
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, window-based attention methods have shown great potential for computer vision tasks, particularly in Single Image Super-Resolution (SISR). However, they may fall short in capturing long-range dependencies and relationships between distant tokens. Additionally, we find that learning in the spatial domain does not convey the frequency content of the image, which is a crucial aspect of SISR. To tackle these issues, we propose a new Channel-Partitioned Attention Transformer (CPAT) to better capture long-range dependencies by sequentially expanding windows along the height and width of feature maps. In addition, we propose a novel Spatial-Frequency Interaction Module (SFIM), which incorporates information from the spatial and frequency domains to provide more comprehensive information from feature maps. This includes information about the frequency content and enhances the receptive field across the entire image. Experimental findings show the effectiveness of our proposed modules and architecture. In particular, CPAT surpasses current state-of-the-art methods by up to 0.31 dB at x2 SR on Urban100.
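The SFIM idea of mixing spatial- and frequency-domain information can be illustrated with a minimal sketch. This is not the authors' module: the fixed `freq_weight` filter below is a hypothetical stand-in for SFIM's learned frequency-domain weights, and the residual sum stands in for its spatial-frequency interaction step.

```python
import numpy as np

def spatial_frequency_fusion(feat: np.ndarray, freq_weight: np.ndarray) -> np.ndarray:
    """Toy spatial-frequency interaction: re-weight a feature map's
    frequency spectrum, then fuse the result back with the spatial branch.
    `freq_weight` plays the role of learned per-frequency weights."""
    spectrum = np.fft.fft2(feat)               # spatial -> frequency domain
    filtered = spectrum * freq_weight          # modulate frequency content
    freq_branch = np.fft.ifft2(filtered).real  # back to the spatial domain
    return feat + freq_branch                  # residual fusion of both branches

# 8x8 feature map with an all-pass (identity) frequency filter
feat = np.arange(64, dtype=float).reshape(8, 8)
out = spatial_frequency_fusion(feat, np.ones((8, 8)))
```

With the identity filter the frequency branch reproduces the input, so the fusion simply doubles the feature map; a learned filter would instead amplify or suppress selected frequency bands.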
Related papers
- ML-CrAIST: Multi-scale Low-high Frequency Information-based Cross black Attention with Image Super-resolving Transformer [3.686808512438363]
This work proposes a transformer-based super-resolution architecture called ML-CrAIST.
We operate spatial and channel self-attention, which concurrently model pixel interaction from both spatial and channel dimensions.
We devise a cross-attention block for super-resolution, which explores the correlations between low and high-frequency information.
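The low/high-frequency cross-attention described above can be sketched as plain scaled dot-product attention, with queries taken from a low-frequency branch and keys/values from a high-frequency branch. This is a toy illustration, not ML-CrAIST's actual block: the learned projections and the frequency decomposition itself are omitted.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(low_freq: np.ndarray, high_freq: np.ndarray) -> np.ndarray:
    """Queries from the low-frequency tokens attend to keys/values
    from the high-frequency tokens (learned projections omitted)."""
    scores = low_freq @ high_freq.T / np.sqrt(low_freq.shape[-1])
    return softmax(scores, axis=-1) @ high_freq

low = np.ones((2, 4))    # 2 low-frequency tokens of dim 4 (queries)
high = np.eye(4)         # 4 high-frequency tokens of dim 4 (keys/values)
out = cross_attention(low, high)
```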
arXiv Detail & Related papers (2024-08-19T12:23:15Z)
- Exploring Richer and More Accurate Information via Frequency Selection for Image Restoration [0.0]
We introduce a multi-scale frequency selection network (MSFSNet) that seamlessly integrates spatial and frequency domain knowledge.
Our MSFSNet achieves performance that is either superior or comparable to state-of-the-art algorithms.
arXiv Detail & Related papers (2024-07-12T03:10:08Z)
- An Advanced Features Extraction Module for Remote Sensing Image Super-Resolution [0.5461938536945723]
We propose an advanced feature extraction module called Channel and Spatial Attention Feature Extraction (CSA-FE)
Our proposed method helps the model focus on the specific channels and spatial locations containing high-frequency information so that the model can focus on relevant features and suppress irrelevant ones.
Our model achieved superior performance compared to various existing models.
arXiv Detail & Related papers (2024-05-07T18:15:51Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
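The sub-band decomposition via the discrete wavelet transform can be illustrated with a single-level 2-D Haar transform. The simple averaging/differencing form below is a sketch, not UFAFormer's implementation: it splits an image into one low-frequency approximation band and three high-frequency detail bands.

```python
import numpy as np

def haar_dwt2(img: np.ndarray):
    """One level of a 2-D Haar wavelet transform: each 2x2 block of the
    image yields one coefficient in each of the four sub-bands."""
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 4.0  # low-low: block average (approximation)
    lh = (a - b + c - d) / 4.0  # horizontal detail
    hl = (a + b - c - d) / 4.0  # vertical detail
    hh = (a - b - c + d) / 4.0  # diagonal detail
    return ll, lh, hl, hh

img = np.ones((4, 4))  # a flat image has no high-frequency content
ll, lh, hl, hh = haar_dwt2(img)
```

A forgery detector can then run attention within each sub-band (intra-band) and across the four sub-bands (inter-band), as the frequency encoder above describes.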
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
- Spatial-Frequency Attention for Image Denoising [22.993509525990998]
We propose the spatial-frequency attention network (SFANet) to enhance the network's ability in exploiting long-range dependency.
Experiments on multiple denoising benchmarks demonstrate the leading performance of SFANet network.
arXiv Detail & Related papers (2023-02-27T09:07:15Z)
- SufrinNet: Toward Sufficient Cross-View Interaction for Stereo Image Enhancement in The Dark [119.01585302856103]
Low-light stereo image enhancement (LLSIE) is a relatively new task that aims to enhance the quality of visually unpleasant stereo images captured in dark conditions.
Current methods suffer from two clear shortcomings: 1) insufficient cross-view interaction; 2) a lack of long-range dependency modeling for intra-view learning.
We propose a novel LLSIE model, termed Sufficient Cross-View Interaction Network (SufrinNet).
arXiv Detail & Related papers (2022-11-02T04:01:30Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- Deep Burst Super-Resolution [165.90445859851448]
We propose a novel architecture for the burst super-resolution task.
Our network takes multiple noisy RAW images as input, and generates a denoised, super-resolved RGB image as output.
In order to enable training and evaluation on real-world data, we additionally introduce the BurstSR dataset.
arXiv Detail & Related papers (2021-01-26T18:57:21Z)
- Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images [10.835342317692884]
The accuracy of semantic segmentation in remote sensing images has been significantly improved by deep convolutional neural networks.
This paper proposes a Multi-Attention-Network (MANet) to address these issues.
A novel attention mechanism of kernel attention with linear complexity is proposed to alleviate the large computational demand in attention.
arXiv Detail & Related papers (2020-09-03T09:08:02Z) - Co-Saliency Spatio-Temporal Interaction Network for Person
Re-Identification in Videos [85.6430597108455]
We propose a novel Co-Saliency Spatio-Temporal Interaction Network (CSTNet) for person re-identification in videos.
It captures the common salient foreground regions among video frames and explores the spatial-temporal long-range context interdependency from such regions.
Multiple spatial-temporal interaction modules within CSTNet are proposed, which exploit the spatial and temporal long-range context interdependencies on such features.
arXiv Detail & Related papers (2020-04-10T10:23:58Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.