SwinFSR: Stereo Image Super-Resolution using SwinIR and Frequency Domain Knowledge
- URL: http://arxiv.org/abs/2304.12556v1
- Date: Tue, 25 Apr 2023 03:54:58 GMT
- Title: SwinFSR: Stereo Image Super-Resolution using SwinIR and Frequency Domain Knowledge
- Authors: Ke Chen, Liangyan Li, Huan Liu, Yunzhe Li, Congling Tang and Jun Chen
- Abstract summary: We propose a new StereoSR method, named SwinFSR, based on an extension of SwinIR, originally designed for single image restoration.
For the efficient and accurate fusion of stereo views, we propose a new cross-attention module referred to as RCAM.
- Score: 27.344004897917515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stereo Image Super-Resolution (stereoSR) has attracted significant attention
in recent years due to the extensive deployment of dual cameras in mobile
phones, autonomous vehicles and robots. In this work, we propose a new StereoSR
method, named SwinFSR, based on an extension of SwinIR, originally designed for
single image restoration, and the frequency domain knowledge obtained by the
Fast Fourier Convolution (FFC). Specifically, to effectively gather global
information, we modify the Residual Swin Transformer blocks (RSTBs) in SwinIR
by explicitly incorporating the frequency domain knowledge using the FFC and
employing the resulting residual Swin Fourier Transformer blocks (RSFTBs) for
feature extraction. Besides, for the efficient and accurate fusion of stereo
views, we propose a new cross-attention module referred to as RCAM, which
achieves highly competitive performance while requiring less computational cost
than the state-of-the-art cross-attention modules. Extensive experimental
results and ablation studies demonstrate the effectiveness and efficiency of
our proposed SwinFSR.
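
The abstract names two architectural ingredients: injecting frequency-domain knowledge into the Swin blocks via Fast Fourier Convolution (FFC), and fusing the left/right views with a cross-attention module (RCAM). The snippet below is a minimal PyTorch sketch of both ideas under stated assumptions; the module names (SpectralBranch, CrossViewAttention), channel counts and the row-wise attention layout are illustrative, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): an FFC-style spectral branch with a global
# receptive field, and a bidirectional-style cross-view attention applied row by row,
# which is how stereo SR methods typically attend along the epipolar (width) axis.
import torch
import torch.nn as nn


class SpectralBranch(nn.Module):
    """Global receptive field via a real 2-D FFT over the spatial dims (FFC-style)."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution applied to concatenated real/imaginary parts in frequency space.
        self.freq_conv = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        _, _, h, w = x.shape
        freq = torch.fft.rfft2(x, norm="ortho")              # (B, C, H, W//2+1), complex
        freq = torch.cat([freq.real, freq.imag], dim=1)      # (B, 2C, H, W//2+1)
        freq = self.freq_conv(freq)
        real, imag = freq.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")


class CrossViewAttention(nn.Module):
    """Attention between left/right views along the width (epipolar) dimension."""

    def __init__(self, channels: int):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.scale = channels ** -0.5

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B*H, W, C): each image row attends across the other view's row.
        def rows(t: torch.Tensor) -> torch.Tensor:
            b, c, h, w = t.shape
            return t.permute(0, 2, 3, 1).reshape(b * h, w, c)

        ql, kr, vr = rows(self.q(left)), rows(self.k(right)), rows(self.v(right))
        attn = torch.softmax(ql @ kr.transpose(1, 2) * self.scale, dim=-1)  # (B*H, W, W)
        fused = attn @ vr                                                   # (B*H, W, C)
        b, c, h, w = left.shape
        fused = fused.reshape(b, h, w, c).permute(0, 3, 1, 2)
        return left + fused  # residual fusion of right-view information into the left view


if __name__ == "__main__":
    left = torch.randn(1, 32, 30, 90)
    right = torch.randn(1, 32, 30, 90)
    print(SpectralBranch(32)(left).shape)            # torch.Size([1, 32, 30, 90])
    print(CrossViewAttention(32)(left, right).shape)  # torch.Size([1, 32, 30, 90])
```

In SwinFSR the spectral path would sit alongside the window-attention path inside each RSFTB rather than replace it, and the row-wise attention mirrors the common stereo SR practice of restricting cross-view matching to the horizontal epipolar direction.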
Related papers
- Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496] (2024-05-08)
We develop the first attempt to integrate the Vision State Space Model (Mamba) for remote sensing image (RSI) super-resolution.
To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR.
Our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM).
- HAT: Hybrid Attention Transformer for Image Restoration [61.74223315807691] (2023-09-11)
Transformer-based methods have shown impressive performance in image restoration tasks, such as image super-resolution and denoising.
We propose a new Hybrid Attention Transformer (HAT) to activate more input pixels for better restoration.
Our HAT achieves state-of-the-art performance both quantitatively and qualitatively.
- RFR-WWANet: Weighted Window Attention-Based Recovery Feature Resolution Network for Unsupervised Image Registration [7.446209993071451] (2023-05-07)
The Swin Transformer has attracted attention in medical image analysis due to its computational efficiency and long-range modeling capability.
Registration models based on Transformers merge multiple voxels into a single semantic token, which limits them to modeling and generating coarse-grained spatial information.
We propose the Recovery Feature Resolution Network (RFRNet), which allows the Transformer to contribute fine-grained spatial information.
- Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images [64.84260544255477] (2022-10-28)
Transformer-based models have demonstrated outstanding potential for learning high-order contextual relationships from natural images of general resolution (224x224 pixels).
We propose a complex self-attention (CSA) mechanism that models high-order contextual information with less than half the computation of naive self-attention.
By stacking layers of CSA blocks, we propose the Fourier Complex Transformer (FCT) model to learn global contextual information from VHR aerial images.
- ITSRN++: Stronger and Better Implicit Transformer Network for Continuous Screen Content Image Super-Resolution [32.441761727608856] (2022-10-17)
The proposed method achieves state-of-the-art performance for SCI SR (outperforming SwinIR by 0.74 dB for x3 SR) and also works well for natural image SR.
We construct a large-scale SCI2K dataset to facilitate research on SCI SR.
- SwinFIR: Revisiting the SwinIR with Fast Fourier Convolution and Improved Training for Image Super-Resolution [1.305100137416611] (2022-08-24)
We propose SwinFIR, which extends SwinIR with Fast Fourier Convolution (FFC) components.
Our algorithm achieves a PSNR of 32.83 dB on the Manga109 dataset, which is 0.8 dB higher than the state-of-the-art SwinIR method.
- Residual Swin Transformer Channel Attention Network for Image Demosaicing [3.8073142980733] (2022-04-14)
Deep neural networks have been widely used in image restoration, and in particular in demosaicing, attaining significant performance improvement.
Inspired by the success of SwinIR, we propose a novel Swin Transformer-based network for image demosaicing, called RSTCANet.
- Transformer-based SAR Image Despeckling [53.99620005035804] (2022-01-23)
We introduce a Transformer-based network for SAR image despeckling.
The proposed despeckling network comprises a Transformer-based encoder that allows the network to learn global dependencies between different image regions.
Experiments show that the proposed method achieves significant improvements over traditional and convolutional neural network-based despeckling methods.
- Deep Burst Super-Resolution [165.90445859851448] (2021-01-26)
We propose a novel architecture for the burst super-resolution task.
Our network takes multiple noisy RAW images as input and generates a denoised, super-resolved RGB image as output.
To enable training and evaluation on real-world data, we additionally introduce the BurstSR dataset.
- Frequency Consistent Adaptation for Real World Super Resolution [64.91914552787668] (2020-12-18)
We propose a novel Frequency Consistent Adaptation (FCA) that ensures frequency-domain consistency when applying Super-Resolution (SR) methods to real scenes.
We estimate degradation kernels from unsupervised images and generate the corresponding Low-Resolution (LR) images.
Based on the domain-consistent LR-HR pairs, we train easily implemented Convolutional Neural Network (CNN) SR models.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.