SwinFIR: Revisiting the SwinIR with Fast Fourier Convolution and
Improved Training for Image Super-Resolution
- URL: http://arxiv.org/abs/2208.11247v3
- Date: Sun, 24 Sep 2023 14:25:02 GMT
- Authors: Dafeng Zhang, Feiyu Huang, Shizhuo Liu, Xiaobing Wang, Zhezhu Jin
- Abstract summary: We propose SwinFIR, which extends SwinIR with Fast Fourier Convolution (FFC) components.
Our algorithm achieves a PSNR of 32.83 dB on the Manga109 dataset, 0.8 dB higher than the state-of-the-art SwinIR method.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based methods have achieved impressive image restoration
performance due to their capacity to model long-range dependencies compared to
CNN-based methods. However, advances like SwinIR adopt a window-based, local
attention strategy to balance performance and computational overhead, which
restricts the use of large receptive fields to capture global information and
establish long-range dependencies in the early layers. To further improve the
efficiency of capturing global information, in this work we propose SwinFIR,
which extends SwinIR with Fast Fourier Convolution (FFC) components that have
an image-wide receptive field. We also revisit other advanced techniques,
i.e., data augmentation, pre-training, and feature ensemble, to improve image
reconstruction. Our feature ensemble method considerably enhances model
performance without increasing training or testing time. We applied our
algorithm to multiple popular large-scale benchmarks and achieved
state-of-the-art performance compared to existing methods. For example, our
SwinFIR achieves a PSNR of 32.83 dB on the Manga109 dataset, which is 0.8 dB
higher than the state-of-the-art SwinIR method.
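
The image-wide receptive field of an FFC component comes from its spectral branch: a real 2-D FFT moves features to the frequency domain, a 1x1 convolution mixes channels there (so every output depends on all spatial positions), and an inverse FFT returns to the spatial domain. The sketch below is a minimal, hypothetical PyTorch rendering of that idea; the class name, layer sizes, and activation are illustrative assumptions, not SwinFIR's actual implementation.

```python
import torch
import torch.nn as nn


class SpectralTransform(nn.Module):
    """Minimal sketch of the spectral branch of a Fast Fourier Convolution.

    All names and sizes are illustrative, not taken from SwinFIR's code.
    """

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv over the stacked (real, imag) parts of the spectrum:
        # pointwise in frequency, hence global in space.
        self.conv = nn.Conv2d(channels * 2, channels * 2, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Real FFT over the spatial dims -> complex (B, C, H, W//2 + 1)
        freq = torch.fft.rfft2(x, norm="ortho")
        # Treat real and imaginary parts as extra channels
        freq = torch.cat([freq.real, freq.imag], dim=1)
        freq = self.act(self.conv(freq))
        real, imag = freq.chunk(2, dim=1)
        # Inverse FFT back to the original spatial resolution
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")
```

Because the 1x1 convolution acts on frequency bins that each aggregate the whole image, one such block already mixes information across the full spatial extent, in contrast to a windowed attention layer whose receptive field grows only with depth.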
Related papers
- Effective Diffusion Transformer Architecture for Image Super-Resolution (2024-09-29)
  We design an effective diffusion transformer for image super-resolution (DiT-SR). In practice, DiT-SR leverages an overall U-shaped architecture and adopts a uniform isotropic design for all the transformer blocks. We analyze the limitations of the widely used AdaLN and present a frequency-adaptive time-step conditioning module.
- LeRF: Learning Resampling Function for Adaptive and Efficient Image Interpolation (2024-07-13)
  Recent deep neural networks (DNNs) have made impressive progress in performance by introducing learned data priors. We propose a novel method of Learning Resampling (termed LeRF) which takes advantage of both the structural priors learned by DNNs and the locally continuous assumption. LeRF assigns spatially varying resampling functions to input image pixels and learns to predict the shapes of these resampling functions with a neural network.
- HAT: Hybrid Attention Transformer for Image Restoration (2023-09-11)
  Transformer-based methods have shown impressive performance in image restoration tasks such as image super-resolution and denoising. We propose a new Hybrid Attention Transformer (HAT) to activate more input pixels for better restoration. Our HAT achieves state-of-the-art performance both quantitatively and qualitatively.
- RBSR: Efficient and Flexible Recurrent Network for Burst Super-Resolution (2023-06-30)
  Burst super-resolution (BurstSR) aims at reconstructing a high-resolution (HR) image from a sequence of low-resolution (LR) and noisy images. In this paper, we suggest fusing cues frame by frame with an efficient and flexible recurrent network.
- SwinFSR: Stereo Image Super-Resolution using SwinIR and Frequency Domain Knowledge (2023-04-25)
  We propose a new StereoSR method, named SwinFSR, based on an extension of SwinIR, originally designed for single-image restoration. For the efficient and accurate fusion of stereo views, we propose a new cross-attention module referred to as RCAM.
- Resolution Enhancement Processing on Low Quality Images Using Swin Transformer Based on Interval Dense Connection Strategy (2023-03-16)
  Transformer-based methods have demonstrated remarkable performance for image super-resolution in comparison to methods based on convolutional neural networks (CNNs). This work proposes the Interval Dense Connection Strategy, which connects different blocks according to a newly designed algorithm. For real-life application, this work applies the latest version of You Only Look Once (YOLOv8) and the proposed model to perform object detection and image super-resolution on low-quality images.
- Dynamic Test-Time Augmentation via Differentiable Functions (2022-12-09)
  DynTTA is an image enhancement method that generates recognition-friendly images without retraining the recognition model. DynTTA is based on differentiable data augmentation techniques and generates a blended image from many augmented images to improve recognition accuracy under distribution shifts.
- SAGE: Saliency-Guided Mixup with Optimal Rearrangements (2022-10-31)
  SAGE creates new training examples by rearranging and mixing image pairs using visual saliency as guidance. We demonstrate on CIFAR-10 and CIFAR-100 that SAGE achieves better or comparable performance to the state of the art while being more efficient.
- Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images (2022-10-28)
  Transformer-based models have demonstrated outstanding potential for learning high-order contextual relationships from natural images at general resolution (224x224 pixels). We propose a complex self-attention (CSA) mechanism to model high-order contextual information with less than half the computation of naive SA. By stacking various layers of CSA blocks, we propose the Fourier Complex Transformer (FCT) model to learn global contextual information from VHR aerial images.
- FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning (2020-07-16)
  We propose a novel learned feature-based refinement and augmentation method that produces a varied set of complex transformations. These transformations also use information from both within-class and across-class representations that we extract through clustering. We demonstrate that our method is comparable to the current state of the art on smaller datasets while being able to scale up to larger datasets.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.