Degradation-Aware Self-Attention Based Transformer for Blind Image
Super-Resolution
- URL: http://arxiv.org/abs/2310.04180v1
- Date: Fri, 6 Oct 2023 11:52:31 GMT
- Title: Degradation-Aware Self-Attention Based Transformer for Blind Image
Super-Resolution
- Authors: Qingguo Liu, Pan Gao, Kang Han, Ningzhong Liu, Wei Xiang
- Abstract summary: We propose a degradation-aware self-attention-based Transformer model for learning the degradation representations of input images with unknown noise.
We apply our proposed model to several popular large-scale benchmark datasets and achieve state-of-the-art performance.
Our method yields a PSNR of 32.43 dB on the Urban100 dataset at $\times$2 scale, 0.94 dB higher than DASR, and 26.62 dB at $\times$4 scale, a 0.26 dB improvement over KDSR.
- Score: 23.336576280389608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compared to CNN-based methods, Transformer-based methods achieve
impressive image restoration results thanks to their ability to model
long-range dependencies.
However, how to apply Transformer-based methods to the field of blind
super-resolution (SR) and further make an SR network adaptive to degradation
information is still an open problem. In this paper, we propose a new
degradation-aware self-attention-based Transformer model, where we incorporate
contrastive learning into the Transformer network for learning the degradation
representations of input images with unknown noise. In particular, we integrate
both CNN and Transformer components into the SR network, where we first use the
CNN modulated by the degradation information to extract local features, and
then employ the degradation-aware Transformer to extract global semantic
features. We evaluate our proposed model on several popular large-scale
benchmark datasets and achieve state-of-the-art performance compared to
existing methods. In particular, our method yields a PSNR of 32.43 dB on the
Urban100 dataset at $\times$2 scale, 0.94 dB higher than DASR, and 26.62 dB at
$\times$4 scale, a 0.26 dB improvement over KDSR, setting
a new benchmark in this area. Source code is available at:
https://github.com/I2-Multimedia-Lab/DSAT/tree/main.
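As a rough illustration of the pipeline above, the sketch below shows the two key ingredients in PyTorch: a degradation encoder trained with an InfoNCE-style contrastive loss, and a convolution whose output is modulated by the resulting degradation embedding. This is a minimal sketch under assumed module names and sizes, not the authors' implementation (see the repository above for that); in the paper the same embedding also conditions the Transformer's self-attention.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DegradationEncoder(nn.Module):
    # Maps an LR patch to a normalized degradation embedding.
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, dim, 3, 2, 1), nn.LeakyReLU(0.1),
            nn.Conv2d(dim, dim, 3, 2, 1), nn.LeakyReLU(0.1),
            nn.AdaptiveAvgPool2d(1))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return F.normalize(self.proj(self.net(x).flatten(1)), dim=1)

def info_nce(anchor, positive, negatives, tau=0.07):
    # Contrastive objective: embeddings of patches from the same degraded
    # image are pulled together; patches from differently degraded images
    # are pushed apart. negatives: (K, dim).
    pos = (anchor * positive).sum(1, keepdim=True) / tau   # (B, 1)
    neg = anchor @ negatives.t() / tau                     # (B, K)
    logits = torch.cat([pos, neg], dim=1)
    return F.cross_entropy(logits, torch.zeros(anchor.size(0), dtype=torch.long))

class DegradationModulatedConv(nn.Module):
    # Local feature extraction adapted to the estimated degradation:
    # the embedding channel-wise rescales the convolution output.
    def __init__(self, dim=64):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, 3, 1, 1)
        self.to_scale = nn.Linear(dim, dim)

    def forward(self, feat, deg):
        scale = self.to_scale(deg)[:, :, None, None]       # (B, C, 1, 1)
        return self.conv(feat) * scale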
Related papers
- DRCT: Saving Image Super-resolution away from Information Bottleneck [7.765333471208582]
Vision Transformer-based approaches for low-level vision tasks have achieved widespread success.
The Dense-residual-connected Transformer (DRCT) is proposed to mitigate the loss of spatial information.
Our approach surpasses state-of-the-art methods on benchmark datasets.
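The dense-residual pattern that DRCT's name refers to can be pictured with a generic sketch (all names and sizes here are assumptions for illustration, not the DRCT code): each block receives the concatenation of the input and every earlier block's output, so shallow spatial information keeps flowing into deeper layers.

import torch
import torch.nn as nn

class DenseResidualGroup(nn.Module):
    # Block i sees the channel-wise concatenation of the group input and
    # all previous block outputs; a residual exit preserves the input signal.
    def __init__(self, dim=64, num_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Conv2d(dim * (i + 1), dim, 3, 1, 1) for i in range(num_blocks))
        self.fuse = nn.Conv2d(dim * (num_blocks + 1), dim, 1)

    def forward(self, x):
        feats = [x]
        for block in self.blocks:
            feats.append(torch.relu(block(torch.cat(feats, dim=1))))
        return x + self.fuse(torch.cat(feats, dim=1))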
arXiv Detail & Related papers (2024-03-31T15:34:45Z)
- Look-Around Before You Leap: High-Frequency Injected Transformer for Image Restoration [46.96362010335177]
In this paper, we propose HIT, a simple yet effective High-frequency Injected Transformer for image restoration.
Specifically, we design a window-wise injection module (WIM), which incorporates abundant high-frequency details into the feature map, to provide reliable references for restoring high-quality images.
In addition, we introduce a spatial enhancement unit (SEU) to preserve essential spatial relationships that may be lost due to the computations carried out across channel dimensions in the BIM.
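The injection idea can be pictured with a short sketch; the module name and the blur-based high-pass filter below are assumptions for illustration, not the actual WIM.

import torch.nn as nn
import torch.nn.functional as F

class HighFreqInjection(nn.Module):
    # Stand-in for a window-wise injection module: extract a high-pass
    # residual (image minus a blurred copy), embed it, and add it to the
    # feature map so fine details survive the attention stages.
    def __init__(self, dim=64):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, 3, 1, 1)

    def forward(self, feat, img):
        high_freq = img - F.avg_pool2d(img, 3, stride=1, padding=1)
        ref = F.interpolate(self.embed(high_freq), size=feat.shape[-2:])
        return feat + ref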
arXiv Detail & Related papers (2024-03-30T08:05:00Z)
- Resolution Enhancement Processing on Low Quality Images Using Swin Transformer Based on Interval Dense Connection Strategy [1.5705307898493193]
Transformer-based methods have demonstrated remarkable performance for image super-resolution compared to methods based on convolutional neural networks (CNNs).
This research work proposes the Interval Dense Connection Strategy, which connects different blocks according to the newly designed algorithm.
For real-life applications, this work applies the latest version of the You Only Look Once model (YOLOv8) together with the proposed model to perform object detection and image super-resolution on low-quality images.
arXiv Detail & Related papers (2023-03-16T10:01:12Z)
- Magic ELF: Image Deraining Meets Association Learning and Transformer [63.761812092934576]
This paper aims to unify CNNs and Transformers to exploit their complementary strengths for image deraining.
A novel multi-input attention module (MAM) is proposed to associate rain removal and background recovery.
Our proposed method (dubbed ELF) outperforms the state-of-the-art approach (MPRNet) by 0.25 dB on average.
arXiv Detail & Related papers (2022-07-21T12:50:54Z)
- Lightweight Bimodal Network for Single-Image Super-Resolution via Symmetric CNN and Recursive Transformer [27.51790638626891]
Single-image super-resolution (SISR) has achieved significant breakthroughs with the development of deep learning.
To keep the network lightweight, we propose a Lightweight Bimodal Network (LBNet) for SISR.
Specifically, an effective Symmetric CNN is designed for local feature extraction and coarse image reconstruction.
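A "recursive Transformer" is commonly a single weight-shared Transformer layer applied several times, so effective depth grows while the parameter count stays fixed; a generic sketch under that assumption (names and sizes are illustrative):

import torch.nn as nn

class RecursiveTransformer(nn.Module):
    # One weight-shared encoder layer reused T times: deeper effective
    # global modeling at a constant parameter budget.
    def __init__(self, dim=64, heads=4, recursions=4):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=dim * 2, batch_first=True)
        self.recursions = recursions

    def forward(self, tokens):              # tokens: (B, N, dim)
        for _ in range(self.recursions):
            tokens = self.layer(tokens)     # same weights on every pass
        return tokens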
arXiv Detail & Related papers (2022-04-28T04:43:22Z)
- Transformer-Guided Convolutional Neural Network for Cross-View Geolocalization [20.435023745201878]
We propose a novel Transformer-guided convolutional neural network (TransGCNN) architecture.
Our TransGCNN consists of a CNN backbone extracting feature map from an input image and a Transformer head modeling global context.
Experiments on popular benchmark datasets demonstrate that our model achieves top-1 accuracy of 94.12% and 84.92% on CVUSA and CVACT_val, respectively.
arXiv Detail & Related papers (2022-04-21T08:46:41Z)
- GradViT: Gradient Inversion of Vision Transformers [83.54779732309653]
We demonstrate the vulnerability of vision transformers (ViTs) to gradient-based inversion attacks.
We introduce a method, named GradViT, that optimizes random noise into natural-looking images.
We observe unprecedentedly high fidelity and closeness to the original (hidden) data.
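Gradient inversion attacks of this kind share a simple optimization template: adjust a dummy input until the gradients it induces match the leaked ones. The sketch below shows that generic template, not GradViT itself (which adds image priors and loss scheduling); labels are assumed known or already recovered.

import torch

def invert_gradients(model, loss_fn, target_grads, labels, steps=1000, lr=0.1):
    # Optimize random noise so the gradients it produces in the model
    # match the observed (leaked) gradients.
    dummy = torch.randn(labels.size(0), 3, 224, 224, requires_grad=True)
    opt = torch.optim.Adam([dummy], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        grads = torch.autograd.grad(
            loss_fn(model(dummy), labels), model.parameters(), create_graph=True)
        match = sum((g - t).pow(2).sum() for g, t in zip(grads, target_grads))
        match.backward()                    # second-order gradients reach dummy
        opt.step()
    return dummy.detach()                   # approximation of the hidden batch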
arXiv Detail & Related papers (2022-03-22T17:06:07Z)
- Towards End-to-End Image Compression and Analysis with Transformers [99.50111380056043]
We propose an end-to-end image compression and analysis model with Transformers, targeting cloud-based image classification applications.
We aim to redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and facilitate image compression with the long-term information from the Transformer.
Experimental results demonstrate the effectiveness of the proposed model in both the image compression and the classification tasks.
arXiv Detail & Related papers (2021-12-17T03:28:14Z)
- Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z)
- Deep Neural Networks are Surprisingly Reversible: A Baseline for Zero-Shot Inversion [90.65667807498086]
This paper presents a zero-shot direct model inversion framework that recovers the input to the trained model given only the internal representation.
We empirically show that modern classification models on ImageNet can, surprisingly, be inverted, allowing an approximate recovery of the original 224x224px images from a representation after more than 20 layers.
arXiv Detail & Related papers (2021-07-13T18:01:43Z)
- Spatiotemporal Transformer for Video-based Person Re-identification [102.58619642363958]
We show that, despite its strong learning ability, the vanilla Transformer suffers from an increased risk of over-fitting.
We propose a novel pipeline where the model is pre-trained on a set of synthesized video data and then transferred to the downstream domains.
The derived algorithm achieves significant accuracy gain on three popular video-based person re-identification benchmarks.
arXiv Detail & Related papers (2021-03-30T16:19:27Z)