Restormer: Efficient Transformer for High-Resolution Image Restoration
- URL: http://arxiv.org/abs/2111.09881v1
- Date: Thu, 18 Nov 2021 18:59:10 GMT
- Title: Restormer: Efficient Transformer for High-Resolution Image Restoration
- Authors: Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang
- Abstract summary: Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
- Score: 118.9617735769827
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Since convolutional neural networks (CNNs) perform well at learning
generalizable image priors from large-scale data, these models have been
extensively applied to image restoration and related tasks. Recently, another
class of neural architectures, Transformers, have shown significant performance
gains on natural language and high-level vision tasks. While the Transformer
model mitigates the shortcomings of CNNs (i.e., limited receptive field and
inadaptability to input content), its computational complexity grows
quadratically with the spatial resolution, therefore making it infeasible to
apply to most image restoration tasks involving high-resolution images. In this
work, we propose an efficient Transformer model by making several key designs
in the building blocks (multi-head attention and feed-forward network) such
that it can capture long-range pixel interactions, while still remaining
applicable to large images. Our model, named Restoration Transformer
(Restormer), achieves state-of-the-art results on several image restoration
tasks, including image deraining, single-image motion deblurring, defocus
deblurring (single-image and dual-pixel data), and image denoising (Gaussian
grayscale/color denoising, and real image denoising). The source code and
pre-trained models are available at https://github.com/swz30/Restormer.
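The quadratic growth mentioned in the abstract can be made concrete with a little arithmetic. The sketch below is only an illustration, not the authors' code: it counts the entries of a single self-attention map when every pixel is treated as a token. For comparison it also counts the entries of a map computed across feature channels, a size that does not depend on image resolution. The function names and the channel count of 48 are assumptions chosen for the example.

```python
def spatial_attention_entries(height: int, width: int) -> int:
    """Entries in one attention map when every pixel is a token (HW x HW)."""
    tokens = height * width
    return tokens * tokens


def channel_attention_entries(channels: int) -> int:
    """Entries in one attention map computed across channels (C x C)."""
    return channels * channels


def float32_bytes(entries: int) -> int:
    """Memory needed to store the map in 32-bit floats."""
    return entries * 4


if __name__ == "__main__":
    channels = 48  # hypothetical feature width, chosen only for illustration
    for side in (64, 128, 256, 512):
        spatial = spatial_attention_entries(side, side)
        per_channel = channel_attention_entries(channels)
        print(
            f"{side}x{side}: spatial map {float32_bytes(spatial) / 2**30:.3f} GiB, "
            f"channel map {float32_bytes(per_channel)} bytes"
        )
```

Doubling the image side quadruples the number of pixels and multiplies the spatial map size by 16. A single float32 map for a 256x256 input already needs 16 GiB, whereas the channel-wise map stays the same size at every resolution.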
Related papers
- Joint multi-dimensional dynamic attention and transformer for general image restoration [14.987034136856463]
Outdoor images often suffer from severe degradation due to rain, haze, and noise.
Current image restoration methods struggle to handle complex degradation while maintaining efficiency.
This paper introduces a novel image restoration architecture that combines multi-dimensional dynamic attention and self-attention.
arXiv Detail & Related papers (2024-11-12T15:58:09Z)
- Multi-Scale Representation Learning for Image Restoration with State-Space Model [13.622411683295686]
We propose a novel Multi-Scale State-Space Model-based method (MS-Mamba) for efficient image restoration.
Our proposed method achieves new state-of-the-art performance while maintaining low computational complexity.
arXiv Detail & Related papers (2024-08-19T16:42:58Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- WavePaint: Resource-efficient Token-mixer for Self-supervised Inpainting [2.3014300466616078]
This paper diverges from vision transformers by using a computationally-efficient WaveMix-based fully convolutional architecture -- WavePaint.
It uses a 2D-discrete wavelet transform (DWT) for spatial and multi-resolution token-mixing along with convolutional layers.
Our model even outperforms current GAN-based architectures on the CelebA-HQ dataset without using an adversarially trainable discriminator.
arXiv Detail & Related papers (2023-07-01T18:41:34Z)
- T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown significant performance in natural language processing.
In this paper, we design a novel attention linearly related to the resolution according to Taylor expansion, and based on this attention, a network called $T$-former is designed for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
arXiv Detail & Related papers (2023-05-12T04:10:42Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding [38.014569953980754]
The proposed model restores holistic image structures with a powerful attention-based transformer model in a fixed low-resolution sketch space.
Our model can be integrated efficiently with other pretrained inpainting models through zero-initialized residual addition.
arXiv Detail & Related papers (2022-03-02T04:27:27Z)
- Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
Less attention vIsion Transformer (LIT) builds upon the fact that convolutions, fully-connected layers, and self-attentions have almost equivalent mathematical expressions for processing image patch sequences.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
arXiv Detail & Related papers (2021-05-29T05:26:07Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.