MaxSR: Image Super-Resolution Using Improved MaxViT
- URL: http://arxiv.org/abs/2307.07240v1
- Date: Fri, 14 Jul 2023 09:26:47 GMT
- Title: MaxSR: Image Super-Resolution Using Improved MaxViT
- Authors: Bincheng Yang and Gangshan Wu
- Abstract summary: We present MaxSR, a single image super-resolution model based on the recent hybrid vision transformer MaxViT.
Our proposed models for classical single image super-resolution (MaxSR) and lightweight single image super-resolution (MaxSR-light) efficiently establish new state-of-the-art performance.
- Score: 34.53995225219387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While transformer models have been demonstrated to be effective for natural
language processing tasks and high-level vision tasks, only a few attempts have
been made to use powerful transformer models for single image super-resolution.
Because transformer models have powerful representation capacity and their
built-in self-attention mechanisms help to leverage the self-similarity prior
in the input low-resolution image to improve single image super-resolution
performance, we present MaxSR, a single image super-resolution model based on
the recent hybrid vision transformer MaxViT. MaxSR consists of four parts: a
shallow feature extraction block, multiple cascaded adaptive MaxViT blocks that
extract deep hierarchical features and efficiently model global self-similarity
from low-level features, a hierarchical feature fusion block, and finally a
reconstruction block. The key component of MaxSR, the adaptive MaxViT block, is
based on the MaxViT block, which mixes MBConv with squeeze-and-excitation,
block attention, and grid attention. To achieve better global modelling of
self-similarity in the input low-resolution image, we improve the block
attention and grid attention of the MaxViT block into adaptive block attention
and adaptive grid attention, which perform self-attention inside each window
across all grids and inside each grid across all windows, respectively, in the
most efficient way. We instantiate the proposed model for classical single image
super-resolution (MaxSR) and lightweight single image super-resolution
(MaxSR-light). Experiments show that MaxSR and MaxSR-light efficiently establish
new state-of-the-art performance.
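The block attention and grid attention mentioned in the abstract can be illustrated with a minimal NumPy sketch. It shows only the standard MaxViT-style partitioning (local windows versus a strided grid), not the adaptive variants proposed for MaxSR, whose details the abstract does not give. The function names `block_partition`, `grid_partition` and `self_attention` are hypothetical, and the attention uses identity query/key/value projections purely for illustration.

```python
import numpy as np


def block_partition(x, p):
    """Split an (H, W, C) feature map into non-overlapping p x p windows.

    Returns an array of shape (num_windows, p * p, C); each group holds
    spatially adjacent tokens (local attention).
    """
    h, w, c = x.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by p"
    x = x.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/p, W/p, p, p, C)
    return x.reshape(-1, p * p, c)


def grid_partition(x, g):
    """Split an (H, W, C) feature map with a g x g grid.

    Returns an array of shape (num_groups, g * g, C); each group holds
    tokens sampled with stride H/g and W/g (sparse, global attention).
    """
    h, w, c = x.shape
    assert h % g == 0 and w % g == 0, "H and W must be divisible by g"
    x = x.reshape(g, h // g, g, w // g, c)
    x = x.transpose(1, 3, 0, 2, 4)  # (H/g, W/g, g, g, C)
    return x.reshape(-1, g * g, c)


def self_attention(tokens):
    """Single-head attention within each group (assumption: identity Q/K/V)."""
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(tokens.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ tokens


# Toy 8 x 8 feature map with one channel; value at (i, j) is 8 * i + j.
x = np.arange(64, dtype=np.float64).reshape(8, 8, 1)
windows = block_partition(x, 4)  # tokens from a contiguous 4 x 4 patch
grids = grid_partition(x, 4)     # tokens taken every 2 pixels across the map
attended = self_attention(windows)
```

In this sketch the first window contains a contiguous 4 x 4 patch, while the first grid group contains every second pixel across the whole map, which is why alternating the two lets information mix both locally and globally.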
Related papers
- OminiControl: Minimal and Universal Control for Diffusion Transformer [68.3243031301164]
OminiControl is a framework that integrates image conditions into pre-trained Diffusion Transformer (DiT) models.
At its core, OminiControl leverages a parameter reuse mechanism, enabling the DiT to encode image conditions using itself as a powerful backbone.
OminiControl addresses a wide range of image conditioning tasks in a unified manner, including subject-driven generation and spatially-aligned conditions.
arXiv Detail & Related papers (2024-11-22T17:55:15Z)
- A Low-Resolution Image is Worth 1x1 Words: Enabling Fine Image Super-Resolution with Transformers and TaylorShift [6.835244697120131]
We propose TaylorIR to address limitations by utilizing a patch size of 1x1, enabling pixel-level processing in any transformer-based SR model.
Experimental results demonstrate that our approach achieves new state-of-the-art SR performance while reducing memory consumption by up to 60% compared to traditional self-attention-based transformers.
arXiv Detail & Related papers (2024-11-15T14:43:58Z)
- Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening [2.874893537471256]
Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches.
In this paper, we propose a model-based deep unfolded method for satellite image fusion.
Experimental results on PRISMA, Quickbird, and WorldView2 datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2024-09-04T13:05:00Z)
- CFAT: Unleashing Triangular Windows for Image Super-resolution [5.130320840059732]
Transformer-based models have revolutionized the field of image super-resolution (SR).
We propose a non-overlapping triangular window technique that synchronously works with the rectangular one to mitigate boundary-level distortion.
Our proposed model shows a significant 0.7 dB performance improvement over other state-of-the-art SR architectures.
arXiv Detail & Related papers (2024-03-24T13:31:31Z)
- Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block.
The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
arXiv Detail & Related papers (2023-02-27T14:19:31Z)
- Self-Calibrated Efficient Transformer for Lightweight Super-Resolution [21.63691922827879]
We present a lightweight Self-Calibrated Efficient Transformer (SCET) network to solve this problem.
The architecture of SCET mainly consists of the self-calibrated module and efficient transformer block.
We provide comprehensive results on different settings of the overall network.
arXiv Detail & Related papers (2022-04-19T14:20:32Z)
- MaxViT: Multi-Axis Vision Transformer [19.192826213493838]
We introduce an efficient and scalable attention model we call multi-axis attention.
We present a new architectural element by effectively blending our proposed attention model with convolutions.
We demonstrate the effectiveness of our model on a broad spectrum of vision tasks.
arXiv Detail & Related papers (2022-04-04T17:59:44Z)
- MAT: Mask-Aware Transformer for Large Hole Image Inpainting [79.67039090195527]
We present a novel model for large hole inpainting, which unifies the merits of transformers and convolutions.
Experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets.
arXiv Detail & Related papers (2022-03-29T06:36:17Z)
- Improved Transformer for High-Resolution GANs [69.42469272015481]
We introduce two key ingredients to Transformer to address this challenge.
We show in the experiments that the proposed HiT achieves state-of-the-art FID scores of 31.87 and 2.95 on unconditional ImageNet $128 \times 128$ and FFHQ $256 \times 256$, respectively.
arXiv Detail & Related papers (2021-06-14T17:39:49Z)
- Asymmetric CNN for image super-resolution [102.96131810686231]
Deep convolutional neural networks (CNNs) have been widely applied for low-level vision over the past five years.
We propose an asymmetric CNN (ACNet) comprising an asymmetric block (AB), a memory enhancement block (MEB) and a high-frequency feature enhancement block (HFFEB) for image super-resolution.
Our ACNet can effectively address single image super-resolution (SISR), blind SISR and blind SISR of blind noise problems.
arXiv Detail & Related papers (2021-03-25T07:10:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.