MaxSR: Image Super-Resolution Using Improved MaxViT
- URL: http://arxiv.org/abs/2307.07240v1
- Date: Fri, 14 Jul 2023 09:26:47 GMT
- Title: MaxSR: Image Super-Resolution Using Improved MaxViT
- Authors: Bincheng Yang and Gangshan Wu
- Abstract summary: We present a single image super-resolution model based on the recent hybrid vision transformer MaxViT, named MaxSR.
Our proposed models for classical single image super-resolution (MaxSR) and lightweight single image super-resolution (MaxSR-light) establish new state-of-the-art performance efficiently.
- Score: 34.53995225219387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While transformer models have proven effective for natural language
processing tasks and high-level vision tasks, only a few attempts have been made
to apply powerful transformer models to single image super-resolution. Because
transformer models have powerful representation capacity and their built-in
self-attention mechanisms help to leverage the self-similarity prior in the
input low-resolution image to improve performance for single image
super-resolution, we present a single image super-resolution model based on the
recent hybrid vision transformer MaxViT, named MaxSR. MaxSR consists of four
parts: a shallow feature extraction block, multiple cascaded adaptive MaxViT
blocks that extract deep hierarchical features and efficiently model global
self-similarity from low-level features, a hierarchical feature fusion block,
and finally a reconstruction block. The key component of MaxSR, the adaptive
MaxViT block, is based on the MaxViT block, which mixes MBConv with
squeeze-and-excitation, block attention, and grid attention. To better model
global self-similarity in the input low-resolution image, we improve the block
attention and grid attention of the MaxViT block into adaptive block attention
and adaptive grid attention, which perform self-attention inside each window
across all grids and inside each grid across all windows, respectively, in an
efficient way. We instantiate the proposed model for classical single image
super-resolution (MaxSR) and lightweight single image super-resolution
(MaxSR-light). Experiments show that MaxSR and MaxSR-light establish new
state-of-the-art performance efficiently.
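The block and grid attention that the abstract contrasts differ only in how the feature map is partitioned into token groups before self-attention. The following NumPy sketch illustrates the two partitioning schemes from the original MaxViT design (the attention computation itself, and MaxSR's adaptive variants, are omitted); function names and the helper layout are illustrative, not from the paper's code.

```python
import numpy as np

def block_partition(x, p):
    """Block (window) attention grouping: split an (H, W, C) feature map
    into non-overlapping p x p windows; attention runs within each window."""
    H, W, C = x.shape
    x = x.reshape(H // p, p, W // p, p, C)
    # -> (num_windows, p*p tokens, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, p * p, C)

def grid_partition(x, g):
    """Grid attention grouping: overlay a fixed g x g grid on the map, so
    each group gathers tokens sampled with stride (H//g, W//g) across the
    whole image -- a sparse, global (dilated) attention pattern."""
    H, W, C = x.shape
    x = x.reshape(g, H // g, g, W // g, C)
    # -> (num_groups, g*g tokens, C), each group spans the full image
    return x.transpose(1, 3, 0, 2, 4).reshape(-1, g * g, C)

# For an 8x8 map with p = g = 4, both schemes yield 4 groups of 16 tokens,
# but block groups are local patches while grid groups are image-spanning.
x = np.arange(8 * 8 * 2, dtype=np.float32).reshape(8, 8, 2)
print(block_partition(x, 4).shape, grid_partition(x, 4).shape)
```

Alternating these two groupings is what gives MaxViT local and global mixing at linear cost in the number of pixels, which is why it is attractive for super-resolution, where inputs are large.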
Related papers
- Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening [2.874893537471256]
Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches.
In this paper, we propose a model-based deep unfolded method for satellite image fusion.
Experimental results on PRISMA, Quickbird, and WorldView2 datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2024-09-04T13:05:00Z) - CFAT: Unleashing TriangularWindows for Image Super-resolution [5.130320840059732]
Transformer-based models have revolutionized the field of image super-resolution (SR).
We propose a non-overlapping triangular window technique that synchronously works with the rectangular one to mitigate boundary-level distortion.
Our proposed model shows a significant 0.7 dB performance improvement over other state-of-the-art SR architectures.
arXiv Detail & Related papers (2024-03-24T13:31:31Z) - Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block.
The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
arXiv Detail & Related papers (2023-02-27T14:19:31Z) - Self-Calibrated Efficient Transformer for Lightweight Super-Resolution [21.63691922827879]
We present a lightweight Self-Calibrated Efficient Transformer (SCET) network to solve this problem.
The architecture of SCET mainly consists of the self-calibrated module and efficient transformer block.
We provide comprehensive results on different settings of the overall network.
arXiv Detail & Related papers (2022-04-19T14:20:32Z) - MaxViT: Multi-Axis Vision Transformer [19.192826213493838]
We introduce an efficient and scalable attention model we call multi-axis attention.
We present a new architectural element by effectively blending our proposed attention model with convolutions.
We demonstrate the effectiveness of our model on a broad spectrum of vision tasks.
arXiv Detail & Related papers (2022-04-04T17:59:44Z) - MAT: Mask-Aware Transformer for Large Hole Image Inpainting [79.67039090195527]
We present a novel model for large hole inpainting, which unifies the merits of transformers and convolutions.
Experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets.
arXiv Detail & Related papers (2022-03-29T06:36:17Z) - Image-specific Convolutional Kernel Modulation for Single Image Super-resolution [85.09413241502209]
To address this issue, we propose a novel image-specific convolutional kernel modulation (IKM) method.
We exploit the global contextual information of image or feature to generate an attention weight for adaptively modulating the convolutional kernels.
Experiments on single image super-resolution show that the proposed methods achieve superior performances over state-of-the-art methods.
arXiv Detail & Related papers (2021-11-16T11:05:10Z) - Improved Transformer for High-Resolution GANs [69.42469272015481]
We introduce two key ingredients to Transformer to address this challenge.
We show in the experiments that the proposed HiT achieves state-of-the-art FID scores of 31.87 and 2.95 on unconditional ImageNet $128\times 128$ and FFHQ $256\times 256$, respectively.
arXiv Detail & Related papers (2021-06-14T17:39:49Z) - Asymmetric CNN for image super-resolution [102.96131810686231]
Deep convolutional neural networks (CNNs) have been widely applied for low-level vision over the past five years.
We propose an asymmetric CNN (ACNet) comprising an asymmetric block (AB), a memory enhancement block (MEB) and a high-frequency feature enhancement block (HFFEB) for image super-resolution.
Our ACNet can effectively address single image super-resolution (SISR), blind SISR, and blind SISR under unknown noise.
arXiv Detail & Related papers (2021-03-25T07:10:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.