Related papers: HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution

URL: http://arxiv.org/abs/2407.05878v1
Date: Mon, 8 Jul 2024 12:42:10 GMT
Title: HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
Authors: Xiang Zhang, Yulun Zhang, Fisher Yu
Abstract summary: We present a strategy to convert transformer-based SR networks to hierarchical transformers (HiT-SR). Specifically, we first replace the commonly used fixed small windows with expanding hierarchical windows to aggregate features at different scales. Considering the intensive computation required for large windows, we further design a spatial-channel correlation method whose complexity is linear in the window size.
Score: 70.52256118833583
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transformers have exhibited promising performance in computer vision tasks including image super-resolution (SR). However, popular transformer-based SR methods often employ window self-attention with quadratic computational complexity to window sizes, resulting in fixed small windows with limited receptive fields. In this paper, we present a general strategy to convert transformer-based SR networks to hierarchical transformers (HiT-SR), boosting SR performance with multi-scale features while maintaining an efficient design. Specifically, we first replace the commonly used fixed small windows with expanding hierarchical windows to aggregate features at different scales and establish long-range dependencies. Considering the intensive computation required for large windows, we further design a spatial-channel correlation method with linear complexity to window sizes, efficiently gathering spatial and channel information from hierarchical windows. Extensive experiments verify the effectiveness and efficiency of our HiT-SR, and our improved versions of SwinIR-Light, SwinIR-NG, and SRFormer-Light yield state-of-the-art SR results with fewer parameters, FLOPs, and faster speeds ($\sim7\times$).
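The abstract's contrast between quadratic and linear complexity in the window size can be made concrete with a rough operation count. The sketch below is only an illustration under stated assumptions: it counts multiply-accumulates for one window of standard self-attention, and compares them with a generic operator whose cost is linear in the number of tokens (a linear-attention-style count). It is not the paper's spatial-channel correlation method, and the function names and the channel width of 60 are hypothetical.

```python
def window_attention_macs(window_side: int, channels: int) -> int:
    """MACs for one window of standard self-attention.

    Counts Q @ K^T and attention @ V, each costing n * n * channels,
    where n is the number of tokens in the window.
    """
    n = window_side * window_side  # tokens per window
    return 2 * n * n * channels


def linear_operator_macs(window_side: int, channels: int) -> int:
    """MACs for a generic operator that is linear in the token count.

    Illustrative stand-in (NOT the HiT-SR method): computes K^T @ V
    (channels x channels) and then Q @ (K^T @ V), each costing
    n * channels * channels.
    """
    n = window_side * window_side
    return 2 * n * channels * channels


if __name__ == "__main__":
    channels = 60  # assumed channel width, for illustration only
    for side in (8, 16, 32, 64):
        quad = window_attention_macs(side, channels)
        lin = linear_operator_macs(side, channels)
        print(f"{side:>2}x{side:<2}  quadratic={quad:>14,}  linear={lin:>14,}")
```

Doubling the window side quadruples the token count, so the per-window cost of the quadratic operator grows 16 times while the linear one grows 4 times. This is why fixed small windows are the usual choice when the attention cost is quadratic.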

Related papers

PiT: Progressive Diffusion Transformer [50.46345527963736]
We propose a series of Pseudo Progressive Diffusion Transformer (PiT). Our proposed PiT-L achieves a 54% FID improvement over DiT-XL/2 while using less computation.
arXiv Detail & Related papers (2025-05-19T15:02:33Z)
FourierSR: A Fourier Token-based Plugin for Efficient Image Super-Resolution [21.909175743080713]
Image super-resolution (SR) aims to recover low-resolution images to high-resolution images, where improving SR efficiency is a high-profile challenge. Commonly used units in SR, like convolutions and window-based Transformers, have limited receptive fields. We propose a Fourier token-based plugin called FourierSR to improve SR uniformly.
arXiv Detail & Related papers (2025-03-13T04:50:55Z)
A Low-Resolution Image is Worth 1x1 Words: Enabling Fine Image Super-Resolution with Transformers and TaylorShift [6.835244697120131]
We propose TaylorIR to address limitations by utilizing a patch size of 1x1, enabling pixel-level processing in any transformer-based SR model. Experimental results demonstrate that our approach achieves new state-of-the-art SR performance while reducing memory consumption by up to 60% compared to traditional self-attention-based transformers.
arXiv Detail & Related papers (2024-11-15T14:43:58Z)
Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496]
We make the first attempt to integrate the Vision State Space Model (Mamba) for remote sensing image (RSI) super-resolution. To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR. Our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM).
arXiv Detail & Related papers (2024-05-08T11:09:24Z)
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures [96.00848293994463]
This paper introduces Vision-RWKV, a model adapted from the RWKV model used in the NLP field. Our model is designed to efficiently handle sparse inputs and demonstrate robust global processing capabilities. Our evaluations demonstrate that VRWKV surpasses ViT's performance in image classification and has significantly faster speeds and lower memory usage.
arXiv Detail & Related papers (2024-03-04T18:46:20Z)
Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR). CFSR inherits the advantages of both convolution-based and transformer-based approaches. Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
arXiv Detail & Related papers (2024-01-11T03:08:00Z)
Unfolding Once is Enough: A Deployment-Friendly Transformer Unit for Super-Resolution [16.54421804141835]
High resolution of intermediate features in SISR models increases memory and computational requirements. We propose a Deployment-friendly Inner-patch Transformer Network (DITN) for the SISR task. Our models can achieve competitive results in terms of qualitative and quantitative performance with high deployment efficiency.
arXiv Detail & Related papers (2023-08-05T05:42:51Z)
Incorporating Transformer Designs into Convolutions for Lightweight Image Super-Resolution [46.32359056424278]
Large convolutional kernels have become popular in designing convolutional neural networks. The increase in kernel size also leads to a quadratic growth in the number of parameters, resulting in heavy computation and memory requirements. We propose a neighborhood attention (NA) module that upgrades the standard convolution with a self-attention mechanism. Building upon the NA module, we propose a lightweight single image super-resolution (SISR) network named TCSR.
arXiv Detail & Related papers (2023-03-25T01:32:18Z)
Image Super-Resolution using Efficient Striped Window Transformer [6.815956004383743]
In this paper, we propose an efficient striped window transformer (ESWT). ESWT consists of efficient transformation layers (ETLs), allowing a clean structure and avoiding redundant operations. To further exploit the potential of the transformer, we propose a novel flexible window training strategy.
arXiv Detail & Related papers (2023-01-24T09:09:35Z)
Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks. We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers. Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z)
HRFormer: High-Resolution Transformer for Dense Prediction [99.6060997466614]
We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks. We take advantage of the multi-resolution parallel design introduced in high-resolution convolutional networks (HRNet). We demonstrate the effectiveness of the High-Resolution Transformer on both human pose estimation and semantic segmentation tasks.
arXiv Detail & Related papers (2021-10-18T15:37:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.