Efficient Mixed Transformer for Single Image Super-Resolution
- URL: http://arxiv.org/abs/2305.11403v5
- Date: Mon, 19 Jun 2023 06:56:23 GMT
- Title: Efficient Mixed Transformer for Single Image Super-Resolution
- Authors: Ling Zheng, Jinchen Zhu, Jinpeng Shi, Shizhuang Weng
- Abstract summary: The Mixed Transformer Block (MTB) consists of multiple consecutive transformer layers.
The Pixel Mixer (PM) replaces Self-Attention (SA) in some of these layers.
PM enhances local knowledge aggregation with pixel shifting operations.
- Score: 1.7740376367999706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Transformer-based methods have achieved impressive results in
single image super-resolution (SISR). However, the lack of a locality mechanism
and their high complexity limit their application in the field of super-resolution
(SR). To solve these problems, we propose a new method, the Efficient Mixed
Transformer (EMT), in this study. Specifically, we propose the Mixed Transformer
Block (MTB), consisting of multiple consecutive transformer layers, in some of
which the Pixel Mixer (PM) replaces Self-Attention (SA). PM enhances local
knowledge aggregation through pixel shifting operations and introduces no
additional complexity, as it has no parameters and no floating-point operations.
Moreover, we employ striped windows for SA (SWSA) to achieve efficient global
dependency modelling by exploiting image anisotropy. Experimental results show
that EMT outperforms existing methods on benchmark datasets and achieves
state-of-the-art performance. The code is available at
https://github.com/Fried-Rice-Lab/FriedRiceLab.
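
The abstract names two components: the parameter-free Pixel Mixer (PM) and striped-window self-attention (SWSA). The sketch below illustrates the general ideas rather than the authors' implementation (see the linked repository for that): the number of channel groups, the shift size, and the circular torch.roll shift are illustrative assumptions.

```python
# Minimal sketches, assuming PyTorch, of the two ideas named in the abstract.
# These are illustrative assumptions, not the EMT code from the repository.
import torch


def pixel_mixer(x: torch.Tensor, shift: int = 1) -> torch.Tensor:
    """Parameter-free local mixing: split channels into five groups and shift
    four of them by one pixel (left/right/up/down), leaving the rest untouched.
    No weights and no floating-point operations are introduced."""
    b, c, h, w = x.shape
    g = c // 5  # assumed group size
    out = x.clone()
    out[:, 0 * g:1 * g] = torch.roll(x[:, 0 * g:1 * g], shifts=-shift, dims=3)  # left
    out[:, 1 * g:2 * g] = torch.roll(x[:, 1 * g:2 * g], shifts=shift, dims=3)   # right
    out[:, 2 * g:3 * g] = torch.roll(x[:, 2 * g:3 * g], shifts=-shift, dims=2)  # up
    out[:, 3 * g:4 * g] = torch.roll(x[:, 3 * g:4 * g], shifts=shift, dims=2)   # down
    return out  # channels beyond 4*g pass through unchanged


def striped_windows(x: torch.Tensor, stripe_h: int, stripe_w: int) -> torch.Tensor:
    """Partition (B, C, H, W) features into non-overlapping striped windows of
    size (stripe_h, stripe_w); self-attention is then computed inside each
    window. Anisotropic stripes (tall-and-narrow or short-and-wide) let
    attention reach far along one image axis at low cost."""
    b, c, h, w = x.shape
    assert h % stripe_h == 0 and w % stripe_w == 0, "feature map must be divisible by stripe size"
    x = x.view(b, c, h // stripe_h, stripe_h, w // stripe_w, stripe_w)
    x = x.permute(0, 2, 4, 3, 5, 1)  # (B, nH, nW, stripe_h, stripe_w, C)
    return x.reshape(-1, stripe_h * stripe_w, c)  # (B * windows, tokens, C)


# Example: mix a feature map locally, then form 4x16 striped attention windows.
feats = torch.randn(1, 60, 64, 64)
tokens = striped_windows(pixel_mixer(feats), stripe_h=4, stripe_w=16)
```

Because pixel_mixer only rearranges existing values, it adds locality without any parameters or FLOPs, which is the efficiency argument the abstract makes for replacing SA in some layers.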
Related papers
- MoEUT: Mixture-of-Experts Universal Transformers [75.96744719516813]
Universal Transformers (UTs) have advantages over standard Transformers in learning compositional generalizations.
Layer-sharing drastically reduces the parameter count compared to the non-shared model with the same dimensionality.
No previous work has succeeded in proposing a shared-layer Transformer design that is competitive in parameter count-dominated tasks such as language modeling.
arXiv Detail & Related papers (2024-05-25T03:24:32Z)
- Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR).
CFSR inherits the advantages of both convolution-based and transformer-based approaches.
Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
arXiv Detail & Related papers (2024-01-11T03:08:00Z)
- CT-MVSNet: Efficient Multi-View Stereo with Cross-scale Transformer [8.962657021133925]
Cross-scale transformer (CT) processes feature representations at different stages without additional computation.
We introduce an adaptive matching-aware transformer (AMT) that employs different interactive attention combinations at multiple scales.
We also present a dual-feature guided aggregation (DFGA) that embeds the coarse global semantic information into the finer cost volume construction.
arXiv Detail & Related papers (2023-12-14T01:33:18Z)
- Unfolding Once is Enough: A Deployment-Friendly Transformer Unit for Super-Resolution [16.54421804141835]
The high resolution of intermediate features in SISR models increases memory and computational requirements.
We propose a Deployment-friendly Inner-patch Transformer Network (DITN) for the SISR task.
Our models achieve competitive qualitative and quantitative performance with high deployment efficiency.
arXiv Detail & Related papers (2023-08-05T05:42:51Z)
- Reciprocal Attention Mixing Transformer for Lightweight Image Restoration [6.3159191692241095]
We propose a lightweight image restoration network, the Reciprocal Attention Mixing Transformer (RAMiT).
It employs bi-dimensional (spatial and channel) self-attentions in parallel with different numbers of multi-heads.
It achieves state-of-the-art performance on multiple lightweight IR tasks, including super-resolution, color denoising, grayscale denoising, low-light enhancement, and deraining.
arXiv Detail & Related papers (2023-05-19T06:55:04Z)
- Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block.
The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
arXiv Detail & Related papers (2023-02-27T14:19:31Z)
- MAT: Mask-Aware Transformer for Large Hole Image Inpainting [79.67039090195527]
We present a novel model for large hole inpainting, which unifies the merits of transformers and convolutions.
Experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets.
arXiv Detail & Related papers (2022-03-29T06:36:17Z)
- Mixed Transformer U-Net For Medical Image Segmentation [14.046456257175237]
We propose a novel Mixed Transformer Module (MTM) for simultaneous inter- and intra-affinity learning.
By using MTM, we construct a U-shaped model named Mixed Transformer U-Net (MT-UNet) for accurate medical image segmentation.
arXiv Detail & Related papers (2021-11-08T09:03:46Z)
- Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, the Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z)
- Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is the latest data augmentation technique that linearly interpolates input examples and the corresponding labels.
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks; a brief interpolation sketch follows this list.
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
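
The mixup entry above describes linear interpolation of inputs and labels. A minimal sketch, assuming PyTorch and the standard Beta-distributed mixing coefficient (not specified in the summary above), is shown below; how mixup is injected into the transformer's representations in "mixup-transformer" is not detailed here.

```python
# Minimal mixup sketch (illustrative, not the paper's code): interpolate a
# batch of inputs and one-hot labels with a randomly permuted copy of itself.
import torch


def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    """Return mixed inputs and labels with lambda ~ Beta(alpha, alpha)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], lam * y + (1 - lam) * y[idx]
```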