Unfolding Once is Enough: A Deployment-Friendly Transformer Unit for
Super-Resolution
- URL: http://arxiv.org/abs/2308.02794v1
- Date: Sat, 5 Aug 2023 05:42:51 GMT
- Title: Unfolding Once is Enough: A Deployment-Friendly Transformer Unit for
Super-Resolution
- Authors: Yong Liu, Hang Dong, Boyang Liang, Songwei Liu, Qingji Dong, Kai Chen,
Fangmin Chen, Lean Fu, and Fei Wang
- Abstract summary: The high resolution of intermediate features in SISR models increases memory and computational requirements.
We propose a Deployment-friendly Inner-patch Transformer Network (DITN) for the SISR task.
Our models can achieve competitive results in terms of qualitative and quantitative performance with high deployment efficiency.
- Score: 16.54421804141835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed a few attempts at applying vision transformers to single
image super-resolution (SISR). Since the high resolution of intermediate
features in SISR models increases memory and computational requirements,
efficient SISR transformers are favored. Building on popular transformer
backbones, many methods have explored reasonable schemes to reduce the
computational complexity of the self-attention module while achieving
impressive performance. However, these methods only focus on performance on
the training platform (e.g., PyTorch/TensorFlow) without further optimization
for the deployment platform (e.g., TensorRT). Therefore, they inevitably
contain some redundant operators, posing challenges for subsequent deployment
in real-world applications. In this paper, we propose a deployment-friendly
transformer unit, namely UFONE (i.e., UnFolding ONce is Enough), to alleviate
these problems. In each UFONE, we introduce an Inner-patch Transformer Layer
(ITL) to efficiently reconstruct the local structural information from patches
and a Spatial-Aware Layer (SAL) to exploit the long-range dependencies between
patches. Based on UFONE, we propose a Deployment-friendly Inner-patch
Transformer Network (DITN) for the SISR task, which can achieve favorable
performance with low latency and memory usage on both training and deployment
platforms. To further boost the deployment efficiency of the
proposed DITN on TensorRT, we also provide an efficient substitution for layer
normalization and propose a fusion optimization strategy for specific
operators. Extensive experiments show that our models can achieve competitive
results in terms of qualitative and quantitative performance with high
deployment efficiency. Code is available at
\url{https://github.com/yongliuy/DITN}.
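The abstract names the two components of a UFONE unit (an Inner-patch Transformer Layer and a Spatial-Aware Layer) but not their internals. Below is a minimal PyTorch sketch of the "unfold once" idea as described above: the feature map is unfolded into non-overlapping patches a single time, self-attention runs inside each patch, the patches are folded back, and a convolutional layer mixes information across patches. The use of nn.MultiheadAttention, the depth-wise/point-wise convolutions, the MLP ratio, and the patch size are illustrative assumptions, not the authors' exact design.

```python
# Hypothetical sketch of a UFONE-style unit (ITL + SAL).
# Internals are assumptions for illustration; only the overall
# unfold-once / ITL / fold / SAL structure follows the abstract.
# Normalization is omitted for brevity (the paper additionally replaces
# LayerNorm with a more TensorRT-friendly substitute).
import torch
import torch.nn as nn


class InnerPatchTransformerLayer(nn.Module):
    """Self-attention restricted to tokens inside each non-overlapping patch."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(), nn.Linear(dim * 2, dim))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B * num_patches, patch_area, dim)
        tokens = tokens + self.attn(tokens, tokens, tokens, need_weights=False)[0]
        return tokens + self.mlp(tokens)


class SpatialAwareLayer(nn.Module):
    """Convolutional mixing across patches to model long-range dependencies."""

    def __init__(self, dim: int):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),  # depth-wise spatial mixing
            nn.Conv2d(dim, dim, 1),                          # point-wise channel mixing
            nn.GELU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mix(x)


class UFONE(nn.Module):
    """Unfold the feature map into patches once, run ITL, fold back, run SAL."""

    def __init__(self, dim: int, patch: int = 8):
        super().__init__()
        self.patch = patch
        self.itl = InnerPatchTransformerLayer(dim)
        self.sal = SpatialAwareLayer(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumes H and W are divisible by the patch size.
        b, c, h, w = x.shape
        p = self.patch
        # Unfold once: (B, C, H, W) -> (B * num_patches, p*p, C)
        tokens = (
            x.reshape(b, c, h // p, p, w // p, p)
            .permute(0, 2, 4, 3, 5, 1)
            .reshape(b * (h // p) * (w // p), p * p, c)
        )
        tokens = self.itl(tokens)
        # Fold back to (B, C, H, W)
        x = (
            tokens.reshape(b, h // p, w // p, p, p, c)
            .permute(0, 5, 1, 3, 2, 4)
            .reshape(b, c, h, w)
        )
        return self.sal(x)
```

Because the unfold/fold pair appears only once per unit, the exported graph avoids repeated window-partition reshapes, which is presumably what the "unfolding once is enough" naming and the claimed deployment friendliness refer to.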
Related papers
- HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution [70.52256118833583]
We present a strategy to convert transformer-based SR networks to hierarchical transformers (HiT-SR).
Specifically, we first replace the commonly used fixed small windows with expanding hierarchical windows to aggregate features at different scales.
Considering the intensive computation required for large windows, we further design a spatial-channel correlation method with linear complexity with respect to window size.
arXiv Detail & Related papers (2024-07-08T12:42:10Z)
- Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR)
CFSR inherits the advantages of both convolution-based and transformer-based approaches.
Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
arXiv Detail & Related papers (2024-01-11T03:08:00Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
The proposed adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Incorporating Transformer Designs into Convolutions for Lightweight Image Super-Resolution [46.32359056424278]
Large convolutional kernels have become popular in designing convolutional neural networks.
The increase in kernel size also leads to a quadratic growth in the number of parameters, resulting in heavy computation and memory requirements.
We propose a neighborhood attention (NA) module that upgrades the standard convolution with a self-attention mechanism.
Building upon the NA module, we propose a lightweight single image super-resolution (SISR) network named TCSR.
arXiv Detail & Related papers (2023-03-25T01:32:18Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on-the-fly by a small amount proportional to their magnitude (a minimal sketch of this shrinkage step appears after this list).
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Image Super-Resolution using Efficient Striped Window Transformer [6.815956004383743]
In this paper, we propose an efficient striped window transformer (ESWT).
ESWT consists of efficient transformation layers (ETLs), allowing a clean structure and avoiding redundant operations.
To further exploit the potential of the transformer, we propose a novel flexible window training strategy.
arXiv Detail & Related papers (2023-01-24T09:09:35Z)
- DLGSANet: Lightweight Dynamic Local and Global Self-Attention Networks for Image Super-Resolution [83.47467223117361]
We propose an effective lightweight dynamic local and global self-attention network (DLGSANet) to solve image super-resolution.
Motivated by the network designs of Transformers, we develop a simple yet effective multi-head dynamic local self-attention (MHDLSA) module to extract local features efficiently.
To also capture global relationships, we develop a sparse global self-attention (SparseGSA) module to select the most useful similarity values.
arXiv Detail & Related papers (2023-01-05T12:06:47Z)
- Residual Local Feature Network for Efficient Super-Resolution [20.62809970985125]
In this work, we propose a novel Residual Local Feature Network (RLFN).
The main idea is to use three convolutional layers for residual local feature learning to simplify feature aggregation.
In addition, we won first place in the runtime track of the NTIRE 2022 efficient super-resolution challenge.
arXiv Detail & Related papers (2022-05-16T08:46:34Z)
- Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z)
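For the Iterative Soft Shrinkage-Percentage (ISS-P) entry above, the on-the-fly shrinkage step can be illustrated with a short, hypothetical sketch. The percentile threshold, the shrink rate, and the helper name soft_shrink_step are assumptions for illustration, not taken from that paper.

```python
# Hypothetical sketch of an iterative soft-shrinkage pruning step:
# instead of hard-zeroing, weights below a magnitude percentile are shrunk
# by a small amount proportional to their magnitude, so they can recover
# in later iterations. Percentile and rate values are illustrative.
import torch


def soft_shrink_step(weight: torch.Tensor, percent: float = 0.2, rate: float = 0.1) -> torch.Tensor:
    """Shrink the smallest `percent` fraction of weights by `rate` of their magnitude."""
    magnitude = weight.abs()
    threshold = torch.quantile(magnitude.flatten(), percent)
    unimportant = magnitude < threshold
    # Soft shrinkage: move unimportant weights toward zero gradually
    # rather than pruning them outright.
    return torch.where(unimportant, weight * (1.0 - rate), weight)


# Example: apply one shrinkage step to a randomly initialized layer on-the-fly.
w = torch.randn(64, 64)
w = soft_shrink_step(w)
```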