LKFormer: Large Kernel Transformer for Infrared Image Super-Resolution
- URL: http://arxiv.org/abs/2401.11859v2
- Date: Wed, 24 Jan 2024 11:24:40 GMT
- Title: LKFormer: Large Kernel Transformer for Infrared Image Super-Resolution
- Authors: Feiwei Qin and Kang Yan and Changmiao Wang and Ruiquan Ge and Yong
Peng and Kai Zhang
- Abstract summary: We propose a potent Transformer model, termed Large Kernel Transformer (LKFormer), for infrared image super-resolution.
It mainly employs depth-wise convolution with large kernels to perform non-local feature modeling, substituting for the standard self-attention layer.
We have also devised a novel feed-forward network structure called the Gated-Pixel Feed-Forward Network (GPFN) to augment the LKFormer's capacity to manage the information flow within the network.
- Score: 5.478440050117844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given the broad application of infrared technology across diverse fields,
there is an increasing emphasis on investigating super-resolution techniques
for infrared images within the realm of deep learning. Despite the impressive
results of current Transformer-based methods in image super-resolution tasks,
their reliance on the self-attention mechanism intrinsic to the Transformer
architecture results in images being treated as one-dimensional sequences,
thereby neglecting their inherent two-dimensional structure. Moreover, infrared
images exhibit a uniform pixel distribution and a limited gradient range,
posing challenges for the model to capture effective feature information.
Consequently, we suggest a potent Transformer model, termed Large Kernel
Transformer (LKFormer), to address this issue. Specifically, we have designed a
Large Kernel Residual Attention (LKRA) module with linear complexity. This
mainly employs depth-wise convolution with large kernels to execute non-local
feature modeling, thereby substituting the standard self-attention layer.
Additionally, we have devised a novel feed-forward network structure called
Gated-Pixel Feed-Forward Network (GPFN) to augment the LKFormer's capacity to
manage the information flow within the network. Comprehensive experimental
results reveal that our method surpasses the most advanced techniques
available, using fewer parameters and yielding considerably superior
performance. The source code will be available at
https://github.com/sad192/large-kernel-Transformer.
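To make the two components above concrete, here is a minimal PyTorch sketch of how an LKRA block and a GPFN could be wired together. The kernel size, expansion factor, projection layout, and sigmoid gate are illustrative assumptions, not the authors' released design; consult the repository above for the actual implementation.

```python
import torch
import torch.nn as nn


class LKRA(nn.Module):
    """Sketch of a Large Kernel Residual Attention block.

    Assumptions (not from the paper): kernel_size=11, a 1x1 projection
    before and after the depth-wise large-kernel convolution, and
    element-wise multiplication as the "attention" operation.
    """

    def __init__(self, dim: int, kernel_size: int = 11):
        super().__init__()
        self.proj_in = nn.Conv2d(dim, dim, kernel_size=1)
        # Depth-wise large-kernel convolution: cost grows linearly with
        # the number of pixels, unlike quadratic self-attention.
        self.dw_large = nn.Conv2d(dim, dim, kernel_size,
                                  padding=kernel_size // 2, groups=dim)
        self.proj_out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.dw_large(self.proj_in(x))  # non-local spatial mixing
        return x + self.proj_out(attn * x)     # residual "attention"


class GPFN(nn.Module):
    """Sketch of a Gated-Pixel Feed-Forward Network.

    Assumption: a two-branch design in which one branch gates the other
    pixel-wise, in the spirit of gated feed-forward variants.
    """

    def __init__(self, dim: int, expansion: int = 2):
        super().__init__()
        hidden = dim * expansion
        self.expand = nn.Conv2d(dim, hidden * 2, kernel_size=1)
        self.dw = nn.Conv2d(hidden * 2, hidden * 2, kernel_size=3,
                            padding=1, groups=hidden * 2)
        self.reduce = nn.Conv2d(hidden, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self.dw(self.expand(x)).chunk(2, dim=1)
        return x + self.reduce(value * torch.sigmoid(gate))  # pixel gate


# Both blocks preserve shape, so they stack like a Transformer layer.
x = torch.randn(1, 64, 48, 48)
y = GPFN(64)(LKRA(64)(x))
print(y.shape)  # torch.Size([1, 64, 48, 48])
```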
Related papers
- Effective Diffusion Transformer Architecture for Image Super-Resolution [63.254644431016345]
We design an effective diffusion transformer for image super-resolution (DiT-SR).
In practice, DiT-SR leverages an overall U-shaped architecture, and adopts a uniform isotropic design for all the transformer blocks.
We analyze the limitation of the widely used AdaLN, and present a frequency-adaptive time-step conditioning module.
arXiv Detail & Related papers (2024-09-29T07:14:16Z)
- DSR-Diff: Depth Map Super-Resolution with Diffusion Model [38.68563026759223]
We present a novel CDSR paradigm that utilizes a diffusion model within the latent space to generate guidance for depth map super-resolution.
Our proposed method has shown superior performance in extensive experiments when compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-11-16T14:18:10Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block.
The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
arXiv Detail & Related papers (2023-02-27T14:19:31Z)
- Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, Transformers that need to incorporate contextual information to extract features dynamically have been neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z)
- Self-Calibrated Efficient Transformer for Lightweight Super-Resolution [21.63691922827879]
We present a lightweight Self-Calibrated Efficient Transformer (SCET) network to solve this problem.
The architecture of SCET mainly consists of the self-calibrated module and efficient transformer block.
We provide comprehensive results on different settings of the overall network.
arXiv Detail & Related papers (2022-04-19T14:20:32Z)
- Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z)
- Fusformer: A Transformer-based Fusion Approach for Hyperspectral Image Super-resolution [9.022005574190182]
We design a network based on the transformer for fusing the low-resolution hyperspectral images and high-resolution multispectral images.
Considering that the LR-HSIs hold the main spectral structure, the network focuses on spatial detail estimation.
Various experiments and quality indexes show our approach's superiority compared with other state-of-the-art methods.
arXiv Detail & Related papers (2021-09-05T14:00:34Z)
- Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
The Less attention vIsion Transformer (LIT) builds upon the fact that convolutions, fully-connected layers, and self-attention layers have almost equivalent mathematical expressions for processing image patch sequences; a minimal demonstration of this equivalence follows after this list.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
arXiv Detail & Related papers (2021-05-29T05:26:07Z)
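As an aside on the LIT entry above, the claimed equivalence is easy to verify in the simplest case: a 1x1 convolution over a grid of patch embeddings computes exactly the same map as a fully-connected layer applied to each embedding. The snippet below is a constructed illustration, not code from the LIT paper.

```python
import torch
import torch.nn as nn

dim = 8
fc = nn.Linear(dim, dim, bias=False)
conv = nn.Conv2d(dim, dim, kernel_size=1, bias=False)
# Copy the FC weights into the 1x1 convolution.
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(dim, dim, 1, 1))

x = torch.randn(2, 4, 4, dim)  # batch, height, width, channels per patch
y_fc = fc(x)                   # FC applied to each patch embedding
y_conv = conv(x.permute(0, 3, 1, 2)).permute(0, 2, 3, 1)
print(torch.allclose(y_fc, y_conv, atol=1e-6))  # True
```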
This list is automatically generated from the titles and abstracts of the papers on this site.