Multi-Scale Implicit Transformer with Re-parameterize for
Arbitrary-Scale Super-Resolution
- URL: http://arxiv.org/abs/2403.06536v1
- Date: Mon, 11 Mar 2024 09:23:20 GMT
- Title: Multi-Scale Implicit Transformer with Re-parameterize for
Arbitrary-Scale Super-Resolution
- Authors: Jinchen Zhu, Mingjian Zhang, Ling Zheng, Shizhuang Weng
- Abstract summary: The Multi-Scale Implicit Transformer (MSIT) consists of a
Multi-scale Neural Operator (MSNO) and Multi-Scale Self-Attention (MSSA).
- Score: 2.4865475189445405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, methods based on implicit neural representations have shown
excellent capabilities for arbitrary-scale super-resolution (ASSR). Although
these methods represent the features of an image by generating latent codes,
the latent codes are difficult to adapt to different magnification factors,
which seriously degrades their performance. To address this, we design the
Multi-Scale Implicit Transformer (MSIT), consisting of a Multi-scale Neural
Operator (MSNO) and Multi-Scale Self-Attention (MSSA). MSNO obtains
multi-scale latent codes through feature enhancement, multi-scale
characteristics extraction, and multi-scale characteristics merging. MSSA
further enhances the multi-scale characteristics of the latent codes,
resulting in better performance. Furthermore, to improve the performance of
the network, we propose the Re-Interaction Module (RIM), combined with a
cumulative training strategy, to increase the diversity of information learned
by the network. To our knowledge, this is the first systematic introduction of
multi-scale characteristics to ASSR. Extensive experiments validate the
effectiveness of MSIT, and our method achieves state-of-the-art performance in
arbitrary-scale super-resolution tasks.
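The latent-code querying that underlies implicit-representation ASSR can be sketched as follows. This is a minimal numpy illustration of the general idea (a LIIF-style nearest-code query), not the paper's MSIT implementation; the `decode` function here is a fixed random linear map standing in for the learned MLP, and all names are illustrative assumptions:

```python
import numpy as np

def implicit_query(latent, coords, decode):
    """Query an implicit image representation at continuous coordinates.

    latent : (H, W, C) grid of latent codes produced by an encoder
    coords : (N, 2) continuous (y, x) coordinates in [0, 1]^2
    decode : maps (N, C + 2) -> (N, out_dim), i.e. (code, rel. offset) -> RGB
    """
    H, W, C = latent.shape
    # Snap each continuous coordinate to the nearest latent-code cell.
    cell = np.stack([np.clip((coords[:, 0] * H).astype(int), 0, H - 1),
                     np.clip((coords[:, 1] * W).astype(int), 0, W - 1)], axis=1)
    codes = latent[cell[:, 0], cell[:, 1]]            # (N, C) nearest codes
    centres = (cell + 0.5) / np.array([H, W])         # cell centres in [0, 1]^2
    rel = coords - centres                            # relative offset per query
    return decode(np.concatenate([codes, rel], axis=1))

rng = np.random.default_rng(0)
latent = rng.standard_normal((8, 8, 64))              # toy 8x8 latent grid
W_dec = rng.standard_normal((66, 3))                  # stand-in for learned MLP
decode = lambda x: x @ W_dec

# Querying a 24x24 coordinate grid renders the 8x8 latent grid at x3 scale;
# any other grid size (including non-integer scales) works the same way,
# which is what makes the representation "arbitrary-scale".
ys, xs = np.meshgrid(np.linspace(0, 1, 24, endpoint=False) + 1 / 48,
                     np.linspace(0, 1, 24, endpoint=False) + 1 / 48,
                     indexing="ij")
coords = np.stack([ys.ravel(), xs.ravel()], axis=1)
rgb = implicit_query(latent, coords, decode).reshape(24, 24, 3)
```

Because the output resolution is fixed only by the query grid, the same latent codes serve every magnification factor; the paper's point is that plain latent codes adapt poorly across factors, which MSNO/MSSA address with multi-scale characteristics.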
Related papers
- MAT: Multi-Range Attention Transformer for Efficient Image Super-Resolution [14.265237560766268]
A flexible integration of attention across diverse spatial extents can yield significant performance enhancements.
We introduce Multi-Range Attention Transformer (MAT) tailored for Super Resolution (SR) tasks.
MAT adeptly captures dependencies across various spatial ranges, improving the diversity and efficacy of its feature representations.
arXiv Detail & Related papers (2024-11-26T08:30:31Z)
- Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD).
It aims to detect salient objects from arbitrary modalities, e.g. RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) is proposed to address two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z)
- Multi-scale Unified Network for Image Classification [33.560003528712414]
CNNs face notable challenges in performance and computational efficiency when dealing with real-world, multi-scale image inputs.
We propose the Multi-scale Unified Network (MUSN), consisting of multiple scales, a unified network, and a scale-invariant constraint.
MUSN yields an accuracy increase of up to 44.53% and reduces FLOPs by 7.01-16.13% in multi-scale scenarios.
arXiv Detail & Related papers (2024-03-27T06:40:26Z)
- Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block.
The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
arXiv Detail & Related papers (2023-02-27T14:19:31Z)
- Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution [55.52779466954026]
Multi-contrast super-resolution (SR) reconstruction is promising to yield SR images with higher quality.
Existing methods lack effective mechanisms to match and fuse these features for better reconstruction.
We propose a novel network to address these problems by developing a set of innovative Transformer-empowered multi-scale contextual matching and aggregation techniques.
arXiv Detail & Related papers (2022-03-26T01:42:59Z)
- Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation [5.281694565226513]
We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks.
By first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint.
arXiv Detail & Related papers (2022-03-25T01:24:24Z)
- Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z)
- Deep Iterative Residual Convolutional Network for Single Image Super-Resolution [31.934084942626257]
We propose a deep Iterative Super-Resolution Residual Convolutional Network (ISRResCNet).
It exploits the powerful image regularization and large-scale optimization techniques by training the deep network in an iterative manner with a residual learning approach.
Our method, with few trainable parameters, improves the results for different scaling factors in comparison with state-of-the-art methods.
arXiv Detail & Related papers (2020-09-07T12:54:14Z)
- Sequential Hierarchical Learning with Distribution Transformation for Image Super-Resolution [83.70890515772456]
We build a sequential hierarchical learning super-resolution network (SHSR) for effective image SR.
We consider the inter-scale correlations of features, and devise a sequential multi-scale block (SMB) to progressively explore the hierarchical information.
Experiment results show SHSR achieves superior quantitative performance and visual quality to state-of-the-art methods.
arXiv Detail & Related papers (2020-07-19T01:35:53Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for the image restoration task.
We present a novel architecture with the goal of maintaining spatially precise, high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.