Correlation Matching Transformation Transformers for UHD Image Restoration
- URL: http://arxiv.org/abs/2406.00629v1
- Date: Sun, 2 Jun 2024 06:10:48 GMT
- Title: Correlation Matching Transformation Transformers for UHD Image Restoration
- Authors: Cong Wang, Jinshan Pan, Wei Wang, Gang Fu, Siyuan Liang, Mengzhu Wang, Xiao-Ming Wu, Jun Liu
- Abstract summary: This paper proposes a general Transformer for Ultra-High-Definition (UHD) image restoration.
UHDformer contains two learning spaces: (a) learning in high-resolution space and (b) learning in low-resolution space.
Experiments show that UHDformer reduces model size by about 97% compared with most state-of-the-art methods.
- Score: 46.569124456928535
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper proposes UHDformer, a general Transformer for Ultra-High-Definition (UHD) image restoration. UHDformer contains two learning spaces: (a) learning in high-resolution space and (b) learning in low-resolution space. The former learns multi-level high-resolution features, fuses low- and high-resolution features, and reconstructs the residual images, while the latter learns more representative features from the high-resolution ones to facilitate better restoration. To improve feature representation in the low-resolution space, we propose to build a feature transformation from the high-resolution space to the low-resolution one. To that end, we propose two new modules: the Dual-path Correlation Matching Transformation module (DualCMT) and the Adaptive Channel Modulator (ACM). The DualCMT selects the top C/r correlation channels (where r ≥ 1 controls the squeezing level) from the max-pooling/mean-pooling high-resolution features to replace low-resolution ones in Transformers, which effectively squeezes out useless content to improve the feature representation in the low-resolution space and facilitate better recovery. The ACM adaptively modulates multi-level high-resolution features, enabling it to provide more useful features to the low-resolution space for better learning. Experimental results show that our UHDformer reduces model size by about 97% compared with most state-of-the-art methods while significantly improving performance under different training sets on three UHD image restoration tasks: low-light image enhancement, image dehazing, and image deblurring. The source code will be made available at https://github.com/supersupercong/UHDformer.
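To make the DualCMT channel selection concrete, here is a minimal PyTorch sketch reconstructed from the abstract alone; the cosine-similarity correlation score, the per-path channel replacement, and the averaging of the two pooling paths are assumptions, and every name is hypothetical rather than taken from the official code:

```python
import torch
import torch.nn.functional as F

def dualcmt_select(hi_feat, lo_feat, r=2):
    """Hedged sketch of DualCMT-style top-C/r channel selection.

    hi_feat: (B, C, H2, W2) high-resolution features
    lo_feat: (B, C, H1, W1) low-resolution features, H1 <= H2
    """
    B, C, H, W = lo_feat.shape
    k = max(C // r, 1)  # keep the top C/r correlated channels

    # Dual path: max-pooled and mean-pooled high-resolution features,
    # brought down to the low-resolution spatial size.
    hi_max = F.adaptive_max_pool2d(hi_feat, (H, W))
    hi_mean = F.adaptive_avg_pool2d(hi_feat, (H, W))

    def replace_topk(hi, lo):
        # Per-channel cosine similarity as the "correlation" score (assumption).
        corr = F.cosine_similarity(hi.flatten(2), lo.flatten(2), dim=2)  # (B, C)
        idx = corr.topk(k, dim=1).indices                                # (B, k)
        out = lo.clone()
        batch = torch.arange(B, device=lo.device).unsqueeze(1)
        out[batch, idx] = hi[batch, idx]  # replace the selected channels
        return out

    # Fuse the two pooling paths by averaging (fusion strategy assumed).
    return 0.5 * (replace_topk(hi_max, lo_feat) + replace_topk(hi_mean, lo_feat))
```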
Related papers
- MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution [31.564277546050484]
We propose to enhance the model adaptability to resolution variation by optimizing the patch embedding.
The proposed method, called Multi-Scale Patch Embedding (MSPE), substitutes the standard patch embedding with multiple variable-sized patch kernels.
Our method does not require high-cost training or modifications to other parts, making it easy to apply to most ViT models.
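A minimal PyTorch sketch of this multi-scale patch embedding idea; the kernel sizes, the ModuleDict layout, and the rule for picking a kernel per input resolution are assumptions inferred from the summary, not MSPE's actual code:

```python
import torch.nn as nn

class MultiScalePatchEmbed(nn.Module):
    """Hedged sketch: several variable-sized patch kernels replace the single
    fixed-size patch embedding of a standard ViT."""
    def __init__(self, embed_dim=768, patch_sizes=(8, 16, 32)):
        super().__init__()
        self.projs = nn.ModuleDict({
            str(p): nn.Conv2d(3, embed_dim, kernel_size=p, stride=p)
            for p in patch_sizes
        })

    def forward(self, x, target_tokens=196):
        # Pick the patch size whose token count is closest to the target, so
        # the downstream ViT sees a roughly fixed-length sequence regardless
        # of the input resolution (selection rule is assumed).
        _, _, h, w = x.shape
        best = min(self.projs,
                   key=lambda p: abs((h // int(p)) * (w // int(p)) - target_tokens))
        tokens = self.projs[best](x)               # (B, D, h/p, w/p)
        return tokens.flatten(2).transpose(1, 2)   # (B, N, D)
```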
arXiv Detail & Related papers (2024-05-28T14:50:12Z)
- Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution [151.1255837803585]
We propose a novel approach, pursuing Spatial Adaptation and Temporal Coherence (SATeCo) for video super-resolution.
SATeCo pivots on learning spatial-temporal guidance from low-resolution videos to calibrate both latent-space high-resolution video denoising and pixel-space video reconstruction.
Experiments conducted on the REDS4 and Vid4 datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-03-25T17:59:26Z)
- Dual Degradation-Inspired Deep Unfolding Network for Low-Light Image Enhancement [3.4929041108486185]
We propose a Dual degrAdation-inSpired deep Unfolding network, termed DASUNet, for low-light image enhancement.
It learns two distinct image priors by considering the degradation specificity between the luminance and chrominance spaces.
Our source code and pretrained model will be publicly available.
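A rough PyTorch sketch of the luminance/chrominance prior split described above; the YCbCr conversion, the residual prior branches, and all names are illustrative assumptions, not DASUNet's actual unfolding architecture:

```python
import torch
import torch.nn as nn

class DualPriorBranches(nn.Module):
    """Hedged sketch: separate learned priors for the luminance and
    chrominance components of a low-light image."""
    def __init__(self, channels=32):
        super().__init__()
        make_branch = lambda c_in: nn.Sequential(
            nn.Conv2d(c_in, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, c_in, 3, padding=1))
        self.lum_prior = make_branch(1)   # prior over the Y (luminance) plane
        self.chr_prior = make_branch(2)   # prior over the Cb/Cr (chrominance) planes

    def forward(self, rgb):
        # RGB -> YCbCr (BT.601 coefficients), so each branch sees its own space.
        r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
        y = 0.299 * r + 0.587 * g + 0.114 * b
        cb = 0.5 + 0.564 * (b - y)
        cr = 0.5 + 0.713 * (r - y)
        y_hat = y + self.lum_prior(y)  # residual refinement per space
        c = torch.cat([cb, cr], 1)
        c_hat = c + self.chr_prior(c)
        cb_hat, cr_hat = c_hat[:, 0:1], c_hat[:, 1:2]
        # YCbCr -> RGB
        r_hat = y_hat + 1.403 * (cr_hat - 0.5)
        b_hat = y_hat + 1.773 * (cb_hat - 0.5)
        g_hat = (y_hat - 0.299 * r_hat - 0.114 * b_hat) / 0.587
        return torch.cat([r_hat, g_hat, b_hat], 1).clamp(0, 1)
```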
arXiv Detail & Related papers (2023-08-05T03:07:11Z)
- Lightweight Structure-aware Transformer Network for VHR Remote Sensing Image Change Detection [15.391216316828354]
This Letter proposes a Lightweight Structure-aware Transformer (LSAT) network for RS image CD.
First, a Cross-dimension Interactive Self-attention (CISA) module with linear complexity is designed to replace the vanilla self-attention in visual Transformer.
Second, a Structure-aware Enhancement Module (SAEM) is designed to enhance difference features and edge detail information.
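The summary does not spell out how CISA achieves linear complexity; as a stand-in, here is a generic kernelized linear self-attention sketch in PyTorch (this is the well-known efficient-attention factorization, not the paper's cross-dimension design):

```python
import torch
import torch.nn as nn

class LinearSelfAttention(nn.Module):
    """Hedged sketch of O(N) self-attention via the K^T V factorization."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (B, N, D)
        B, N, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split heads: (B, H, N, d).
        q, k, v = (t.reshape(B, N, self.heads, D // self.heads).transpose(1, 2)
                   for t in (q, k, v))
        q, k = q.softmax(-1), k.softmax(-2)      # kernel feature maps
        # Linear-complexity trick: aggregate K^T V once (d x d per head),
        # then apply it to each query instead of forming the N x N map.
        ctx = torch.einsum('bhnd,bhne->bhde', k, v)
        out = torch.einsum('bhnd,bhde->bhne', q, ctx)
        return self.out(out.transpose(1, 2).reshape(B, N, D))
```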
arXiv Detail & Related papers (2023-06-03T03:21:18Z)
- Raising The Limit Of Image Rescaling Using Auxiliary Encoding [7.9700865143145485]
Recently, image rescaling models like IRN utilize the bidirectional nature of INN to push the performance limit of image upscaling.
We propose auxiliary encoding modules to further push the limit of image rescaling performance.
arXiv Detail & Related papers (2023-03-12T20:49:07Z)
- Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism built upon a vision transformer (ViT)-like block.
The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
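A minimal PyTorch sketch of a spatially-adaptive feature modulation block of the kind summarized above; the channel split, the pooling scales, and the gating layout are assumptions inferred from the summary:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAFM(nn.Module):
    """Hedged sketch: multi-scale depthwise features produce a spatial gate
    that modulates the input (dim must be divisible by levels; H, W must be
    divisible by 2**(levels-1))."""
    def __init__(self, dim, levels=4):
        super().__init__()
        self.levels = levels
        chunk = dim // levels
        self.dwconvs = nn.ModuleList(
            nn.Conv2d(chunk, chunk, 3, padding=1, groups=chunk)
            for _ in range(levels))
        self.fuse = nn.Conv2d(dim, dim, 1)

    def forward(self, x):                        # x: (B, C, H, W)
        h, w = x.shape[-2:]
        outs = []
        for i, (part, conv) in enumerate(zip(x.chunk(self.levels, 1), self.dwconvs)):
            if i > 0:                            # progressively coarser scales
                s = 2 ** i
                part = F.adaptive_max_pool2d(part, (h // s, w // s))
                part = F.interpolate(conv(part), size=(h, w), mode='nearest')
            else:
                part = conv(part)
            outs.append(part)
        gate = F.gelu(self.fuse(torch.cat(outs, dim=1)))
        return x * gate                          # spatially-adaptive modulation
```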
arXiv Detail & Related papers (2023-02-27T14:19:31Z)
- Reference-based Image and Video Super-Resolution via C2-Matching [100.0808130445653]
We propose C2-Matching, which performs explicit robust matching across transformation and resolution.
C2-Matching significantly outperforms the state of the art on the standard CUFED5 benchmark.
We also extend C2-Matching to the Reference-based Video Super-Resolution task, where an image taken in a similar scene serves as the HR reference image.
arXiv Detail & Related papers (2022-12-19T16:15:02Z)
- Large Motion Video Super-Resolution with Dual Subnet and Multi-Stage Communicated Upsampling [18.09730129484432]
Video super-resolution (VSR) aims at restoring a low-resolution (LR) video to high resolution (HR).
In this paper, we propose a novel deep neural network with Dual Subnet and Multi-stage Communicated Upsampling (DSMC) for super-resolution of videos with large motion.
arXiv Detail & Related papers (2021-03-22T11:52:12Z)
- Hierarchical Amortized Training for Memory-efficient High Resolution 3D GAN [52.851990439671475]
We propose a novel end-to-end GAN architecture that can generate high-resolution 3D images.
We achieve this goal by using different configurations between training and inference.
Experiments on 3D thorax CT and brain MRI demonstrate that our approach outperforms the state of the art in image generation.
arXiv Detail & Related papers (2020-08-05T02:33:04Z)
- Gated Fusion Network for Degraded Image Super Resolution [78.67168802945069]
We propose a dual-branch convolutional neural network to extract base features and recovered features separately.
By decomposing the feature extraction step into two task-independent streams, the dual-branch model can facilitate the training process.
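A minimal PyTorch sketch of the dual-branch decomposition described above; the branch depths, the sigmoid gate, and all names are assumptions for illustration:

```python
import torch
import torch.nn as nn

class DualBranchSR(nn.Module):
    """Hedged sketch: one stream extracts base features from the degraded
    input, the other learns recovered (restoration) features, and a learned
    gate fuses the two streams per pixel."""
    def __init__(self, ch=64):
        super().__init__()
        block = lambda: nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.base_branch = block()     # task-independent base features
        self.recover_branch = block()  # degradation-removal features
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())
        self.to_rgb = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x):
        fb, fr = self.base_branch(x), self.recover_branch(x)
        g = self.gate(torch.cat([fb, fr], dim=1))  # per-pixel fusion weights
        return self.to_rgb(g * fb + (1 - g) * fr)
```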
arXiv Detail & Related papers (2020-03-02T13:28:32Z)