Attention-Guided Multi-scale Interaction Network for Face Super-Resolution
- URL: http://arxiv.org/abs/2409.00591v1
- Date: Sun, 1 Sep 2024 02:53:24 GMT
- Title: Attention-Guided Multi-scale Interaction Network for Face Super-Resolution
- Authors: Xujie Wan, Wenjie Li, Guangwei Gao, Huimin Lu, Jian Yang, Chia-Wen Lin
- Abstract summary: CNN and Transformer hybrid networks have demonstrated excellent performance in face super-resolution (FSR) tasks.
How to fuse multi-scale features and promote their complementarity is crucial for enhancing FSR.
Our design allows the free flow of multi-scale features from within modules and between encoder and decoder.
- Score: 46.42710591689621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, CNN and Transformer hybrid networks have demonstrated excellent performance in face super-resolution (FSR) tasks. Since hybrid networks contain numerous features at different scales, how to fuse these multi-scale features and promote their complementarity is crucial for enhancing FSR. However, existing hybrid network-based FSR methods ignore this issue and simply combine the Transformer and CNN. To address this, we propose an Attention-guided Multi-scale Interaction Network (AMINet), which contains local and global feature interactions as well as encoder-decoder phase feature interactions. Specifically, we propose a Local and Global Feature Interaction Module (LGFI) to promote the fusion of global features with the local features of different receptive fields extracted by our Residual Depth Feature Extraction Module (RDFE). Additionally, we propose a Selective Kernel Attention Fusion Module (SKAF) to adaptively select fusions of different features within the LGFI and the encoder-decoder phases. This design allows multi-scale features to flow freely within modules and between the encoder and decoder, which promotes the complementarity of features at different scales and thereby enhances FSR. Comprehensive experiments confirm that our method consistently performs well with lower computational cost and faster inference.
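The abstract describes SKAF as adaptively selecting between feature fusions. As a rough illustration only (not the paper's actual implementation; the function name, projection matrices `w1`/`w2`, and shapes are assumptions), a selective-kernel-style fusion of a local and a global feature map can be sketched as: pool the summed features into a channel descriptor, score it with one projection per branch, and softmax across branches to obtain per-channel selection weights.

```python
import numpy as np

def selective_kernel_fusion(local_feat, global_feat, w1, w2):
    """Fuse two (C, H, W) feature maps with selective-kernel-style attention.

    The fused sum is pooled to a channel descriptor; two branch-specific
    projections (w1, w2, each (C, C)) score it, and a softmax across the
    two branches yields per-channel selection weights.
    """
    fused = local_feat + global_feat                  # element-wise sum
    desc = fused.mean(axis=(1, 2))                    # global average pool -> (C,)
    logits = np.stack([w1 @ desc, w2 @ desc])         # (2, C) branch scores
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    attn = e / e.sum(axis=0, keepdims=True)           # softmax over the 2 branches
    return (attn[0][:, None, None] * local_feat
            + attn[1][:, None, None] * global_feat)
```

If both branch projections are identical, the softmax degenerates to equal 0.5/0.5 weights and the fusion reduces to a plain average of the two inputs.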
Related papers
- Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning [50.74383395813782]
We propose a novel Frequency and Spatial Mutual Learning Network (FSMNet) to explore global dependencies across different modalities.
The proposed FSMNet achieves state-of-the-art performance for the Multi-Contrast MR Reconstruction task with different acceleration factors.
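FSMNet's key idea is pairing spatial features with frequency-domain features. As a minimal sketch of the general principle only (not FSMNet's architecture), a global frequency representation of an image can be obtained with a 2-D FFT, whose magnitude spectrum captures structure across the whole image and complements local spatial features:

```python
import numpy as np

def frequency_branch(img):
    """Return a global frequency-domain feature for a (H, W) image.

    The FFT magnitude spectrum summarizes global structure; fftshift
    moves the DC (zero-frequency) component to the center of the map.
    """
    spectrum = np.fft.fft2(img)
    return np.abs(np.fft.fftshift(spectrum))
```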
arXiv Detail & Related papers (2024-09-21T12:02:47Z) - Local-to-Global Cross-Modal Attention-Aware Fusion for HSI-X Semantic Segmentation [19.461033552684576]
We propose a Local-to-Global Cross-modal Attention-aware Fusion (LoGoCAF) framework for HSI-X classification.
LoGoCAF adopts a pixel-to-pixel two-branch semantic segmentation architecture to learn information from HSI and X modalities.
arXiv Detail & Related papers (2024-06-25T16:12:20Z) - Multi-Scale Implicit Transformer with Re-parameterize for Arbitrary-Scale Super-Resolution [2.4865475189445405]
The Multi-Scale Implicit Transformer (MSIT) consists of a Multi-scale Neural Operator (MSNO) and Multi-Scale Self-Attention (MSSA).
arXiv Detail & Related papers (2024-03-11T09:23:20Z) - Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network for Remote Sensing Image Super-Resolution [13.894645293832044]
Transformer-based models have shown competitive performance in remote sensing image super-resolution (RSISR)
We propose a novel transformer architecture called Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network (SPIFFNet) for RSISR.
Our proposed model effectively enhances global cognition and understanding of the entire image, facilitating efficient integration of features cross-stages.
arXiv Detail & Related papers (2023-07-06T13:19:06Z) - Efficient Image Super-Resolution with Feature Interaction Weighted Hybrid Network [101.53907377000445]
Lightweight image super-resolution aims to reconstruct high-resolution images from low-resolution images using low computational costs.
Existing methods result in the loss of middle-layer features due to activation functions.
We propose a Feature Interaction Weighted Hybrid Network (FIWHN) to minimize the impact of intermediate feature loss on reconstruction quality.
arXiv Detail & Related papers (2022-12-29T05:57:29Z) - Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation with great efficacy in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z) - Multi-modal land cover mapping of remote sensing images using pyramid attention and gated fusion networks [20.66034058363032]
We propose a new multi-modality network for land cover mapping of multi-modal remote sensing data based on a novel pyramid attention fusion (PAF) module and a gated fusion unit (GFU)
The PAF module is designed to efficiently obtain rich, fine-grained contextual representations from each modality with a built-in cross-level and cross-view attention fusion mechanism.
The GFU module utilizes a novel gating mechanism for early merging of features, thereby diminishing hidden redundancies and noise.
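The gating idea behind a GFU can be illustrated with a minimal sketch (an assumption for illustration, not the paper's actual unit; `w_gate` stands in for a learned 1x1 convolution): a sigmoid gate computed from the concatenated modalities decides, per channel and pixel, how much each modality contributes to the merged feature.

```python
import numpy as np

def gated_fusion(feat_a, feat_b, w_gate):
    """Gated early fusion of two modality features, each (C, H, W).

    w_gate has shape (C, 2C) and acts like a 1x1 convolution over the
    channel-concatenated input; the sigmoid gate blends the modalities.
    """
    stacked = np.concatenate([feat_a, feat_b], axis=0)    # (2C, H, W)
    logits = np.einsum('oc,chw->ohw', w_gate, stacked)    # (C, H, W)
    gate = 1.0 / (1.0 + np.exp(-logits))                  # sigmoid in (0, 1)
    return gate * feat_a + (1.0 - gate) * feat_b
```

A gate of 1 passes modality A through unchanged, a gate of 0 passes modality B; intermediate values suppress redundant or noisy channels from either side.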
arXiv Detail & Related papers (2021-11-06T10:01:01Z) - Learning Deep Multimodal Feature Representation with Asymmetric Multi-layer Fusion [63.72912507445662]
We propose a compact and effective framework to fuse multimodal features at multiple layers in a single network.
First, we verify that multimodal features can be learned within a shared single network by merely maintaining modality-specific batch normalization layers in the encoder.
Second, we propose a bidirectional multi-layer fusion scheme, where multimodal features can be exploited progressively.
arXiv Detail & Related papers (2021-08-11T03:42:13Z) - Attentional Feature Fusion [4.265244011052538]
We propose a uniform and general scheme, namely attentional feature fusion.
We show that our models outperform state-of-the-art networks on both CIFAR-100 and ImageNet datasets.
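The core of attentional feature fusion is replacing plain addition or concatenation with an attention-weighted soft selection between two inputs. A minimal single-scale sketch of this idea (the function name and the projection `w` are illustrative assumptions, not the paper's multi-scale design):

```python
import numpy as np

def attentional_feature_fusion(x, y, w):
    """Soft-select between two (C, H, W) features via channel attention.

    A sigmoid weight m in (0, 1), computed from the channel descriptor of
    x + y with projection w (C, C), blends the inputs as m*x + (1-m)*y.
    """
    s = x + y                                    # initial integration
    ctx = s.mean(axis=(1, 2))                    # channel descriptor (C,)
    m = 1.0 / (1.0 + np.exp(-(w @ ctx)))         # per-channel fusion weight
    m = m[:, None, None]
    return m * x + (1.0 - m) * y                 # soft selection
```

Because the weights for the two inputs sum to 1 per channel, the module interpolates between the inputs rather than simply summing them.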
arXiv Detail & Related papers (2020-09-29T15:10:18Z) - Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.