Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration
- URL: http://arxiv.org/abs/2503.20174v1
- Date: Wed, 26 Mar 2025 02:58:41 GMT
- Title: Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration
- Authors: Shihao Zhou, Dayu Li, Jinshan Pan, Juncheng Zhou, Jinglei Shi, Jufeng Yang
- Abstract summary: Transformer-based approaches have gained significant attention in image restoration. The core component, Multi-Head Attention (MHA), plays a crucial role in capturing diverse features and recovering high-quality results. In this paper, we propose to improve MHA by exploring diverse learners and introducing various interactions between heads.
- Score: 44.39536029525856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based approaches have gained significant attention in image restoration, where the core component, i.e., Multi-Head Attention (MHA), plays a crucial role in capturing diverse features and recovering high-quality results. In MHA, the heads perform attention independently on uniformly split subspaces, which triggers a redundancy issue that hinders the model from achieving satisfactory outputs. In this paper, we propose to improve MHA by exploring diverse learners and introducing various interactions between heads, which results in a Hierarchical multI-head atteNtion driven Transformer model, termed HINT, for image restoration. HINT contains two modules, i.e., the Hierarchical Multi-Head Attention (HMHA) and the Query-Key Cache Updating (QKCU) module, to address the redundancy problem rooted in vanilla MHA. Specifically, HMHA extracts diverse contextual features by employing heads to learn from subspaces of varying sizes that contain different information. Moreover, QKCU, comprising intra- and inter-layer schemes, further reduces redundancy by facilitating enhanced interactions between attention heads within and across layers. Extensive experiments on 12 benchmarks across 5 image restoration tasks, including low-light enhancement, dehazing, desnowing, denoising, and deraining, demonstrate the superiority of HINT. The source code is available in the supplementary materials.
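For intuition, here is a minimal PyTorch sketch of the non-uniform head split that HMHA builds on: each head attends over a channel subspace of its own size rather than a uniform slice. The module name, the example split sizes, and the shared QKV projection are illustrative assumptions, not the authors' HINT implementation (which additionally includes the QKCU interaction schemes).

```python
# Sketch of multi-head self-attention with unevenly sized head subspaces.
# Split sizes and layout are assumptions, not the authors' HINT code.
import torch
import torch.nn as nn

class VariableHeadAttention(nn.Module):
    def __init__(self, dim=64, head_dims=(8, 8, 16, 32)):
        super().__init__()
        assert sum(head_dims) == dim, "head subspaces must tile the channel dim"
        self.head_dims = head_dims
        self.qkv = nn.Linear(dim, 3 * dim)   # shared Q/K/V projection
        self.proj = nn.Linear(dim, dim)      # output projection

    def forward(self, x):                    # x: (batch, tokens, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        outs, start = [], 0
        for d in self.head_dims:             # each head sees a subspace of its own size
            qs, ks, vs = (t[..., start:start + d] for t in (q, k, v))
            attn = (qs @ ks.transpose(-2, -1)) / d ** 0.5
            outs.append(attn.softmax(dim=-1) @ vs)
            start += d
        return self.proj(torch.cat(outs, dim=-1))

x = torch.randn(2, 128, 64)
print(VariableHeadAttention()(x).shape)      # torch.Size([2, 128, 64])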
Related papers
- Semi-supervised Semantic Segmentation for Remote Sensing Images via Multi-scale Uncertainty Consistency and Cross-Teacher-Student Attention [59.19580789952102]
This paper proposes a novel semi-supervised Multi-Scale Uncertainty and Cross-Teacher-Student Attention (MUCA) model for RS image semantic segmentation tasks. MUCA constrains the consistency among feature maps at different layers of the network by introducing a multi-scale uncertainty consistency regularization. Moreover, MUCA utilizes a Cross-Teacher-Student attention mechanism to guide the student network to construct more discriminative feature representations.
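A hedged sketch of such a multi-scale consistency term: align student feature maps from several layers with the teacher's after resizing to a common resolution. The function name, the MSE criterion, and the assumption of matching channel counts are illustrative, not the authors' exact formulation.

```python
# Multi-scale teacher-student feature consistency (illustrative sketch).
import torch
import torch.nn.functional as F

def multiscale_consistency(student_feats, teacher_feats, size=(64, 64)):
    """student_feats / teacher_feats: lists of (B, C, H_i, W_i) tensors
    with matching channel counts per layer (an assumption here)."""
    loss = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        fs = F.interpolate(fs, size=size, mode="bilinear", align_corners=False)
        ft = F.interpolate(ft.detach(), size=size, mode="bilinear", align_corners=False)
        loss = loss + F.mse_loss(fs, ft)     # teacher is not updated through this loss
    return loss / len(student_feats)
```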
arXiv Detail & Related papers (2025-01-18T11:57:20Z)
- Soft Knowledge Distillation with Multi-Dimensional Cross-Net Attention for Image Restoration Models Compression [0.0]
Transformer-based encoder-decoder models have achieved remarkable success in image-to-image transfer tasks. However, their high computational complexity, manifested in elevated FLOPs and parameter counts, limits their application in real-world scenarios. We propose a Soft Knowledge Distillation (SKD) strategy that incorporates a Multi-dimensional Cross-net Attention (MCA) mechanism for compressing image restoration models.
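One plausible reading of cross-net attention for distillation is sketched below: student features form the queries, frozen teacher features the keys and values, and the student is trained to match the attended teacher representation. The layer layout and the L1 matching loss are assumptions, not the paper's exact SKD/MCA design.

```python
# Cross-net attention distillation signal (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossNetAttention(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)

    def forward(self, student, teacher):         # both: (batch, tokens, dim)
        q = self.q(student)
        k, v = self.kv(teacher.detach()).chunk(2, dim=-1)  # teacher stays frozen
        attn = (q @ k.transpose(-2, -1)) / q.size(-1) ** 0.5
        fused = attn.softmax(dim=-1) @ v
        return F.l1_loss(student, fused)          # soft distillation loss
```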
arXiv Detail & Related papers (2025-01-16T06:25:56Z)
- Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration [58.11518043688793]
MPerceiver is a novel approach to enhance adaptiveness, generalizability and fidelity for all-in-one image restoration.
MPerceiver is trained on 9 tasks for all-in-one IR and outperforms state-of-the-art task-specific methods across most tasks.
arXiv Detail & Related papers (2023-12-05T17:47:11Z)
- Multi-task Image Restoration Guided By Robust DINO Features [88.74005987908443]
We propose DINO-IR, a multi-task image restoration approach leveraging robust features extracted from DINOv2.
We first propose a pixel-semantic fusion (PSF) module to dynamically fuse DINOv2's shallow features.
By formulating these modules into a unified deep model, we propose a DINO perception contrastive loss to constrain the model training.
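A perception-level contrastive loss of this kind might look like the sketch below: pull the frozen-DINO embedding of the restored image toward that of its clean target and away from other images in the batch (InfoNCE). The use of global embeddings and the temperature value are assumptions, not the paper's exact loss.

```python
# DINO-feature contrastive loss (illustrative InfoNCE sketch).
import torch
import torch.nn.functional as F

def dino_contrastive_loss(restored_emb, clean_emb, tau=0.1):
    """restored_emb, clean_emb: (B, D) features from a frozen DINO encoder."""
    r = F.normalize(restored_emb, dim=-1)
    c = F.normalize(clean_emb.detach(), dim=-1)
    logits = (r @ c.t()) / tau                  # (B, B) similarity matrix
    labels = torch.arange(r.size(0))            # positives on the diagonal
    return F.cross_entropy(logits, labels)
```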
arXiv Detail & Related papers (2023-12-04T06:59:55Z)
- Finding the Pillars of Strength for Multi-Head Attention [35.556186723898485]
Recent studies have revealed several issues with Multi-Head Attention (MHA).
We propose Grouped Head Attention, trained with a self-supervised group constraint that groups attention heads.
We additionally propose a Voting-to-Stay procedure to remove redundant heads, thus achieving a transformer with lighter weights.
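One way such pruning could work is sketched below: score each head by how redundant its output is with the others and keep only the most distinct ones via a binary mask. The cosine-redundancy scoring rule is an assumption; the paper's actual Voting-to-Stay procedure may differ.

```python
# Redundancy-based head pruning (illustrative sketch).
import torch
import torch.nn.functional as F

def vote_to_stay(head_outputs, keep=4):
    """head_outputs: (num_heads, features) mean head activations.
    Returns a boolean mask of heads that 'stay'."""
    h = F.normalize(head_outputs, dim=-1)
    redundancy = (h @ h.t()).sum(dim=-1)        # high = similar to other heads
    stay = torch.zeros(h.size(0), dtype=torch.bool)
    stay[redundancy.topk(keep, largest=False).indices] = True  # keep the most distinct
    return stay
```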
arXiv Detail & Related papers (2023-05-22T03:44:44Z)
- Masked Image Modeling with Local Multi-Scale Reconstruction [54.91442074100597]
Masked Image Modeling (MIM) achieves outstanding success in self-supervised representation learning.
Existing MIM models conduct the reconstruction task only at the top layer of the encoder.
We design local multi-scale reconstruction, where the lower and upper layers reconstruct fine-scale and coarse-scale supervision signals respectively.
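As a rough illustration, the sketch below attaches a reconstruction loss to several layers, matching shallower predictions to fine-scale targets and deeper ones to coarser (downsampled) targets. The decoder heads producing `layer_preds`, the scale choices, and the L1 criterion are assumptions, not the paper's exact design.

```python
# Local multi-scale reconstruction loss (illustrative sketch).
import torch
import torch.nn.functional as F

def local_multiscale_loss(layer_preds, image, scales=(1, 2, 4)):
    """layer_preds: list of (B, C, H/s, W/s) reconstructions, shallow -> deep,
    assumed to already match the pooled target sizes."""
    loss = 0.0
    for pred, s in zip(layer_preds, scales):
        target = F.avg_pool2d(image, s) if s > 1 else image  # coarser target for deeper layers
        loss = loss + F.l1_loss(pred, target)
    return loss / len(layer_preds)
```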
arXiv Detail & Related papers (2023-03-09T13:42:04Z)
- A Dynamic Head Importance Computation Mechanism for Neural Machine Translation [22.784419165117512]
Multiple parallel attention heads facilitate greater performance of the Transformer model across various applications.
In this work, we focus on designing a Dynamic Head Importance Computation Mechanism (DHICM) to dynamically calculate the importance of a head with respect to the input.
We add an extra loss function that prevents the model from assigning the same score to all heads, helping it identify the more important heads and improve performance.
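A minimal sketch of this idea: a small gate predicts a softmax weight per head from the input, and an auxiliary entropy penalty pushes the distribution away from uniform. The gate architecture and the entropy-based penalty are assumptions about DHICM, not the paper's exact design.

```python
# Input-dependent head importance with an anti-uniformity penalty (sketch).
import torch
import torch.nn as nn

class HeadGate(nn.Module):
    def __init__(self, dim=64, num_heads=8):
        super().__init__()
        self.score = nn.Linear(dim, num_heads)

    def forward(self, x):                                # x: (batch, tokens, dim)
        w = self.score(x.mean(dim=1)).softmax(dim=-1)    # (batch, num_heads)
        entropy = -(w * (w + 1e-9).log()).sum(dim=-1).mean()
        return w, entropy    # add entropy to the training loss to discourage uniform scores

gate = HeadGate()
w, aux = gate(torch.randn(2, 128, 64))
print(w.shape, float(aux))                               # torch.Size([2, 8]) ...
```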
arXiv Detail & Related papers (2021-08-03T09:16:55Z)
- High-resolution Depth Maps Imaging via Attention-based Hierarchical Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.