Rethinking LayerNorm in Image Restoration Transformers
- URL: http://arxiv.org/abs/2504.06629v1
- Date: Wed, 09 Apr 2025 07:06:44 GMT
- Title: Rethinking LayerNorm in Image Restoration Transformers
- Authors: MinKyu Lee, Sangeek Hyun, Woojin Jun, Hyunjun Kim, Jiwoo Chung, Jae-Pil Heo
- Abstract summary: This study investigates abnormal feature behaviors observed in image restoration (IR) Transformers. We identify two critical issues: feature entropy becoming excessively small and feature magnitudes diverging up to a million-fold scale. We propose a simple normalization strategy tailored for IR Transformers.
- Score: 20.67671141789497
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work investigates abnormal feature behaviors observed in image restoration (IR) Transformers. Specifically, we identify two critical issues: feature entropy becoming excessively small and feature magnitudes diverging up to a million-fold scale. We pinpoint the root cause to the per-token normalization aspect of conventional LayerNorm, which disrupts essential spatial correlations and internal feature statistics. To address this, we propose a simple normalization strategy tailored for IR Transformers. Our approach applies normalization across the entire spatio-channel dimension, effectively preserving spatial correlations. Additionally, we introduce an input-adaptive rescaling method that aligns feature statistics to the unique statistical requirements of each input. Experimental results verify that this combined strategy effectively resolves feature divergence, significantly enhancing both the stability and performance of IR Transformers across various IR tasks.
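The following is a minimal PyTorch sketch of the normalization idea described in the abstract: statistics are computed over the entire spatio-channel extent (C, H, W) of each sample rather than per token, and the normalized features are then rescaled by input-adaptive parameters. The module and layer names (SpatioChannelNorm, to_scale, to_shift) and the pooling-based realization of "input-adaptive rescaling" are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SpatioChannelNorm(nn.Module):
    """Normalization over the whole spatio-channel dimension with
    input-adaptive rescaling (illustrative sketch, not the paper's code)."""

    def __init__(self, channels: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # Assumption: input-adaptive scale/shift are predicted from the
        # channel-wise global statistics of the input itself.
        self.to_scale = nn.Linear(channels, channels)
        self.to_shift = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from an IR Transformer block.
        b, c, h, w = x.shape

        # Normalize over (C, H, W) jointly, so spatial correlations are
        # preserved instead of being destroyed by per-token statistics.
        mean = x.mean(dim=(1, 2, 3), keepdim=True)
        var = ((x - mean) ** 2).mean(dim=(1, 2, 3), keepdim=True)
        x_norm = (x - mean) / torch.sqrt(var + self.eps)

        # Input-adaptive rescaling driven by channel-wise global pooling.
        pooled = x.mean(dim=(2, 3))                      # (B, C)
        scale = self.to_scale(pooled).view(b, c, 1, 1)   # (B, C, 1, 1)
        shift = self.to_shift(pooled).view(b, c, 1, 1)
        return x_norm * (1.0 + scale) + shift


if __name__ == "__main__":
    feat = torch.randn(2, 64, 48, 48)
    norm = SpatioChannelNorm(channels=64)
    print(norm(feat).shape)  # torch.Size([2, 64, 48, 48])
```

In contrast, conventional LayerNorm in IR Transformers normalizes each spatial token independently over its channel vector, which is the per-token behavior the abstract identifies as the root cause of the entropy collapse and magnitude divergence.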
Related papers
- Transformer Meets Twicing: Harnessing Unattended Residual Information [2.1605931466490795]
Transformer-based deep learning models have achieved state-of-the-art performance across numerous language and vision tasks. While the self-attention mechanism has proven capable of handling complex data patterns, it has been observed that the representational capacity of the attention matrix degrades significantly across transformer layers. We propose Twicing Attention, a novel attention mechanism that uses the kernel twicing procedure from nonparametric regression to alleviate the low-pass behavior of the associated NLM smoothing.
arXiv Detail & Related papers (2025-03-02T01:56:35Z) - DINT Transformer [5.990713912057883]
DIFF Transformer addresses the issue of irrelevant context interference by introducing a differential attention mechanism. We propose DINT Transformer, which extends DIFF Transformer by incorporating a differential-integral mechanism.
arXiv Detail & Related papers (2025-01-29T08:53:29Z) - Sharing Key Semantics in Transformer Makes Efficient Image Restoration [148.22790334216117]
The self-attention mechanism, a cornerstone of Vision Transformers (ViTs), tends to encompass all global cues. Small segments of a degraded image, particularly those closely aligned semantically, provide especially relevant information to aid the restoration process. We propose boosting IR performance by sharing key semantics via a Transformer for IR (i.e., SemanIR).
arXiv Detail & Related papers (2024-05-30T12:45:34Z) - Learning Exhaustive Correlation for Spectral Super-Resolution: Where Spatial-Spectral Attention Meets Linear Dependence [26.1694389791047]
Spectral super-resolution aims to recover a hyperspectral image (HSI) from an easily obtainable RGB image.
Two types of bottlenecks in existing Transformers limit performance improvement and practical applications.
We propose a novel Exhaustive Correlation Transformer (ECT) for spectral super-resolution.
arXiv Detail & Related papers (2023-12-20T08:30:07Z) - ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.
We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
arXiv Detail & Related papers (2023-07-26T07:45:14Z) - Image Deblurring by Exploring In-depth Properties of Transformer [86.7039249037193]
We leverage deep features extracted from a pretrained vision transformer (ViT) to encourage recovered images to be sharp without sacrificing performance on quantitative metrics.
By comparing the transformer features of the recovered image and the target image, the pretrained transformer provides high-resolution, blur-sensitive semantic information.
One approach regards the features as vectors and computes the discrepancy between representations extracted from the recovered and target images in Euclidean space.
arXiv Detail & Related papers (2023-03-24T14:14:25Z) - Exploring Invariant Representation for Visible-Infrared Person Re-Identification [77.06940947765406]
Cross-spectral person re-identification, which aims to associate identities with pedestrians across different spectra, faces the main challenge of modality discrepancy.
In this paper, we address the problem at both the image level and the feature level in an end-to-end hybrid learning framework named the robust feature mining network (RFM).
Experimental results on two standard cross-spectral person re-identification datasets, RegDB and SYSU-MM01, have demonstrated state-of-the-art performance.
arXiv Detail & Related papers (2023-02-02T05:24:50Z) - Score-based Causal Representation Learning with Interventions [54.735484409244386]
This paper studies the causal representation learning problem when latent causal variables are observed indirectly.
The objectives are: (i) recovering the unknown linear transformation (up to scaling) and (ii) determining the directed acyclic graph (DAG) underlying the latent variables.
arXiv Detail & Related papers (2023-01-19T18:39:48Z) - Large-scale Global Low-rank Optimization for Computational Compressed Imaging [8.594666859332124]
We present the global low-rank (GLR) optimization technique, realizing highly efficient large-scale reconstruction with global self-similarity.
Inspired by the self-attention mechanism in deep learning, GLR extracts image patches by feature detection instead of conventional uniform selection.
We experimentally demonstrate GLR's effectiveness on temporal, frequency, and spectral dimensions.
arXiv Detail & Related papers (2023-01-08T14:12:51Z) - CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of detailed spatial information from CNNs and the global context provided by Transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z) - Learning High-Precision Bounding Box for Rotated Object Detection via Kullback-Leibler Divergence [100.6913091147422]
Existing rotated object detectors are mostly inherited from the horizontal detection paradigm.
In this paper, we are motivated to change the design of the rotation regression loss from an induction paradigm to a deduction methodology.
arXiv Detail & Related papers (2021-06-03T14:29:19Z) - Multivariate Functional Regression via Nested Reduced-Rank Regularization [2.730097437607271]
We propose a nested reduced-rank regression (NRRR) approach for fitting regression models with multivariate functional responses and predictors.
We show through non-asymptotic analysis that NRRR can achieve at least a comparable error rate to that of the reduced-rank regression.
We apply NRRR in an electricity demand problem, to relate the trajectories of the daily electricity consumption with those of the daily temperatures.
arXiv Detail & Related papers (2020-03-10T14:58:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.