Related papers: Enhancing Image Restoration Transformer via Adaptive Translation Equivariance

Enhancing Image Restoration Transformer via Adaptive Translation Equivariance

URL: http://arxiv.org/abs/2506.18520v1
Date: Mon, 23 Jun 2025 11:23:04 GMT
Title: Enhancing Image Restoration Transformer via Adaptive Translation Equivariance
Authors: JiaKui Hu, Zhengjian Yao, Lujia Jin, Hangzhou He, Yanye Lu,
Abstract summary: We develop an adaptive sliding indexing mechanism to efficiently select key-value pairs for each query, which are then generalizationd in parallel with globally aggregated key-value pairs.<n>The results highlight its superiority in terms of effectiveness, training convergence, and generalization.
Score: 4.302970926810013
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Translation equivariance is a fundamental inductive bias in image restoration, ensuring that translated inputs produce translated outputs. Attention mechanisms in modern restoration transformers undermine this property, adversely impacting both training convergence and generalization. To alleviate this issue, we propose two key strategies for incorporating translation equivariance: slide indexing and component stacking. Slide indexing maintains operator responses at fixed positions, with sliding window attention being a notable example, while component stacking enables the arrangement of translation-equivariant operators in parallel or sequentially, thereby building complex architectures while preserving translation equivariance. However, these strategies still create a dilemma in model design between the high computational cost of self-attention and the fixed receptive field associated with sliding window attention. To address this, we develop an adaptive sliding indexing mechanism to efficiently select key-value pairs for each query, which are then concatenated in parallel with globally aggregated key-value pairs. The designed network, called the Translation Equivariance Adaptive Transformer (TEAFormer), is assessed across a variety of image restoration tasks. The results highlight its superiority in terms of effectiveness, training convergence, and generalization.

Related papers

A Constrained Optimization Perspective of Unrolled Transformers [77.12297732942095]
We introduce a constrained optimization framework for training transformers that behave like optimization descent algorithms.<n>We observe constrained transformers achieve stronger to perturbations robustness and maintain higher out-of-distribution generalization.
arXiv Detail & Related papers (2026-01-24T02:12:39Z)
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models [1.474723404975345]
We propose seq-JEPA, a world modeling framework that introduces architectural biases into joint-embedding predictive architectures.<n>Seq-JEPA simultaneously learns two architecturally segregated representations: one equivariant to specified transformations and another invariant to them.<n>It excels at tasks that inherently require aggregating a sequence of observations, such as path integration across actions and predictive learning across eye movements.
arXiv Detail & Related papers (2025-05-06T04:39:11Z)
Transformer Meets Twicing: Harnessing Unattended Residual Information [2.1605931466490795]
Transformer-based deep learning models have achieved state-of-the-art performance across numerous language and vision tasks.<n>While the self-attention mechanism has proven capable of handling complex data patterns, it has been observed that the representational capacity of the attention matrix degrades significantly across transformer layers.<n>We propose the Twicing Attention, a novel attention mechanism that uses kernel twicing procedure in nonparametric regression to alleviate the low-pass behavior of associated NLM smoothing.
arXiv Detail & Related papers (2025-03-02T01:56:35Z)
Sharing Key Semantics in Transformer Makes Efficient Image Restoration [148.22790334216117]
Self-attention mechanism, a cornerstone of Vision Transformers (ViTs) tends to encompass all global cues.<n>Small segments of a degraded image, particularly those closely aligned semantically, provide particularly relevant information to aid in the restoration process.<n>We propose boosting IR's performance by sharing the key semantics via Transformer for IR (ie, SemanIR) in this paper.
arXiv Detail & Related papers (2024-05-30T12:45:34Z)
Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection [37.142470149311904]
We propose atemporal equivariant learning framework by considering both spatial and temporal augmentations jointly. We show our pre-training method for 3D object detection which outperforms existing equivariant and invariant approaches in many settings.
arXiv Detail & Related papers (2024-04-17T20:41:49Z)
EulerFormer: Sequential User Behavior Modeling with Complex Vector Attention [88.45459681677369]
We propose a novel transformer variant with complex vector attention, named EulerFormer. It provides a unified theoretical framework to formulate both semantic difference and positional difference. It is more robust to semantic variations and possesses moresuperior theoretical properties in principle.
arXiv Detail & Related papers (2024-03-26T14:18:43Z)
Self-Supervised Learning for Group Equivariant Neural Networks [75.62232699377877]
Group equivariant neural networks are the models whose structure is restricted to commute with the transformations on the input. We propose two concepts for self-supervised tasks: equivariant pretext labels and invariant contrastive loss. Experiments on standard image recognition benchmarks demonstrate that the equivariant neural networks exploit the proposed self-supervised tasks.
arXiv Detail & Related papers (2023-03-08T08:11:26Z)
Deep Neural Networks with Efficient Guaranteed Invariances [77.99182201815763]
We address the problem of improving the performance and in particular the sample complexity of deep neural networks. Group-equivariant convolutions are a popular approach to obtain equivariant representations. We propose a multi-stream architecture, where each stream is invariant to a different transformation.
arXiv Detail & Related papers (2023-03-02T20:44:45Z)
Empowering Networks With Scale and Rotation Equivariance Using A Similarity Convolution [16.853711292804476]
We devise a method that endows CNNs with simultaneous equivariance with respect to translation, rotation, and scaling. Our approach defines a convolution-like operation and ensures equivariance based on our proposed scalable Fourier-Argand representation. We validate the efficacy of our approach in the image classification task, demonstrating its robustness and the generalization ability to both scaled and rotated inputs.
arXiv Detail & Related papers (2023-03-01T08:43:05Z)
Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in Transformer-Based Variational AutoEncoder for Diverse Text Generation [85.5379146125199]
Variational Auto-Encoder (VAE) has been widely adopted in text generation. We propose TRACE, a Transformer-based recurrent VAE structure.
arXiv Detail & Related papers (2022-10-22T10:25:35Z)
Improving the Sample-Complexity of Deep Classification Networks with Invariant Integration [77.99182201815763]
Leveraging prior knowledge on intraclass variance due to transformations is a powerful method to improve the sample complexity of deep neural networks. We propose a novel monomial selection algorithm based on pruning methods to allow an application to more complex problems. We demonstrate the improved sample complexity on the Rotated-MNIST, SVHN and CIFAR-10 datasets.
arXiv Detail & Related papers (2022-02-08T16:16:11Z)
CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning. The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery. The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
Translational Equivariance in Kernelizable Attention [3.236198583140341]
We show how translational equivariance can be implemented in efficient Transformers based on kernelizable attention. Our experiments highlight that the devised approach significantly improves robustness of Performers to shifts of input images.
arXiv Detail & Related papers (2021-02-15T17:14:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.