Related papers: A Low-Resolution Image is Worth 1x1 Words: Enabling Fine Image Super-Resolution with Transformers and TaylorShift

URL: http://arxiv.org/abs/2411.10231v1
Date: Fri, 15 Nov 2024 14:43:58 GMT
Title: A Low-Resolution Image is Worth 1x1 Words: Enabling Fine Image Super-Resolution with Transformers and TaylorShift
Authors: Sanath Budakegowdanadoddi Nagaraju, Brian Bernhard Moser, Tobias Christian Nauen, Stanislav Frolov, Federico Raue, Andreas Dengel
Abstract summary: We propose TaylorIR to address limitations by utilizing a patch size of 1x1, enabling pixel-level processing in any transformer-based SR model. Experimental results demonstrate that our approach achieves new state-of-the-art SR performance while reducing memory consumption by up to 60% compared to traditional self-attention-based transformers.
Score: 6.835244697120131
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformer-based Super-Resolution (SR) models have recently advanced image reconstruction quality, yet challenges remain due to computational complexity and an over-reliance on large patch sizes, which constrain fine-grained detail enhancement. In this work, we propose TaylorIR to address these limitations by utilizing a patch size of 1x1, enabling pixel-level processing in any transformer-based SR model. To address the significant computational demands under the traditional self-attention mechanism, we employ the TaylorShift attention mechanism, a memory-efficient alternative based on Taylor series expansion, achieving full token-to-token interactions with linear complexity. Experimental results demonstrate that our approach achieves new state-of-the-art SR performance while reducing memory consumption by up to 60% compared to traditional self-attention-based transformers.
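
The abstract's claim of full token-to-token interaction at linear cost can be illustrated with a small sketch. The Python below is not the TaylorIR or TaylorShift implementation: it assumes a second-order Taylor expansion of the exponential in place of softmax's exp, leaves out any scaling or normalization details the papers may use, and all function names are hypothetical. It shows that the same attention output can be computed either from an explicit N x N weight matrix or from key/value summaries whose size does not depend on the number of tokens N.

```python
def taylor_exp(x):
    # Second-order Taylor expansion of exp(x) around 0 (assumed order).
    # Equals ((x + 1)**2 + 1) / 2, so it is always positive.
    return 1.0 + x + 0.5 * x * x


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_quadratic(Q, K, V):
    """Direct form: one weight per (query, key) pair, O(N^2) in token count."""
    dv = len(V[0])
    out = []
    for q in Q:
        w = [taylor_exp(dot(q, k)) for k in K]
        z = sum(w)  # normalizer, playing the role of the softmax denominator
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / z for c in range(dv)])
    return out


def attention_linear(Q, K, V):
    """Same output, but keys and values are summarized once, O(N) in token count."""
    d, dv = len(K[0]), len(V[0])
    # Append a constant 1 to every value so the normalizer falls out of the
    # same sums as the output (it ends up in the last column).
    V1 = [list(v) + [1.0] for v in V]
    cols = range(dv + 1)
    # Zeroth-, first- and second-order summaries over all keys and values.
    s0 = [sum(v[c] for v in V1) for c in cols]
    s1 = [[sum(k[a] * v[c] for k, v in zip(K, V1)) for c in cols] for a in range(d)]
    s2 = [[[sum(k[a] * k[b] * v[c] for k, v in zip(K, V1)) for c in cols]
           for b in range(d)] for a in range(d)]
    out = []
    for q in Q:
        acc = []
        for c in cols:
            t = s0[c]
            t += sum(q[a] * s1[a][c] for a in range(d))
            t += 0.5 * sum(q[a] * q[b] * s2[a][b][c]
                           for a in range(d) for b in range(d))
            acc.append(t)
        out.append([x / acc[dv] for x in acc[:dv]])
    return out
```

In this sketch the summaries `s0`, `s1`, and `s2` hold on the order of d^2 * d_v numbers regardless of N, which is where the memory saving for long token sequences (such as one token per pixel) would come from; the trade-off is a cost that grows with the square of the per-token feature dimension d.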

Related papers

WaveHiT-SR: Hierarchical Wavelet Network for Efficient Image Super-Resolution [44.55918322585521]
We propose a new approach, called WaveHiT-SR, that embeds the wavelet transform within a hierarchical transformer framework. By progressively reconstructing high-resolution images through hierarchical processing, the network reduces computational complexity without sacrificing performance. Our refined versions of SwinIR-Light, SwinIR-NG, and SRFormer-Light deliver cutting-edge SR results, achieving higher efficiency with fewer parameters, lower FLOPs, and faster speeds.
arXiv Detail & Related papers (2025-08-27T14:37:50Z)
Breaking Complexity Barriers: High-Resolution Image Restoration with Rank Enhanced Linear Attention [54.42902794496325]
Linear attention, a variant of softmax attention, demonstrates promise in global context modeling. We propose Rank Enhanced Linear Attention (RELA), a simple yet effective method that enriches feature representations by integrating a lightweight depthwise convolution. Building upon RELA, we propose an efficient and effective image restoration Transformer, named LAformer.
arXiv Detail & Related papers (2025-05-22T02:57:23Z)
FourierSR: A Fourier Token-based Plugin for Efficient Image Super-Resolution [21.909175743080713]
Image super-resolution (SR) aims to recover low-resolution images to high-resolution images, where improving SR efficiency is a high-profile challenge. Commonly used units in SR, like convolutions and window-based Transformers, have limited receptive fields. We propose a Fourier token-based plugin called FourierSR to improve SR uniformly.
arXiv Detail & Related papers (2025-03-13T04:50:55Z)
Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling [50.34513854725803]
Arbitrary-scale super-resolution (ASSR) aims to reconstruct high-resolution (HR) images from low-resolution (LR) inputs with arbitrary upsampling factors. We propose a novel ContinuousSR framework with a Pixel-to-Gaussian paradigm, which explicitly reconstructs 2D continuous HR signals from LR images using Gaussian Splatting.
arXiv Detail & Related papers (2025-03-09T13:43:57Z)
Contrast: A Hybrid Architecture of Transformers and State Space Models for Low-Level Vision [3.574664325523221]
We propose Contrast, a hybrid SR model that combines Convolutional, Transformer, and State Space components. By integrating transformer and state space mechanisms, Contrast compensates for the shortcomings of each approach, enhancing both global context modeling and pixel-level accuracy.
arXiv Detail & Related papers (2025-01-23T03:34:14Z)
MB-TaylorFormer V2: Improved Multi-branch Linear Transformer Expanded by Taylor Formula for Image Restoration [85.41380152286479]
Experimental results across diverse image restoration benchmarks demonstrate that MB-TaylorFormer V2 achieves state-of-the-art performance in multiple image restoration tasks. The proposed model, named the second version of Taylor formula expansion-based Transformer (for short MB-TaylorFormer V2), has the capability to concurrently process coarse-to-fine features.
arXiv Detail & Related papers (2025-01-08T13:13:52Z)
HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution [70.52256118833583]
We present a strategy to convert transformer-based SR networks to hierarchical transformers (HiT-SR). Specifically, we first replace the commonly used fixed small windows with expanding hierarchical windows to aggregate features at different scales. Considering the intensive computation required for large windows, we further design a spatial-channel correlation method with linear complexity to window sizes.
arXiv Detail & Related papers (2024-07-08T12:42:10Z)
CFAT: Unleashing Triangular Windows for Image Super-resolution [5.130320840059732]
Transformer-based models have revolutionized the field of image super-resolution (SR). We propose a non-overlapping triangular window technique that synchronously works with the rectangular one to mitigate boundary-level distortion. Our proposed model shows a significant 0.7 dB performance improvement over other state-of-the-art SR architectures.
arXiv Detail & Related papers (2024-03-24T13:31:31Z)
Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR). CFSR inherits the advantages of both convolution-based and transformer-based approaches. Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
arXiv Detail & Related papers (2024-01-11T03:08:00Z)
PTSR: Patch Translator for Image Super-Resolution [16.243363392717434]
We propose a patch translator for image super-resolution (PTSR) to address this problem. The proposed PTSR is a transformer-based GAN network with no convolution operation. We introduce a novel patch translator module for regenerating the improved patches utilising multi-head attention.
arXiv Detail & Related papers (2023-10-20T01:45:00Z)
Reciprocal Attention Mixing Transformer for Lightweight Image Restoration [6.3159191692241095]
We propose a lightweight image restoration network, Reciprocal Attention Mixing Transformer (RAMiT). It employs bi-dimensional (spatial and channel) self-attentions in parallel with different numbers of multi-heads. It achieves state-of-the-art performance on multiple lightweight IR tasks, including super-resolution, color denoising, grayscale denoising, low-light enhancement, and deraining.
arXiv Detail & Related papers (2023-05-19T06:55:04Z)
Vicinity Vision Transformer [53.43198716947792]
We present a Vicinity Attention that introduces a locality bias to vision transformers with linear complexity. Our approach achieves state-of-the-art image classification accuracy with 50% fewer parameters than previous methods.
arXiv Detail & Related papers (2022-06-21T17:33:53Z)
Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks. We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers. Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z)
HRFormer: High-Resolution Transformer for Dense Prediction [99.6060997466614]
We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks. We take advantage of the multi-resolution parallel design introduced in high-resolution convolutional networks (HRNet). We demonstrate the effectiveness of the High-Resolution Transformer on both human pose estimation and semantic segmentation tasks.
arXiv Detail & Related papers (2021-10-18T15:37:58Z)
XCiT: Cross-Covariance Image Transformers [73.33400159139708]
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens. The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images.
arXiv Detail & Related papers (2021-06-17T17:33:35Z)
Improved Transformer for High-Resolution GANs [69.42469272015481]
We introduce two key ingredients to Transformer to address this challenge. We show in the experiments that the proposed HiT achieves state-of-the-art FID scores of 31.87 and 2.95 on unconditional ImageNet $128 \times 128$ and FFHQ $256 \times 256$, respectively.
arXiv Detail & Related papers (2021-06-14T17:39:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.