CFAT: Unleashing Triangular Windows for Image Super-resolution
        - URL: http://arxiv.org/abs/2403.16143v1
 - Date: Sun, 24 Mar 2024 13:31:31 GMT
 - Title: CFAT: Unleashing Triangular Windows for Image Super-resolution
 - Authors: Abhisek Ray, Gaurav Kumar, Maheshkumar H. Kolekar
 - Abstract summary: Transformer-based models have revolutionized the field of image super-resolution (SR).
We propose a non-overlapping triangular window technique that synchronously works with the rectangular one to mitigate boundary-level distortion.
Our proposed model shows a significant 0.7 dB performance improvement over other state-of-the-art SR architectures.
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   Transformer-based models have revolutionized the field of image super-resolution (SR) by harnessing their inherent ability to capture complex contextual features. The overlapping rectangular shifted-window technique used in today's transformer architectures is common practice in super-resolution models for improving the quality and robustness of image upscaling. However, it suffers from distortion at the boundaries and has limited unique shifting modes. To overcome these weaknesses, we propose a non-overlapping triangular window technique that works synchronously with the rectangular one to mitigate boundary-level distortion and allows the model to access more unique shifting modes. In this paper, we propose a Composite Fusion Attention Transformer (CFAT) that incorporates triangular-rectangular window-based local attention with a channel-based global attention technique for image super-resolution. As a result, CFAT enables attention mechanisms to be activated on more image pixels and captures long-range, multi-scale features to improve SR performance. Extensive experimental results and an ablation study demonstrate the effectiveness of CFAT in the SR domain. Our proposed model shows a significant 0.7 dB performance improvement over other state-of-the-art SR architectures.
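To make the triangular-window idea concrete, here is a minimal, hypothetical sketch (not the authors' released code): a square window is partitioned into four disjoint triangles meeting at its centre, and attention is masked so tokens attend only within their own triangle; the rectangular branch corresponds to the same attention without the mask. The function names and the single-head, projection-free attention are illustrative simplifications.

```python
import torch

def triangular_masks(ws: int) -> torch.Tensor:
    """Split a ws x ws window into 4 disjoint triangles (upper, lower,
    left, right) that meet at the window centre. Returns (4, ws*ws) bools."""
    ys, xs = torch.meshgrid(torch.arange(ws), torch.arange(ws), indexing="ij")
    dy = ys.flatten().float() - (ws - 1) / 2.0
    dx = xs.flatten().float() - (ws - 1) / 2.0
    upper = (dy <= 0) & (-dy >= dx.abs())
    lower = (dy > 0) & (dy >= dx.abs())
    left = (dx < 0) & (-dx > dy.abs())
    right = (dx > 0) & (dx > dy.abs())
    return torch.stack([upper, lower, left, right])  # disjoint, covers the window

def masked_window_attention(x: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """x: (B, N, C) tokens of one window; masks: (4, N) triangle partition.
    Single-head attention without projections, restricted to each triangle."""
    B, N, C = x.shape
    attn = (x @ x.transpose(-2, -1)) / (C ** 0.5)    # (B, N, N) content logits
    same = (masks.float().t() @ masks.float()) > 0   # True iff same triangle
    attn = attn.masked_fill(~same, float("-inf"))
    return attn.softmax(dim=-1) @ x                  # (B, N, C)

# Rectangular window attention is the unmasked case; CFAT runs both and fuses them.
x = torch.randn(2, 8 * 8, 32)                        # one 8x8 window, 32 channels
out = masked_window_attention(x, triangular_masks(8))
```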
 
       
      
        Related papers
        - Rotation Equivariant Arbitrary-scale Image Super-Resolution [62.41329042683779]
Arbitrary-scale image super-resolution (ASISR) aims to achieve high-resolution recovery at arbitrary scales from a low-resolution input image.
In this study, we construct a rotation-equivariant ASISR method.
arXiv  Detail & Related papers  (2025-08-07T08:51:03Z) - DSwinIR: Rethinking Window-based Attention for Image Restoration [109.38288333994407]
We propose the Deformable Sliding Window Transformer (DSwinIR) as a new foundational backbone architecture for image restoration.
At the heart of DSwinIR is the novel Deformable Sliding Window (DSwin) Attention.
Extensive experiments show that DSwinIR sets a new state of the art across a wide spectrum of image restoration tasks.
arXiv  Detail & Related papers  (2025-04-07T09:24:41Z) - Feature Alignment with Equivariant Convolutions for Burst Image   Super-Resolution [52.55429225242423]
We propose a novel framework for Burst Image Super-Resolution (BISR), featuring an equivariant convolution-based alignment.
This enables the alignment transformation to be learned via explicit supervision in the image domain and easily applied in the feature domain.
Experiments on BISR benchmarks show the superior performance of our approach in both quantitative metrics and visual quality.
arXiv  Detail & Related papers  (2025-03-11T11:13:10Z) - TranStable: Towards Robust Pixel-level Online Video Stabilization by   Jointing Transformer and CNN [3.0980248517369158]
Video stabilization often struggles with distortion and excessive cropping.
This paper proposes a novel end-to-end framework, named TranStable, to address these challenges.
Experiments on NUS, DeepStab, and Selfie benchmarks demonstrate state-of-the-art performance.
arXiv  Detail & Related papers  (2025-01-25T08:51:31Z) - OminiControl: Minimal and Universal Control for Diffusion Transformer [68.3243031301164]
We present OminiControl, a novel approach that rethinks how image conditions are integrated into Diffusion Transformer (DiT) architectures.
OminiControl addresses the limitations of existing approaches through three key innovations.
arXiv  Detail & Related papers  (2024-11-22T17:55:15Z) - Multi-Head Attention Residual Unfolded Network for Model-Based   Pansharpening [2.874893537471256]
Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches.
In this paper, we propose a model-based deep unfolded method for satellite image fusion.
 Experimental results on PRISMA, Quickbird, and WorldView2 datasets demonstrate the superior performance of our method.
arXiv  Detail & Related papers  (2024-09-04T13:05:00Z) - Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
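The hypernetwork component can be illustrated with a small sketch. Below is a minimal, hypothetical example (layer sizes and names are assumptions, not the paper's modules) in which a conditioning vector, standing in for an image embedding, generates the weights of one layer of a signed-distance-function MLP.

```python
import torch
import torch.nn as nn

class HyperSDF(nn.Module):
    """A conditioning vector (an image embedding stand-in) generates the
    weights of one layer of a small signed-distance-function MLP."""
    def __init__(self, cond_dim: int = 128, hidden: int = 64):
        super().__init__()
        self.hidden = hidden
        self.make_w = nn.Linear(cond_dim, hidden * 3)  # hypernetwork: layer weights
        self.make_b = nn.Linear(cond_dim, hidden)      # hypernetwork: layer biases
        self.head = nn.Linear(hidden, 1)               # shared output head

    def forward(self, pts, cond):      # pts: (N, 3) query points; cond: (cond_dim,)
        w = self.make_w(cond).view(self.hidden, 3)
        b = self.make_b(cond)
        h = torch.relu(pts @ w.t() + b)  # layer conditioned on the input image
        return self.head(h)              # (N, 1) signed distances

sdf = HyperSDF()
d = sdf(torch.randn(1024, 3), torch.randn(128))
```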
arXiv  Detail & Related papers  (2023-12-24T08:42:37Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
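One plausible reading of distance-based weighting, sketched below with assumed names and an assumed decay parameter gamma, is to bias the attention logits by the spatial distance between token positions so that nearby image components interact more strongly.

```python
import torch

def distance_weighted_attention(x, coords, gamma: float = 0.1):
    """x: (B, N, C) tokens; coords: (N, 2) pixel positions of the tokens.
    gamma is an assumed decay rate controlling how fast attention falls off."""
    B, N, C = x.shape
    attn = (x @ x.transpose(-2, -1)) / (C ** 0.5)       # content logits
    dist = torch.cdist(coords.float(), coords.float())  # (N, N) Euclidean distances
    attn = attn - gamma * dist                          # penalise distant pairs
    return attn.softmax(dim=-1) @ x

coords = torch.stack(torch.meshgrid(torch.arange(16), torch.arange(16),
                                    indexing="ij"), dim=-1).reshape(-1, 2)
out = distance_weighted_attention(torch.randn(1, 256, 32), coords)
```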
arXiv  Detail & Related papers  (2023-10-11T12:46:11Z) - MaxSR: Image Super-Resolution Using Improved MaxViT [34.53995225219387]
We present a single image super-resolution model based on the recent hybrid vision transformer MaxViT, named MaxSR.
Our proposed models for classical single image super-resolution (MaxSR) and lightweight single image super-resolution (MaxSR-light) efficiently establish new state-of-the-art performance.
arXiv  Detail & Related papers  (2023-07-14T09:26:47Z) - Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network for Remote Sensing Image Super-Resolution [13.894645293832044]
Transformer-based models have shown competitive performance in remote sensing image super-resolution (RSISR).
We propose a novel transformer architecture called Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network (SPIFFNet) for RSISR.
Our proposed model effectively enhances global cognition and understanding of the entire image, facilitating efficient integration of features cross-stages.
arXiv  Detail & Related papers  (2023-07-06T13:19:06Z) - RBSR: Efficient and Flexible Recurrent Network for Burst Super-Resolution [57.98314517861539]
Burst super-resolution (BurstSR) aims at reconstructing a high-resolution (HR) image from a sequence of low-resolution (LR) and noisy images.
In this paper, we suggest fusing cues frame-by-frame with an efficient and flexible recurrent network.
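A minimal sketch of the frame-by-frame recurrent fusion idea follows; the modules here (a per-frame encoder, a fusion convolution, and a PixelShuffle upsampler) are stand-ins, not RBSR's actual blocks.

```python
import torch
import torch.nn as nn

class RecurrentBurstFusion(nn.Module):
    def __init__(self, ch: int = 32, scale: int = 2):
        super().__init__()
        self.encode = nn.Conv2d(3, ch, 3, padding=1)     # per-frame features
        self.fuse = nn.Conv2d(2 * ch, ch, 3, padding=1)  # merge frame with hidden state
        self.upsample = nn.Sequential(                   # reconstruct the HR image
            nn.Conv2d(ch, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))

    def forward(self, burst):              # burst: (B, T, 3, H, W) noisy LR frames
        h = None
        for t in range(burst.shape[1]):    # fuse cues frame by frame
            f = torch.relu(self.encode(burst[:, t]))
            h = f if h is None else torch.relu(self.fuse(torch.cat([f, h], dim=1)))
        return self.upsample(h)            # (B, 3, scale*H, scale*W)

model = RecurrentBurstFusion()
out = model(torch.randn(1, 8, 3, 48, 48))  # 8-frame burst -> (1, 3, 96, 96)
```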
arXiv  Detail & Related papers  (2023-06-30T12:14:13Z) - CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
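The sampling-plus-recovery structure can be sketched as block-based compressive sensing with a learned sampling operator; the layer shapes below are assumptions, and CSformer's actual recovery stage is a much richer CNN-transformer hybrid.

```python
import torch
import torch.nn as nn

class BlockCS(nn.Module):
    def __init__(self, block: int = 32, ratio: float = 0.1):
        super().__init__()
        m = int(ratio * block * block)  # measurements per image block
        # learned sampling matrix applied blockwise as a strided convolution
        self.sample = nn.Conv2d(1, m, block, stride=block, bias=False)
        # linear initial recovery; a deep network would refine this in practice
        self.init_recover = nn.ConvTranspose2d(m, 1, block, stride=block, bias=False)

    def forward(self, x):        # x: (B, 1, H, W), H and W multiples of block
        y = self.sample(x)       # compressive measurements
        x0 = self.init_recover(y)  # initial reconstruction
        return y, x0

cs = BlockCS()
y, x0 = cs(torch.randn(1, 1, 128, 128))  # 10% sampling ratio
```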
arXiv  Detail & Related papers  (2021-12-31T04:37:11Z) - Asymmetric CNN for image super-resolution [102.96131810686231]
Deep convolutional neural networks (CNNs) have been widely applied for low-level vision over the past five years.
We propose an asymmetric CNN (ACNet) comprising an asymmetric block (AB), a memory enhancement block (MEB) and a high-frequency feature enhancement block (HFFEB) for image super-resolution.
Our ACNet can effectively address single image super-resolution (SISR), blind SISR, and blind SISR under unknown noise.
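The asymmetric-block idea can be illustrated as parallel 3x3, 1x3, and 3x1 convolutions whose outputs are summed; the exact branch layout below is an assumption, not the paper's definition.

```python
import torch
import torch.nn as nn

class AsymmetricBlock(nn.Module):
    """Parallel square and one-dimensional convolutions whose outputs are
    summed, enriching features along horizontal and vertical directions."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.square = nn.Conv2d(ch, ch, 3, padding=1)
        self.horiz = nn.Conv2d(ch, ch, (1, 3), padding=(0, 1))
        self.vert = nn.Conv2d(ch, ch, (3, 1), padding=(1, 0))

    def forward(self, x):
        return torch.relu(self.square(x) + self.horiz(x) + self.vert(x))

ab = AsymmetricBlock()
out = ab(torch.randn(1, 64, 32, 32))  # shape preserved: (1, 64, 32, 32)
```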
arXiv  Detail & Related papers  (2021-03-25T07:10:46Z) - Accurate and Lightweight Image Super-Resolution with Model-Guided Deep Unfolding Network [63.69237156340457]
We present and advocate an explainable approach toward SISR named model-guided deep unfolding network (MoG-DUN).
MoG-DUN is accurate (producing fewer aliasing artifacts), computationally efficient (with reduced model parameters), and versatile (capable of handling multiple degradations).
The superiority of the proposed MoG-DUN method over existing state-of-the-art image SR methods, including RCAN, SRDNF, and SRFBN, is substantiated by extensive experiments on several popular datasets and various degradation scenarios.
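Deep unfolding alternates a data-consistency step derived from the degradation model with a learned prior. The sketch below is a generic, hypothetical instance (bicubic operators and a tiny denoiser as the prior), not MoG-DUN's actual modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnfoldedSR(nn.Module):
    def __init__(self, scale: int = 2, steps: int = 4, ch: int = 3):
        super().__init__()
        self.scale, self.steps = scale, steps
        self.step_size = nn.Parameter(torch.full((steps,), 0.5))  # learned step sizes
        self.prior = nn.Sequential(                               # tiny learned denoiser
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch, 3, padding=1))

    def forward(self, y):  # y: (B, C, h, w) low-resolution input
        x = F.interpolate(y, scale_factor=self.scale,
                          mode="bicubic", align_corners=False)
        for k in range(self.steps):
            down = F.interpolate(x, scale_factor=1 / self.scale,
                                 mode="bicubic", align_corners=False)
            grad = F.interpolate(down - y, scale_factor=self.scale,
                                 mode="bicubic", align_corners=False)
            x = x - self.step_size[k] * grad  # data-consistency gradient step
            x = x + self.prior(x)             # learned prior refinement
        return x

sr = UnfoldedSR()
hr = sr(torch.randn(1, 3, 24, 24))  # -> (1, 3, 48, 48)
```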
arXiv  Detail & Related papers  (2020-09-14T08:23:37Z) 
        This list is automatically generated from the titles and abstracts of the papers on this site.
       
     