CFAT: Unleashing Triangular Windows for Image Super-resolution
        - URL: http://arxiv.org/abs/2403.16143v1
 - Date: Sun, 24 Mar 2024 13:31:31 GMT
 - Title: CFAT: Unleashing Triangular Windows for Image Super-resolution
 - Authors: Abhisek Ray, Gaurav Kumar, Maheshkumar H. Kolekar
 - Abstract summary: Transformer-based models have revolutionized the field of image super-resolution (SR).
We propose a non-overlapping triangular window technique that synchronously works with the rectangular one to mitigate boundary-level distortion.
Our proposed model shows a significant 0.7 dB performance improvement over other state-of-the-art SR architectures.
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   Transformer-based models have revolutionized the field of image super-resolution (SR) by harnessing their inherent ability to capture complex contextual features. The overlapping rectangular shifted-window technique used in today's transformer architectures is common practice in super-resolution models for improving the quality and robustness of image upscaling. However, it suffers from distortion at the boundaries and has limited unique shifting modes. To overcome these weaknesses, we propose a non-overlapping triangular window technique that works synchronously with the rectangular one to mitigate boundary-level distortion and allows the model to access more unique shifting modes. In this paper, we propose a Composite Fusion Attention Transformer (CFAT) that incorporates triangular-rectangular window-based local attention with a channel-based global attention technique for image super-resolution. As a result, CFAT enables attention mechanisms to be activated on more image pixels and captures long-range, multi-scale features to improve SR performance. Extensive experimental results and an ablation study demonstrate the effectiveness of CFAT in the SR domain. Our proposed model shows a significant 0.7 dB performance improvement over other state-of-the-art SR architectures.
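To make the triangular-window idea concrete, here is a minimal, hypothetical sketch (not the authors' released code): a square window is partitioned into four disjoint triangles meeting at its centre, and attention is masked so tokens attend only within their own triangle; the rectangular branch corresponds to the same attention without the mask. The function names and the single-head, projection-free attention are illustrative simplifications.

```python
import torch

def triangular_masks(ws: int) -> torch.Tensor:
    """Split a ws x ws window into 4 disjoint triangles (upper, lower,
    left, right) that meet at the window centre. Returns (4, ws*ws) bools."""
    ys, xs = torch.meshgrid(torch.arange(ws), torch.arange(ws), indexing="ij")
    dy = ys.flatten().float() - (ws - 1) / 2.0
    dx = xs.flatten().float() - (ws - 1) / 2.0
    upper = (dy <= 0) & (-dy >= dx.abs())
    lower = (dy > 0) & (dy >= dx.abs())
    left = (dx < 0) & (-dx > dy.abs())
    right = (dx > 0) & (dx > dy.abs())
    return torch.stack([upper, lower, left, right])  # disjoint, covers the window

def masked_window_attention(x: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """x: (B, N, C) tokens of one window; masks: (4, N) triangle partition.
    Single-head attention without projections, restricted to each triangle."""
    B, N, C = x.shape
    attn = (x @ x.transpose(-2, -1)) / (C ** 0.5)    # (B, N, N) content logits
    same = (masks.float().t() @ masks.float()) > 0   # True iff same triangle
    attn = attn.masked_fill(~same, float("-inf"))
    return attn.softmax(dim=-1) @ x                  # (B, N, C)

# Rectangular window attention is the unmasked case; CFAT runs both and fuses them.
x = torch.randn(2, 8 * 8, 32)                        # one 8x8 window, 32 channels
out = masked_window_attention(x, triangular_masks(8))
```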
 
       
      
        Related papers
        - Rotation Equivariant Arbitrary-scale Image Super-Resolution [62.41329042683779]
Arbitrary-scale image super-resolution (ASISR) aims to achieve high-resolution recovery at arbitrary scales from a low-resolution input image.
In this study, we construct a rotation-equivariant ASISR method.
arXiv  Detail & Related papers  (2025-08-07T08:51:03Z) - DSwinIR: Rethinking Window-based Attention for Image Restoration [109.38288333994407]
We propose the Deformable Sliding Window Transformer (DSwinIR) as a new foundational backbone architecture for image restoration.
At the heart of DSwinIR is the novel Deformable Sliding Window (DSwin) Attention.
Extensive experiments show that DSwinIR sets a new state of the art across a wide spectrum of image restoration tasks.
arXiv  Detail & Related papers  (2025-04-07T09:24:41Z) - Feature Alignment with Equivariant Convolutions for Burst Image   Super-Resolution [52.55429225242423]
We propose a novel framework for Burst Image Super-Resolution (BISR), featuring an equivariant convolution-based alignment.
This enables the alignment transformation to be learned via explicit supervision in the image domain and easily applied in the feature domain.
Experiments on BISR benchmarks show the superior performance of our approach in both quantitative metrics and visual quality.
arXiv  Detail & Related papers  (2025-03-11T11:13:10Z) - TranStable: Towards Robust Pixel-level Online Video Stabilization by   Jointing Transformer and CNN [3.0980248517369158]
Video stabilization often struggles with distortion and excessive cropping.
This paper proposes a novel end-to-end framework, named TranStable, to address these challenges.
Experiments on NUS, DeepStab, and Selfie benchmarks demonstrate state-of-the-art performance.
arXiv  Detail & Related papers  (2025-01-25T08:51:31Z) - OminiControl: Minimal and Universal Control for Diffusion Transformer [68.3243031301164]
We present OminiControl, a novel approach that rethinks how image conditions are integrated into Diffusion Transformer (DiT) architectures.
OminiControl addresses the limitations of existing approaches through three key innovations.
arXiv  Detail & Related papers  (2024-11-22T17:55:15Z) - Multi-Head Attention Residual Unfolded Network for Model-Based   Pansharpening [2.874893537471256]
Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches.
In this paper, we propose a model-based deep unfolded method for satellite image fusion.
 Experimental results on PRISMA, Quickbird, and WorldView2 datasets demonstrate the superior performance of our method.
arXiv  Detail & Related papers  (2024-09-04T13:05:00Z) - Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
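The hypernetwork component can be illustrated with a small sketch. Below is a minimal, hypothetical example (layer sizes and names are assumptions, not the paper's modules) in which a conditioning vector, standing in for an image embedding, generates the weights of one layer of a signed-distance-function MLP.

```python
import torch
import torch.nn as nn

class HyperSDF(nn.Module):
    """A conditioning vector (an image embedding stand-in) generates the
    weights of one layer of a small signed-distance-function MLP."""
    def __init__(self, cond_dim: int = 128, hidden: int = 64):
        super().__init__()
        self.hidden = hidden
        self.make_w = nn.Linear(cond_dim, hidden * 3)  # hypernetwork: layer weights
        self.make_b = nn.Linear(cond_dim, hidden)      # hypernetwork: layer biases
        self.head = nn.Linear(hidden, 1)               # shared output head

    def forward(self, pts, cond):      # pts: (N, 3) query points; cond: (cond_dim,)
        w = self.make_w(cond).view(self.hidden, 3)
        b = self.make_b(cond)
        h = torch.relu(pts @ w.t() + b)  # layer conditioned on the input image
        return self.head(h)              # (N, 1) signed distances

sdf = HyperSDF()
d = sdf(torch.randn(1024, 3), torch.randn(128))
```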
arXiv  Detail & Related papers  (2023-12-24T08:42:37Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
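One plausible reading of distance-based weighting, sketched below with assumed names and an assumed decay parameter gamma, is to bias the attention logits by the spatial distance between token positions so that nearby image components interact more strongly.

```python
import torch

def distance_weighted_attention(x, coords, gamma: float = 0.1):
    """x: (B, N, C) tokens; coords: (N, 2) pixel positions of the tokens.
    gamma is an assumed decay rate controlling how fast attention falls off."""
    B, N, C = x.shape
    attn = (x @ x.transpose(-2, -1)) / (C ** 0.5)       # content logits
    dist = torch.cdist(coords.float(), coords.float())  # (N, N) Euclidean distances
    attn = attn - gamma * dist                          # penalise distant pairs
    return attn.softmax(dim=-1) @ x

coords = torch.stack(torch.meshgrid(torch.arange(16), torch.arange(16),
                                    indexing="ij"), dim=-1).reshape(-1, 2)
out = distance_weighted_attention(torch.randn(1, 256, 32), coords)
```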
arXiv  Detail & Related papers  (2023-10-11T12:46:11Z) - MaxSR: Image Super-Resolution Using Improved MaxViT [34.53995225219387]
We present a single image super-resolution model based on the recent hybrid vision transformer MaxViT, named MaxSR.
Our proposed models for classical single image super-resolution (MaxSR) and lightweight single image super-resolution (MaxSR-light) efficiently establish new state-of-the-art performance.
arXiv  Detail & Related papers  (2023-07-14T09:26:47Z) - Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network for Remote Sensing Image Super-Resolution [13.894645293832044]
Transformer-based models have shown competitive performance in remote sensing image super-resolution (RSISR).
We propose a novel transformer architecture called Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network (SPIFFNet) for RSISR.
Our proposed model effectively enhances global cognition and understanding of the entire image, facilitating efficient integration of features cross-stages.
arXiv  Detail & Related papers  (2023-07-06T13:19:06Z) - RBSR: Efficient and Flexible Recurrent Network for Burst Super-Resolution [57.98314517861539]
Burst super-resolution (BurstSR) aims at reconstructing a high-resolution (HR) image from a sequence of low-resolution (LR) and noisy images.
In this paper, we suggest fusing cues frame-by-frame with an efficient and flexible recurrent network.
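A minimal sketch of the frame-by-frame recurrent fusion idea follows; the modules here (a per-frame encoder, a fusion convolution, and a PixelShuffle upsampler) are stand-ins, not RBSR's actual blocks.

```python
import torch
import torch.nn as nn

class RecurrentBurstFusion(nn.Module):
    def __init__(self, ch: int = 32, scale: int = 2):
        super().__init__()
        self.encode = nn.Conv2d(3, ch, 3, padding=1)     # per-frame features
        self.fuse = nn.Conv2d(2 * ch, ch, 3, padding=1)  # merge frame with hidden state
        self.upsample = nn.Sequential(                   # reconstruct the HR image
            nn.Conv2d(ch, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))

    def forward(self, burst):              # burst: (B, T, 3, H, W) noisy LR frames
        h = None
        for t in range(burst.shape[1]):    # fuse cues frame by frame
            f = torch.relu(self.encode(burst[:, t]))
            h = f if h is None else torch.relu(self.fuse(torch.cat([f, h], dim=1)))
        return self.upsample(h)            # (B, 3, scale*H, scale*W)

model = RecurrentBurstFusion()
out = model(torch.randn(1, 8, 3, 48, 48))  # 8-frame burst -> (1, 3, 96, 96)
```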
arXiv  Detail & Related papers  (2023-06-30T12:14:13Z) - CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
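The sampling-plus-recovery structure can be sketched as block-based compressive sensing with a learned sampling operator; the layer shapes below are assumptions, and CSformer's actual recovery stage is a much richer CNN-transformer hybrid.

```python
import torch
import torch.nn as nn

class BlockCS(nn.Module):
    def __init__(self, block: int = 32, ratio: float = 0.1):
        super().__init__()
        m = int(ratio * block * block)  # measurements per image block
        # learned sampling matrix applied blockwise as a strided convolution
        self.sample = nn.Conv2d(1, m, block, stride=block, bias=False)
        # linear initial recovery; a deep network would refine this in practice
        self.init_recover = nn.ConvTranspose2d(m, 1, block, stride=block, bias=False)

    def forward(self, x):        # x: (B, 1, H, W), H and W multiples of block
        y = self.sample(x)       # compressive measurements
        x0 = self.init_recover(y)  # initial reconstruction
        return y, x0

cs = BlockCS()
y, x0 = cs(torch.randn(1, 1, 128, 128))  # 10% sampling ratio
```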
arXiv  Detail & Related papers  (2021-12-31T04:37:11Z) - Asymmetric CNN for image super-resolution [102.96131810686231]
Deep convolutional neural networks (CNNs) have been widely applied for low-level vision over the past five years.
We propose an asymmetric CNN (ACNet) comprising an asymmetric block (AB), a memory enhancement block (MEB) and a high-frequency feature enhancement block (HFFEB) for image super-resolution.
Our ACNet can effectively address single image super-resolution (SISR), blind SISR, and blind SISR under unknown noise.
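The asymmetric-block idea can be illustrated as parallel 3x3, 1x3, and 3x1 convolutions whose outputs are summed; the exact branch layout below is an assumption, not the paper's definition.

```python
import torch
import torch.nn as nn

class AsymmetricBlock(nn.Module):
    """Parallel square and one-dimensional convolutions whose outputs are
    summed, enriching features along horizontal and vertical directions."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.square = nn.Conv2d(ch, ch, 3, padding=1)
        self.horiz = nn.Conv2d(ch, ch, (1, 3), padding=(0, 1))
        self.vert = nn.Conv2d(ch, ch, (3, 1), padding=(1, 0))

    def forward(self, x):
        return torch.relu(self.square(x) + self.horiz(x) + self.vert(x))

ab = AsymmetricBlock()
out = ab(torch.randn(1, 64, 32, 32))  # shape preserved: (1, 64, 32, 32)
```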
arXiv  Detail & Related papers  (2021-03-25T07:10:46Z) - Accurate and Lightweight Image Super-Resolution with Model-Guided Deep Unfolding Network [63.69237156340457]
We present and advocate an explainable approach toward SISR named model-guided deep unfolding network (MoG-DUN).
MoG-DUN is accurate (producing fewer aliasing artifacts), computationally efficient (with reduced model parameters), and versatile (capable of handling multiple degradations).
The superiority of the proposed MoG-DUN method over existing state-of-the-art image SR methods, including RCAN, SRDNF, and SRFBN, is substantiated by extensive experiments on several popular datasets and various degradation scenarios.
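Deep unfolding alternates a data-consistency step derived from the degradation model with a learned prior. The sketch below is a generic, hypothetical instance (bicubic operators and a tiny denoiser as the prior), not MoG-DUN's actual modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnfoldedSR(nn.Module):
    def __init__(self, scale: int = 2, steps: int = 4, ch: int = 3):
        super().__init__()
        self.scale, self.steps = scale, steps
        self.step_size = nn.Parameter(torch.full((steps,), 0.5))  # learned step sizes
        self.prior = nn.Sequential(                               # tiny learned denoiser
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch, 3, padding=1))

    def forward(self, y):  # y: (B, C, h, w) low-resolution input
        x = F.interpolate(y, scale_factor=self.scale,
                          mode="bicubic", align_corners=False)
        for k in range(self.steps):
            down = F.interpolate(x, scale_factor=1 / self.scale,
                                 mode="bicubic", align_corners=False)
            grad = F.interpolate(down - y, scale_factor=self.scale,
                                 mode="bicubic", align_corners=False)
            x = x - self.step_size[k] * grad  # data-consistency gradient step
            x = x + self.prior(x)             # learned prior refinement
        return x

sr = UnfoldedSR()
hr = sr(torch.randn(1, 3, 24, 24))  # -> (1, 3, 48, 48)
```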
arXiv  Detail & Related papers  (2020-09-14T08:23:37Z) 
        This list is automatically generated from the titles and abstracts of the papers on this site.
       
     