ML-CrAIST: Multi-scale Low-high Frequency Information-based Cross black Attention with Image Super-resolving Transformer
- URL: http://arxiv.org/abs/2408.09940v1
- Date: Mon, 19 Aug 2024 12:23:15 GMT
- Title: ML-CrAIST: Multi-scale Low-high Frequency Information-based Cross black Attention with Image Super-resolving Transformer
- Authors: Alik Pramanick, Utsav Bheda, Arijit Sur,
- Abstract summary: This work proposes a transformer-based super-resolution architecture called ML-CrAIST.
We operate spatial and channel self-attention, which concurrently model pixel interaction from both spatial and channel dimensions.
We devise a cross-attention block for super-resolution, which explores the correlations between low and high-frequency information.
- Score: 3.686808512438363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, transformers have captured significant interest in the area of single-image super-resolution tasks, demonstrating substantial gains in performance. Current models heavily depend on the network's extensive ability to extract high-level semantic details from images while overlooking the effective utilization of multi-scale image details and intermediate information within the network. Furthermore, it has been observed that high-frequency areas in images present significant complexity for super-resolution compared to low-frequency areas. This work proposes a transformer-based super-resolution architecture called ML-CrAIST that addresses this gap by utilizing low-high frequency information in multiple scales. Unlike most of the previous work (either spatial or channel), we operate spatial and channel self-attention, which concurrently model pixel interaction from both spatial and channel dimensions, exploiting the inherent correlations across spatial and channel axis. Further, we devise a cross-attention block for super-resolution, which explores the correlations between low and high-frequency information. Quantitative and qualitative assessments indicate that our proposed ML-CrAIST surpasses state-of-the-art super-resolution methods (e.g., 0.15 dB gain @Manga109 $\times$4). Code is available on: https://github.com/Alik033/ML-CrAIST.
Related papers
- Multi-scale Unified Network for Image Classification [33.560003528712414]
CNNs face notable challenges in performance and computational efficiency when dealing with real-world, multi-scale image inputs.
We propose Multi-scale Unified Network (MUSN) consisting of multi-scales, a unified network, and scale-invariant constraint.
MUSN yields an accuracy increase up to 44.53% and diminishes FLOPs by 7.01-16.13% in multi-scale scenarios.
arXiv Detail & Related papers (2024-03-27T06:40:26Z) - Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring [25.36888929483233]
We propose a multi-scale network based on single-input and multiple-outputs(SIMO) for motion deblurring.
We combine the characteristics of real-world trajectories with a learnable wavelet transform module to focus on the directional continuity and frequency features of the step-by-step transitions between blurred images to sharp images.
arXiv Detail & Related papers (2023-12-29T02:59:40Z) - Cross-View Hierarchy Network for Stereo Image Super-Resolution [14.574538513341277]
Stereo image super-resolution aims to improve the quality of high-resolution stereo image pairs by exploiting complementary information across views.
We propose a novel method, named Cross-View-Hierarchy Network for Stereo Image Super-Resolution (CVHSSR)
CVHSSR achieves the best stereo image super-resolution performance than other state-of-the-art methods while using fewer parameters.
arXiv Detail & Related papers (2023-04-13T03:11:30Z) - SufrinNet: Toward Sufficient Cross-View Interaction for Stereo Image
Enhancement in The Dark [119.01585302856103]
Low-light stereo image enhancement (LLSIE) is a relatively new task to enhance the quality of visually unpleasant stereo images captured in dark conditions.
Current methods clearly suffer from two shortages: 1) insufficient cross-view interaction; 2) lacking long-range dependency for intra-view learning.
We propose a novel LLSIE model, termed underlineSufficient Cunderlineross-View underlineInteraction Network (SufrinNet)
arXiv Detail & Related papers (2022-11-02T04:01:30Z) - Decoupled-and-Coupled Networks: Self-Supervised Hyperspectral Image
Super-Resolution with Subpixel Fusion [67.35540259040806]
We propose a subpixel-level HS super-resolution framework by devising a novel decoupled-and-coupled network, called DCNet.
As the name suggests, DC-Net first decouples the input into common (or cross-sensor) and sensor-specific components.
We append a self-supervised learning module behind the CSU net by guaranteeing the material consistency to enhance the detailed appearances of the restored HS product.
arXiv Detail & Related papers (2022-05-07T23:40:36Z) - Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z) - CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow [23.457898451057275]
Optical flow estimation aims to find the 2D motion field by identifying corresponding pixels between two images.
Despite the tremendous progress of deep learning-based optical flow methods, it remains a challenge to accurately estimate large displacements with motion blur.
This is mainly because the correlation volume, the basis of pixel matching, is computed as the dot product of the convolutional features of the two images.
We propose a new architecture "CRoss-Attentional Flow Transformer" (CRAFT) to revitalize the correlation volume computation.
arXiv Detail & Related papers (2022-03-31T09:05:00Z) - TFill: Image Completion via a Transformer-Based Architecture [69.62228639870114]
We propose treating image completion as a directionless sequence-to-sequence prediction task.
We employ a restrictive CNN with small and non-overlapping RF for token representation.
In a second phase, to improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced.
arXiv Detail & Related papers (2021-04-02T01:42:01Z) - Sequential Hierarchical Learning with Distribution Transformation for
Image Super-Resolution [83.70890515772456]
We build a sequential hierarchical learning super-resolution network (SHSR) for effective image SR.
We consider the inter-scale correlations of features, and devise a sequential multi-scale block (SMB) to progressively explore the hierarchical information.
Experiment results show SHSR achieves superior quantitative performance and visual quality to state-of-the-art methods.
arXiv Detail & Related papers (2020-07-19T01:35:53Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration task.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.