Activating Wider Areas in Image Super-Resolution
- URL: http://arxiv.org/abs/2403.08330v1
- Date: Wed, 13 Mar 2024 08:29:58 GMT
- Title: Activating Wider Areas in Image Super-Resolution
- Authors: Cheng Cheng, Hang Wang, Hongbin Sun
- Abstract summary: Built on Vision Mamba (Vim), the proposed MMA network is capable of finding the most relevant and representative input pixels to reconstruct the corresponding high-resolution images.
MMA achieves competitive or even superior performance compared to state-of-the-art SISR methods.
- Score: 23.52183937294807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The prevalence of convolutional neural networks (CNNs) and vision transformers
(ViTs) has markedly revolutionized the area of single-image super-resolution
(SISR). To further boost SR performance, several techniques, such as
residual learning and attention mechanisms, have been introduced; their gains can
be largely attributed to a wider activated area, that is, the set of input pixels that
strongly influence the SR results. However, the possibility of further
improving SR performance through another versatile vision backbone remains an
unresolved challenge. To address this issue, in this paper, we unleash the
representation potential of the modern state space model, i.e., Vision Mamba
(Vim), in the context of SISR. Specifically, we present three recipes for
better utilization of Vim-based models: 1) Integration into a MetaFormer-style
block; 2) Pre-training on a larger and broader dataset; 3) Employing
complementary attention mechanism. Building on these recipes, we introduce MMA,
a network capable of finding the most relevant and representative input pixels
to reconstruct the corresponding high-resolution
images. Comprehensive experimental analysis reveals that MMA not only achieves
competitive or even superior performance compared to state-of-the-art SISR
methods but also maintains relatively low memory and computational overheads
(e.g., a +0.5 dB PSNR gain on the Manga109 dataset with 19.8 M parameters at
the scale of 2). Furthermore, MMA proves its versatility in lightweight SR
applications. Through this work, we aim to illuminate the potential of state
space models in the broader realm of image processing beyond SISR, encouraging
further exploration in this innovative direction.
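The first recipe above, integrating a Vim-style token mixer into a MetaFormer-style block, can be illustrated with a minimal sketch. This is a hypothetical NumPy toy, not the paper's implementation: the `ssm_scan` function stands in for Vim's selective state-space mixer (reduced here to a per-channel linear recurrence h_t = A·h_{t-1} + B·x_t, y_t = C·h_t), and a `tanh` stands in for the channel MLP. Only the block layout (norm → token mixer → residual, norm → channel MLP → residual) follows the MetaFormer pattern named in the abstract.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy linear state-space recurrence over a sequence of tokens:
    h_t = A * h_{t-1} + B * x_t;  y_t = C * h_t.
    A, B, C are scalars here for simplicity; real SSMs learn them per channel."""
    h = np.zeros_like(x[0])
    ys = []
    for x_t in x:           # sequential scan over the token dimension
        h = A * h + B * x_t
        ys.append(C * h)
    return np.stack(ys)

def metaformer_block(x, token_mixer):
    """MetaFormer-style block: a token-mixing sub-block and a channel sub-block,
    each wrapped with normalization and a residual connection."""
    def norm(z):  # per-token normalization over channels
        return (z - z.mean(-1, keepdims=True)) / (z.std(-1, keepdims=True) + 1e-6)
    x = x + token_mixer(norm(x))  # token-mixer sub-block (SSM stand-in)
    x = x + np.tanh(norm(x))      # channel sub-block (MLP stand-in)
    return x

# Toy usage: 8 tokens (e.g., flattened image patches) with 4 channels.
tokens = np.random.randn(8, 4)
out = metaformer_block(tokens, lambda z: ssm_scan(z, A=0.9, B=0.5, C=1.0))
print(out.shape)  # (8, 4): shape is preserved, as required for stacking blocks
```

The residual-wrapped layout means blocks preserve the token grid's shape, so they can be stacked to widen the activated area, the property the abstract attributes to Vim-based mixing.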
Related papers
- Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496]
We make the first attempt to integrate the Vision State Space Model (Mamba) for remote sensing image (RSI) super-resolution.
To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR.
Our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM)
arXiv Detail & Related papers (2024-05-08T11:09:24Z) - DVMSR: Distillated Vision Mamba for Efficient Super-Resolution [7.551130027327461]
We propose DVMSR, a novel lightweight Image SR network that incorporates Vision Mamba and a distillation strategy.
Our proposed DVMSR can outperform state-of-the-art efficient SR methods in terms of model parameters.
arXiv Detail & Related papers (2024-05-05T17:34:38Z) - Better "CMOS" Produces Clearer Images: Learning Space-Variant Blur
Estimation for Blind Image Super-Resolution [30.816546273417774]
We introduce two new datasets with out-of-focus blur, i.e., NYUv2-BSR and Cityscapes-BSR, to support further research on blind SR with space-variant blur.
Based on the datasets, we design a novel Cross-MOdal fuSion network (CMOS) that estimates both blur and semantics simultaneously.
arXiv Detail & Related papers (2023-04-07T08:40:31Z) - CiaoSR: Continuous Implicit Attention-in-Attention Network for
Arbitrary-Scale Image Super-Resolution [158.2282163651066]
This paper proposes a continuous implicit attention-in-attention network, called CiaoSR.
We explicitly design an implicit attention network to learn the ensemble weights for the nearby local features.
We embed a scale-aware attention in this implicit attention network to exploit additional non-local information.
arXiv Detail & Related papers (2022-12-08T15:57:46Z) - RRSR:Reciprocal Reference-based Image Super-Resolution with Progressive
Feature Alignment and Selection [66.08293086254851]
We propose a reciprocal learning framework to reinforce the learning of a RefSR network.
The newly proposed module aligns reference-input images at multi-scale feature spaces and performs reference-aware feature selection.
We empirically show that multiple recent state-of-the-art RefSR models can be consistently improved with our reciprocal learning paradigm.
arXiv Detail & Related papers (2022-11-08T12:39:35Z) - Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image
Denoising [50.039949798156826]
This paper tackles the challenging problem of hyperspectral (HS) image denoising.
We propose rank-enhanced low-dimensional convolution set (Re-ConvSet)
We then incorporate Re-ConvSet into the widely-used U-Net architecture to construct an HS image denoising method.
arXiv Detail & Related papers (2022-07-09T13:35:12Z) - Accurate and Lightweight Image Super-Resolution with Model-Guided Deep
Unfolding Network [63.69237156340457]
We present and advocate an explainable approach toward SISR named model-guided deep unfolding network (MoG-DUN)
MoG-DUN is accurate (producing fewer aliasing artifacts), computationally efficient (with reduced model parameters), and versatile (capable of handling multiple degradations)
The superiority of the proposed MoG-DUN method to existing state-of-the-art image SR methods including RCAN, SRDNF, and SRFBN is substantiated by extensive experiments on several popular datasets and various degradation scenarios.
arXiv Detail & Related papers (2020-09-14T08:23:37Z) - DDet: Dual-path Dynamic Enhancement Network for Real-World Image
Super-Resolution [69.2432352477966]
Real-world image super-resolution (Real-SR) focuses on the relationship between real-world high-resolution (HR) and low-resolution (LR) images.
In this article, we propose a Dual-path Dynamic Enhancement Network (DDet) for Real-SR.
Unlike conventional methods which stack up massive convolutional blocks for feature representation, we introduce a content-aware framework to study non-inherently aligned image pairs.
arXiv Detail & Related papers (2020-02-25T18:24:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.