Omni Aggregation Networks for Lightweight Image Super-Resolution
- URL: http://arxiv.org/abs/2304.10244v2
- Date: Mon, 24 Apr 2023 09:03:56 GMT
- Title: Omni Aggregation Networks for Lightweight Image Super-Resolution
- Authors: Hang Wang, Xuanhong Chen, Bingbing Ni, Yutian Liu, Jinfan Liu
- Abstract summary: This work proposes two enhanced components under a new Omni-SR architecture.
First, an Omni Self-Attention (OSA) block is proposed based on dense interaction principle.
Second, a multi-scale interaction scheme is proposed to mitigate sub-optimal ERF.
- Score: 42.252518645833696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the lightweight ViT framework has made tremendous progress in image
super-resolution, its uni-dimensional self-attention modeling and homogeneous
aggregation scheme limit its effective receptive field (ERF), preventing it from
capturing more comprehensive interactions across the spatial and channel
dimensions. To tackle these drawbacks, this work proposes two enhanced
components under a new Omni-SR architecture. First, an Omni Self-Attention
(OSA) block is proposed based on the dense-interaction principle, which can
simultaneously model pixel interactions along both the spatial and channel
dimensions, mining the potential correlations across the omni-axis (i.e., spatial
and channel). Coupled with mainstream window-partitioning strategies, OSA can
achieve superior performance with compelling computational budgets. Second, a
multi-scale interaction scheme is proposed to mitigate sub-optimal ERF (i.e.,
premature saturation) in shallow models, which facilitates local propagation
and meso-/global-scale interactions, rendering an omni-scale aggregation
building block. Extensive experiments demonstrate that Omni-SR achieves
record-high performance on lightweight super-resolution benchmarks (e.g., 26.95
dB@Urban100 $\times 4$ with only 792K parameters). Our code is available at
\url{https://github.com/Francis0625/Omni-SR}.
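Read literally, the abstract's "omni-axis" idea is to attend over both axes of a window of tokens: a spatial branch that mixes pixels and a channel branch that mixes channels, whose outputs are then aggregated. The toy NumPy sketch below illustrates only that dual-axis structure; it is not the authors' OSA block (it omits learned query/key/value projections, multiple heads, and the window-partitioning machinery described in the abstract), and all names in it are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def omni_self_attention(x):
    """Illustrative dual-axis attention over one window of tokens.

    x: (N, C) array -- N pixels in the window, C channels.
    Spatial branch: attention among the N pixel tokens (rows).
    Channel branch: attention among the C channel tokens (columns).
    """
    n, c = x.shape
    # Spatial attention: (N, N) affinity over pixels, applied to x.
    spatial = softmax(x @ x.T / np.sqrt(c)) @ x           # (N, C)
    # Channel attention: (C, C) affinity over channels, applied to x.T.
    channel = (softmax(x.T @ x / np.sqrt(n)) @ x.T).T     # (N, C)
    # A real block would fuse these with learned weights; a sum
    # suffices to show the omni-axis aggregation shape-wise.
    return spatial + channel
```

In the paper's setting, each branch would additionally carry its own learned projections, and the spatial branch would run within the partitioned windows mentioned in the abstract to keep the computational budget low.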
Related papers
- Cross Paradigm Representation and Alignment Transformer for Image Deraining [40.66823807648992]
We propose a novel Cross Paradigm Representation and Alignment Transformer (CPRAformer).
Its core idea is the hierarchical representation and alignment, leveraging the strengths of both paradigms to aid image reconstruction.
We use two types of self-attention in the Transformer blocks: sparse prompt channel self-attention (SPC-SA) and spatial pixel refinement self-attention (SPR-SA).
arXiv Detail & Related papers (2025-04-23T06:44:46Z) - ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification [5.863175733097434]
We propose a novel architecture, named the Asymmetric Semantic Aligning Network (ASANet) to address the issue of asymmetry at the feature level.
The proposed ASANet effectively learns feature correlations between the two modalities and eliminates noise caused by feature differences.
We have established a new RGB-SAR multimodal dataset, on which our ASANet outperforms other mainstream methods with improvements ranging from 1.21% to 17.69%.
arXiv Detail & Related papers (2024-12-03T00:03:33Z) - $\text{S}^{3}$Mamba: Arbitrary-Scale Super-Resolution via Scaleable State Space Model [45.65903826290642]
ASSR aims to super-resolve low-resolution images to high-resolution images at any scale using a single model.
We propose a novel arbitrary-scale super-resolution method, called $\text{S}^{3}$Mamba, to construct a scalable continuous representation space.
arXiv Detail & Related papers (2024-11-16T11:13:02Z) - Large coordinate kernel attention network for lightweight image super-resolution [5.66935513638074]
We propose multi-scale blueprint separable convolutions (MBSConv) as a highly efficient building block with a multi-scale receptive field.
We also propose a large coordinate kernel attention (LCKA) module which decomposes the 2D convolutional kernels of the depth-wise convolutional layers in LKA into horizontal and vertical 1-D kernels.
arXiv Detail & Related papers (2024-05-15T14:03:38Z) - Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR).
CFSR inherits the advantages of both convolution-based and transformer-based approaches.
Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
arXiv Detail & Related papers (2024-01-11T03:08:00Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block.
The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
arXiv Detail & Related papers (2023-02-27T14:19:31Z) - HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities [6.543272301133159]
Event cameras detect changes in per-pixel intensity to generate asynchronous event streams.
They offer great potential for accurate semantic map retrieval in real-time autonomous systems.
Existing implementations for event segmentation suffer from sub-par performance.
We propose HALSIE, a hybrid end-to-end learning framework that reduces inference cost by up to $20\times$ versus the state of the art.
arXiv Detail & Related papers (2022-11-19T17:09:50Z) - ShuffleMixer: An Efficient ConvNet for Image Super-Resolution [88.86376017828773]
We propose ShuffleMixer for lightweight image super-resolution, which explores large convolutions and channel split-and-shuffle operations.
Specifically, we develop a large depth-wise convolution and two projection layers based on channel splitting and shuffling as the basic component to mix features efficiently.
Experimental results demonstrate that the proposed ShuffleMixer is about 6x smaller than the state-of-the-art methods in terms of model parameters and FLOPs.
arXiv Detail & Related papers (2022-05-30T15:26:52Z) - MSO: Multi-Feature Space Joint Optimization Network for RGB-Infrared Person Re-Identification [35.97494894205023]
RGB-infrared cross-modality person re-identification (ReID) task aims to recognize the images of the same identity between the visible modality and the infrared modality.
Existing methods mainly use a two-stream architecture to eliminate the discrepancy between the two modalities in the final common feature space.
We present a novel multi-feature space joint optimization (MSO) network, which can learn modality-sharable features in both the single-modality space and the common space.
arXiv Detail & Related papers (2021-10-21T16:45:23Z) - Lightweight Single-Image Super-Resolution Network with Attentive Auxiliary Feature Learning [73.75457731689858]
We develop a computationally efficient yet accurate network based on the proposed attentive auxiliary features (A$^2$F) for SISR.
Experimental results on large-scale datasets demonstrate the effectiveness of the proposed model against state-of-the-art (SOTA) SR methods.
arXiv Detail & Related papers (2020-11-13T06:01:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.