Related papers: Learning Enriched Features via Selective State Spaces Model for Efficient Image Deblurring

URL: http://arxiv.org/abs/2403.20106v2
Date: Fri, 5 Apr 2024 10:29:00 GMT
Title: Learning Enriched Features via Selective State Spaces Model for Efficient Image Deblurring
Authors: Hu Gao, Depeng Dang
Abstract summary: Image deblurring aims to restore a high-quality image from its blurred counterpart. We propose an efficient image deblurring network that leverages a selective state space model to aggregate enriched and accurate features. Experimental results demonstrate that the proposed method outperforms state-of-the-art approaches on widely used benchmarks.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Image deblurring aims to restore a high-quality image from its blurred counterpart. The emergence of CNNs and Transformers has enabled significant progress. However, these methods often face a dilemma between eliminating long-range degradation perturbations and maintaining computational efficiency. While the selective state space model (SSM) shows promise in modeling long-range dependencies with linear complexity, it also encounters challenges such as local pixel forgetting and channel redundancy. To address these issues, we propose an efficient image deblurring network that leverages a selective state space model to aggregate enriched and accurate features. Specifically, we introduce an aggregate local and global information block (ALGBlock) designed to effectively capture and integrate both local invariant properties and non-local information. The ALGBlock comprises two primary modules: a module for capturing local and global features (CLGF), and a feature aggregation module (FA). The CLGF module is composed of two branches: the global branch captures long-range dependency features via a selective state space model, while the local branch employs simplified channel attention to model local connectivity, thereby reducing local pixel forgetting and channel redundancy. In addition, we design an FA module to accentuate the local part by recalibrating the weight during the aggregation of the two branches for restoration. Experimental results demonstrate that the proposed method outperforms state-of-the-art approaches on widely used benchmarks.
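
Illustrative sketch (not the authors' code): the two-branch structure described in the abstract can be outlined in plain Python under strong simplifying assumptions. Pixels are flattened into a 1D sequence per channel, the global branch is a generic Mamba-style selective scan that runs in time linear in the number of pixels, the local branch is a simplified channel attention (global average pooling followed by a channel-mixing matrix), and the aggregation is assumed to be a weighted sum with a scalar weight `lam`. The function names, the parameters `w_delta`, `w_b`, `w_c`, `mix`, and `lam`, and the aggregation formula are all hypothetical; the abstract does not specify them.

```python
import math


def softplus(v):
    # Numerically stable softplus; keeps the step size positive.
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def selective_scan(xs, a, w_delta, w_b, w_c):
    """One-channel selective scan over a pixel sequence, O(len(xs)) time.

    a: negative decay rates, one per state dimension (assumed).
    w_delta, w_b, w_c: hypothetical projections that make the step size,
    input gate, and readout depend on the current input.
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)  # input-dependent step size
        # Discretised recurrence: h <- exp(delta * a) * h + delta * B(x) * x
        h = [math.exp(delta * a_n) * h_n + delta * (b_n * x) * x
             for a_n, h_n, b_n in zip(a, h, w_b)]
        # Input-dependent readout: y = C(x) . h
        ys.append(sum((c_n * x) * h_n for c_n, h_n in zip(w_c, h)))
    return ys


def simplified_channel_attention(feat, mix):
    """Scale each channel by a mixed global-average-pooled descriptor."""
    pooled = [sum(ch) / len(ch) for ch in feat]
    scale = [sum(m * p for m, p in zip(row, pooled)) for row in mix]
    return [[v * s for v in ch] for ch, s in zip(feat, scale)]


def alg_block_sketch(feat, a, w_delta, w_b, w_c, mix, lam):
    """Global branch + lam * local branch (assumed aggregation rule)."""
    global_feat = [selective_scan(ch, a, w_delta, w_b, w_c) for ch in feat]
    local_feat = simplified_channel_attention(feat, mix)
    return [[g + lam * l for g, l in zip(gc, lc)]
            for gc, lc in zip(global_feat, local_feat)]


if __name__ == "__main__":
    feat = [[1.0, 3.0], [2.0, 2.0]]       # 2 channels, 2 flattened pixels
    identity = [[1.0, 0.0], [0.0, 1.0]]   # channel-mixing matrix
    out = alg_block_sketch(feat, a=[-1.0], w_delta=0.5, w_b=[1.0],
                           w_c=[1.0], mix=identity, lam=0.5)
    print(out)
```

A larger `lam` gives the local branch more influence on the aggregated output, which is one simple way to read "accentuating the local part"; the actual FA module may recalibrate the weights differently.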

Related papers

Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation [158.37640586809187]
Restoring any degraded image efficiently via just one model has become increasingly significant. Our approach, termed AnyIR, takes a unified path that leverages inherent similarity across various degradations. To fuse the degradation awareness and the contextualized attention, a spatial-frequency parallel fusion strategy is proposed.
arXiv Detail & Related papers (2025-04-19T09:54:46Z)
SEM-Net: Efficient Pixel Modelling for image inpainting with Spatially Enhanced SSM [11.447968918063335]
Image inpainting aims to repair a partially damaged image based on the information from known regions of the image. SEM-Net is a novel State Space Model (SSM) vision network, modelling corrupted images at the pixel level while capturing long-range dependencies (LRDs) in state space.
arXiv Detail & Related papers (2024-11-10T00:35:14Z)
MambaMIC: An Efficient Baseline for Microscopic Image Classification with State Space Models [12.182070604073585]
We propose a vision backbone for Microscopic Image Classification (MIC) tasks, named MambaMIC. Specifically, we introduce a Local-Global dual-branch aggregation module: the MambaMIC Block. In the local branch, we use local convolutions to capture pixel similarity, mitigating local pixel forgetting and enhancing perception. In the global branch, SSM extracts global dependencies, while Locally Aware Enhanced Filter reduces channel redundancy and local pixel forgetting.
arXiv Detail & Related papers (2024-09-12T10:01:33Z)
LoFormer: Local Frequency Transformer for Image Deblurring [12.032239441930306]
We introduce a novel approach termed Local Frequency Transformer (LoFormer). Within each unit of LoFormer, we incorporate a Local Channel-wise SA in the frequency domain (Freq-LC) to simultaneously capture cross-covariance within low- and high-frequency local windows. Our experiments demonstrate that LoFormer significantly improves performance in the image deblurring task, achieving a PSNR of 34.09 dB on the GoPro dataset with 126G FLOPs.
arXiv Detail & Related papers (2024-07-24T04:27:03Z)
Emphasizing Crucial Features for Efficient Image Restoration [6.204240924744974]
We propose a framework to adapt to varying degrees of degradation across different regions for image restoration. Specifically, we design a spatial and frequency attention mechanism (SFAM) to emphasize crucial features for restoration. We also propose our ECFNet, which integrates the aforementioned components into a U-shaped backbone for recovering high-quality images.
arXiv Detail & Related papers (2024-05-19T07:04:05Z)
Spatial-Aware Token for Weakly Supervised Object Localization [137.0570026552845]
We propose a task-specific spatial-aware token to condition localization in a weakly supervised manner. Experiments show that the proposed SAT achieves state-of-the-art performance on both CUB-200 and ImageNet, with 98.45% and 73.13% GT-known Loc.
arXiv Detail & Related papers (2023-03-18T15:38:17Z)
Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block. The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
arXiv Detail & Related papers (2023-02-27T14:19:31Z)
DuAT: Dual-Aggregation Transformer Network for Medical Image Segmentation [21.717520350930705]
Transformer-based models have been widely demonstrated to be successful in computer vision tasks. However, they are often dominated by features of large patterns leading to the loss of local details. We propose a Dual-Aggregation Transformer Network called DuAT, which is characterized by two innovative designs. Our proposed model outperforms state-of-the-art methods in the segmentation of skin lesion images, and polyps in colonoscopy images.
arXiv Detail & Related papers (2022-12-21T07:54:02Z)
Cross-modal Local Shortest Path and Global Enhancement for Visible-Thermal Person Re-Identification [2.294635424666456]
We propose the Cross-modal Local Shortest Path and Global Enhancement (CM-LSP-GE) modules, a two-stream network based on joint learning of local and global features. The experimental results on two typical datasets show that our model is clearly superior to most state-of-the-art methods.
arXiv Detail & Related papers (2022-06-09T10:27:22Z)
CM-GAN: Image Inpainting with Cascaded Modulation GAN and Object-Aware Training [112.96224800952724]
We propose cascaded modulation GAN (CM-GAN) to generate plausible image structures when dealing with large holes in complex images. In each decoder block, global modulation is first applied to perform coarse semantic-aware structure synthesis, then spatial modulation is applied on the output of global modulation to further adjust the feature map in a spatially adaptive fashion. In addition, we design an object-aware training scheme to prevent the network from hallucinating new objects inside holes, fulfilling the needs of object removal tasks in real-world scenarios.
arXiv Detail & Related papers (2022-03-22T16:13:27Z)
Layout-to-Image Translation with Double Pooling Generative Adversarial Networks [76.83075646527521]
We propose a novel Double Pooling GAN (DPGAN) for generating photo-realistic and semantically-consistent results from the input layout. We also propose a novel Double Pooling Module (DPM), which consists of the Square-shape Pooling Module (SPM) and the Rectangle-shape Pooling Module (RPM).
arXiv Detail & Related papers (2021-08-29T19:55:14Z)
Global Aggregation then Local Distribution for Scene Parsing [99.1095068574454]
We show that our approach can be modularized as an end-to-end trainable block and easily plugged into existing semantic segmentation networks. Our approach allows us to set a new state of the art on major semantic segmentation benchmarks including Cityscapes, ADE20K, Pascal Context, Camvid and COCO-stuff.
arXiv Detail & Related papers (2021-07-28T03:46:57Z)
Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining [66.82470461139376]
We propose the first Cross-Scale Non-Local (CS-NL) attention module with integration into a recurrent neural network. By combining the new CS-NL prior with local and in-scale non-local priors in a powerful recurrent fusion cell, we can find more cross-scale feature correlations within a single low-resolution image.
arXiv Detail & Related papers (2020-06-02T07:08:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.