Related papers: SEM-Net: Efficient Pixel Modelling for image inpainting with Spatially Enhanced SSM

SEM-Net: Efficient Pixel Modelling for image inpainting with Spatially Enhanced SSM

URL: http://arxiv.org/abs/2411.06318v1
Date: Sun, 10 Nov 2024 00:35:14 GMT
Title: SEM-Net: Efficient Pixel Modelling for image inpainting with Spatially Enhanced SSM
Authors: Shuang Chen, Haozheng Zhang, Amir Atapour-Abarghouei, Hubert P. H. Shum,
Abstract summary: Image inpainting aims to repair a partially damaged image based on the information from known regions of the images. SEM-Net is a novel visual State Space model (SSM) vision network, modelling corrupted images at the pixel level while capturing long-range dependencies (LRDs) in state space.
Score: 11.447968918063335
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Image inpainting aims to repair a partially damaged image based on the information from known regions of the images. \revise{Achieving semantically plausible inpainting results is particularly challenging because it requires the reconstructed regions to exhibit similar patterns to the semanticly consistent regions}. This requires a model with a strong capacity to capture long-range dependencies. Existing models struggle in this regard due to the slow growth of receptive field for Convolutional Neural Networks (CNNs) based methods and patch-level interactions in Transformer-based methods, which are ineffective for capturing long-range dependencies. Motivated by this, we propose SEM-Net, a novel visual State Space model (SSM) vision network, modelling corrupted images at the pixel level while capturing long-range dependencies (LRDs) in state space, achieving a linear computational complexity. To address the inherent lack of spatial awareness in SSM, we introduce the Snake Mamba Block (SMB) and Spatially-Enhanced Feedforward Network. These innovations enable SEM-Net to outperform state-of-the-art inpainting methods on two distinct datasets, showing significant improvements in capturing LRDs and enhancement in spatial consistency. Additionally, SEM-Net achieves state-of-the-art performance on motion deblurring, demonstrating its generalizability. Our source code will be released in https://github.com/ChrisChen1023/SEM-Net.

Related papers

Moiré Zero: An Efficient and High-Performance Neural Architecture for Moiré Removal [8.464291713830127]
We propose MZNet, a U-shaped network designed to bring images closer to a 'Moire-Zero' state by effectively removing moir'e patterns.<n>MZNet achieves state-of-the-art performance on high-resolution datasets and delivers competitive results on lower-resolution dataset.
arXiv Detail & Related papers (2025-07-30T06:16:35Z)
DAMamba: Vision State Space Model with Dynamic Adaptive Scan [51.81060691414399]
State space models (SSMs) have recently garnered significant attention in computer vision. We propose Dynamic Adaptive Scan (DAS), a data-driven method that adaptively allocates scanning orders and regions. Based on DAS, we propose the vision backbone DAMamba, which significantly outperforms current state-of-the-art vision Mamba models in vision tasks.
arXiv Detail & Related papers (2025-02-18T08:12:47Z)
Parallel Sequence Modeling via Generalized Spatial Propagation Network [80.66202109995726]
Generalized Spatial Propagation Network (GSPN) is a new attention mechanism for optimized vision tasks that inherently captures 2D spatial structures.<n>GSPN overcomes limitations by directly operating on spatially coherent image data and forming dense pairwise connections through a line-scan approach.<n>GSPN achieves superior spatial fidelity and state-of-the-art performance in vision tasks, including ImageNet classification, class-guided image generation, and text-to-image generation.
arXiv Detail & Related papers (2025-01-21T18:56:19Z)
Image Forgery Localization with State Space Models [6.6222439382291]
We propose LoMa, a novel image forgery localization method that leverages the selective SSMs.<n>LoMa employs atrous selective scan to traverse the spatial domain and convert the tampered image into ordered patch sequences.<n>This is the first image forgery localization model constructed based on the SSM-based model.
arXiv Detail & Related papers (2024-12-15T15:10:53Z)
XYScanNet: A State Space Model for Single Image Deblurring [6.9752432140704705]
Deep state-space models (SSMs) are emerging as a promising alternative to CNN and Transformer networks. We propose a novel slice-and-scan strategy that alternates scanning along intra-blur and inter-slices. We develop XYScanNet, an SSM architecture integrated with a lightweight feature fusion module for enhanced image deblurring.
arXiv Detail & Related papers (2024-12-13T18:33:18Z)
Empowering Snapshot Compressive Imaging: Spatial-Spectral State Space Model with Across-Scanning and Local Enhancement [51.557804095896174]
We introduce a State Space Model with Across-Scanning and Local Enhancement, named ASLE-SSM, that employs a Spatial-Spectral SSM for global-local balanced context encoding and cross-channel interaction promoting. Experimental results illustrate ASLE-SSM's superiority over existing state-of-the-art methods, with an inference speed 2.4 times faster than Transformer-based MST and saving 0.12 (M) of parameters.
arXiv Detail & Related papers (2024-08-01T15:14:10Z)
Image Deraining with Frequency-Enhanced State Space Model [2.9465623430708905]
This study introduces SSM to image deraining with deraining-specific enhancements and proposes a Deraining Frequency-Enhanced State Space Model (DFSSM) We develop a novel mixed-scale gated-convolutional block, which uses convolutions with multiple kernel sizes to capture various scale degradations effectively. Experiments on synthetic and real-world rainy image datasets show that our method surpasses state-of-the-art methods.
arXiv Detail & Related papers (2024-05-26T07:45:12Z)
Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration. We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z)
IRSRMamba: Infrared Image Super-Resolution via Mamba-based Wavelet Transform Feature Modulation Model [7.842507196763463]
IRSRMamba is a novel framework integrating wavelet transform feature modulation for multi-scale adaptation. IRSRMamba outperforms state-of-the-art methods in PSNR, SSIM, and perceptual quality. This work establishes Mamba-based architectures as a promising direction for high-fidelity IR image enhancement.
arXiv Detail & Related papers (2024-05-16T07:49:24Z)
Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components. CNNs are used to augment the local texture information of coarse priors. DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
SDM: Spatial Diffusion Model for Large Hole Image Inpainting [106.90795513361498]
We present a novel spatial diffusion model (SDM) that uses a few iterations to gradually deliver informative pixels to the entire image. Also, thanks to the proposed decoupled probabilistic modeling and spatial diffusion scheme, our method achieves high-quality large-hole completion.
arXiv Detail & Related papers (2022-12-06T13:30:18Z)
Accurate and Lightweight Image Super-Resolution with Model-Guided Deep Unfolding Network [63.69237156340457]
We present and advocate an explainable approach toward SISR named model-guided deep unfolding network (MoG-DUN) MoG-DUN is accurate (producing fewer aliasing artifacts), computationally efficient (with reduced model parameters), and versatile (capable of handling multiple degradations) The superiority of the proposed MoG-DUN method to existing state-of-theart image methods including RCAN, SRDNF, and SRFBN is substantiated by extensive experiments on several popular datasets and various degradation scenarios.
arXiv Detail & Related papers (2020-09-14T08:23:37Z)
Enhanced Residual Networks for Context-based Image Outpainting [0.0]
Deep models struggle to understand context and extrapolation through retained information. Current models use generative adversarial networks to generate results which lack localized image feature consistency and appear fake. We propose two methods to improve this issue: the use of a local and global discriminator, and the addition of residual blocks within the encoding section of the network.
arXiv Detail & Related papers (2020-05-14T05:14:26Z)
Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields. To better train this efficient generator, except for frequently-used VGG feature matching loss, we design a novel self-guided regression loss. We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.