U-shaped Vision Mamba for Single Image Dehazing
- URL: http://arxiv.org/abs/2402.04139v4
- Date: Fri, 16 Feb 2024 02:15:32 GMT
- Title: U-shaped Vision Mamba for Single Image Dehazing
- Authors: Zhuoran Zheng and Chen Wu
- Abstract summary: We introduce the U-shaped Vision Mamba (UVM-Net), an efficient single-image dehazing network.
Inspired by the State Space Sequence Models (SSMs), a new deep sequence model known for its power to handle long sequences, we design a Bi-SSM block.
Our method takes only 0.009 seconds to infer a $325 \times 325$ resolution image (100 FPS) without I/O handling time.
- Score: 8.134659382415185
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Currently, Transformer is the most popular architecture for image dehazing,
but due to its large computational complexity, its ability to handle long-range
dependency is limited on resource-constrained devices. To tackle this
challenge, we introduce the U-shaped Vision Mamba (UVM-Net), an efficient
single-image dehazing network. Inspired by the State Space Sequence Models
(SSMs), a new deep sequence model known for its power to handle long sequences,
we design a Bi-SSM block that integrates the local feature extraction ability
of the convolutional layer with the ability of the SSM to capture long-range
dependencies. Extensive experimental results demonstrate the effectiveness of
our method. Our method provides a more highly efficient idea of long-range
dependency modeling for image dehazing as well as other image restoration
tasks. The URL of the code is \url{https://github.com/zzr-idam/UVM-Net}. Our
method takes only \textbf{0.009} seconds to infer a $325 \times 325$ resolution
image (100FPS) without I/O handling time.
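The core idea of the Bi-SSM block, as the abstract describes it, is to fuse a convolutional layer's local features with a state space model's linear-time long-range scan. The sketch below is a minimal NumPy illustration of that idea on a 1-D signal, not the authors' implementation: the discretized matrices `A`, `B`, `C`, the additive fusion rule, and the forward/backward ("Bi") scan direction are all assumptions made for clarity.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Discrete state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.

    x: (T,) input sequence; A: (N, N); B, C: (N,).
    Cost is linear in T, unlike the quadratic cost of self-attention.
    """
    h = np.zeros(A.shape[0])
    y = np.empty(len(x))
    for t, xt in enumerate(x):
        h = A @ h + B * xt
        y[t] = C @ h
    return y

def local_conv1d(x, kernel):
    """Local feature extraction: 'same'-padded 1-D convolution."""
    pad = len(kernel) // 2
    xp = np.pad(x, pad)
    return np.array([xp[i:i + len(kernel)] @ kernel for i in range(len(x))])

def bi_ssm_block(x, A, B, C, kernel):
    """Fuse local conv features with forward and backward SSM scans."""
    local = local_conv1d(x, kernel)
    fwd = ssm_scan(x, A, B, C)             # left-to-right long-range context
    bwd = ssm_scan(x[::-1], A, B, C)[::-1]  # right-to-left context
    return local + fwd + bwd                # additive fusion (assumed)
```

In UVM-Net these blocks sit inside a U-shaped encoder/decoder operating on 2-D feature maps flattened into sequences; the 1-D toy above only shows why the scan keeps long-range modeling linear in sequence length.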
Related papers
- SEM-Net: Efficient Pixel Modelling for image inpainting with Spatially Enhanced SSM [11.447968918063335]
Image inpainting aims to repair a partially damaged image based on the information from known regions of the images.
SEM-Net is a novel visual State Space model (SSM) vision network, modelling corrupted images at the pixel level while capturing long-range dependencies (LRDs) in state space.
arXiv Detail & Related papers (2024-11-10T00:35:14Z) - Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats [31.37432523412404]
Long-LRM can process 32 source images at 960x540 resolution within only 1.3 seconds on a single A100 80G GPU.
Unlike previous feed-forward models that are limited to processing 1~4 input images, Long-LRM reconstructs the entire scene in a single feed-forward step.
arXiv Detail & Related papers (2024-10-16T17:54:06Z) - FC3DNet: A Fully Connected Encoder-Decoder for Efficient Demoir'eing [50.702284015455405]
We propose a Fully Connected enCoder-deCoder based Demoiréing Network (FC3DNet)
FC3DNet utilizes features with multiple scales in each stage of the decoder for comprehensive information.
arXiv Detail & Related papers (2024-06-21T07:10:50Z) - Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models [26.926712014346432]
This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization.
Our method's efficacy is demonstrated on the class-conditional ImageNet generation benchmark, setting new state-of-the-art FID scores of 1.70 on ImageNet 256 x 256 and 2.89 on ImageNet 512 x 512.
arXiv Detail & Related papers (2024-06-13T17:59:58Z) - DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis [56.849285913695184]
Diffusion Mamba (DiM) is a sequence model for efficient high-resolution image synthesis.
DiM architecture achieves inference-time efficiency for high-resolution images.
Experiments demonstrate the effectiveness and efficiency of our DiM.
arXiv Detail & Related papers (2024-05-23T06:53:18Z) - SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation [16.476244833079182]
We introduce SegMamba, a novel 3D medical image Segmentation Mamba model.
SegMamba excels in whole volume feature modeling from a state space model standpoint.
Experiments on the BraTS2023 dataset demonstrate the effectiveness and efficiency of our SegMamba.
arXiv Detail & Related papers (2024-01-24T16:17:23Z) - Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block.
The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
arXiv Detail & Related papers (2023-02-27T14:19:31Z) - CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z) - Lightweight Long-Range Generative Adversarial Networks [58.16484259508973]
We introduce a novel lightweight generative adversarial network, which can effectively capture long-range dependencies in the image generation process.
The proposed long-range module can highlight negative relations between pixels, working as a regularization to stabilize training.
Our novel long-range module introduces only a few additional parameters and is easily inserted into existing models to capture long-range dependencies.
arXiv Detail & Related papers (2022-09-08T13:05:01Z) - Long-Short Transformer: Efficient Transformers for Language and Vision [97.2850205384295]
Long-Short Transformer (Transformer-LS) is an efficient self-attention mechanism for modeling long sequences with linear complexity for both language and vision tasks.
It aggregates a novel long-range attention with dynamic projection to model distant correlations and a short-term attention to capture fine-grained local correlations.
Our method outperforms the state-of-the-art models on multiple tasks in language and vision domains, including the Long Range Arena benchmark, autoregressive language modeling, and ImageNet classification.
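Transformer-LS's two-branch design, as summarized above, pairs a low-rank "dynamic projection" for distant correlations with a sliding-window attention for local ones. The following NumPy sketch illustrates that aggregation scheme in toy form; it is not the paper's implementation, and the fixed projection matrix `P`, the landmark count `r`, and the single-softmax concatenation of both branches' scores are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def long_short_attention(Q, K, V, P, w):
    """Toy single-head long-short attention.

    Long branch: keys/values are compressed to r landmarks via projection
    P (r x T), so global scores are T x r rather than T x T.
    Short branch: each query also attends to a window of +/- w neighbors.
    Both branches' scores share one softmax, as in score aggregation.
    Q, K, V: (T, d); P: (r, T).
    """
    T, d = Q.shape
    Kb, Vb = P @ K, P @ V                       # (r, d) compressed K/V
    global_scores = Q @ Kb.T / np.sqrt(d)       # (T, r)
    out = np.empty_like(V)
    for t in range(T):
        lo, hi = max(0, t - w), min(T, t + w + 1)
        local_scores = Q[t] @ K[lo:hi].T / np.sqrt(d)
        attn = softmax(np.concatenate([global_scores[t], local_scores]))
        vals = np.concatenate([Vb, V[lo:hi]], axis=0)
        out[t] = attn @ vals
    return out
```

Because the global branch costs O(T·r) and the local branch O(T·w), total cost stays linear in sequence length T, which is the efficiency claim behind linear-complexity attention variants of this kind.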
arXiv Detail & Related papers (2021-07-05T18:00:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.