MSP-Former: Multi-Scale Projection Transformer for Single Image
Desnowing
- URL: http://arxiv.org/abs/2207.05621v1
- Date: Tue, 12 Jul 2022 15:44:07 GMT
- Title: MSP-Former: Multi-Scale Projection Transformer for Single Image
Desnowing
- Authors: Sixiang Chen, Tian Ye, Yun Liu, Taodong Liao, Yi Ye, Erkang Chen
- Abstract summary: We apply the vision transformer to the task of snow removal from a single image.
We propose a parallel network architecture split along the channel, performing local feature refinement and global information modeling separately.
In the experimental part, we conduct extensive experiments to demonstrate the superiority of our method.
- Score: 6.22867695581195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Restoring images of snow scenes captured in severe weather is a difficult
task. Snow degradations are complex and clutter the clean image, changing its
distribution. Previous CNN-based methods struggle to remove snow completely
because their local inductive biases lack global modeling ability. In this
paper, we apply the vision transformer to the task of snow removal from a
single image.
First, we propose a parallel network architecture split along the channel
dimension, performing local feature refinement and global information modeling
separately. We utilize a channel shuffle operation to combine their respective
strengths to enhance network performance. Second, we propose the MSP module,
which utilizes multi-scale avgpool to aggregate information of different sizes
and simultaneously performs multi-scale projection self-attention within
multi-head self-attention to improve the representation ability of the model
under degradations of different scales. Finally, we design a simple,
lightweight local capture module, which refines the local capture capability
of the model.
In the experimental part, we conduct extensive experiments to demonstrate the
superiority of our method. We compare against previous snow removal methods on
three snow scene datasets. The experimental results show that our method
surpasses the state-of-the-art methods with fewer parameters and less
computation. On the CSD test dataset, we improve PSNR by 1.99dB and SSIM by
0.03. On the SRRS and Snow100K datasets, we also increase PSNR by 2.47dB and
1.62dB, respectively, compared with the TransWeather approach, and improve SSIM
by 0.03. In the visual comparison section, our MSP-Former also achieves better
visual effects than existing methods, demonstrating the usability of our method.
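The channel-split design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `channel_shuffle` and `parallel_block`, the even half/half split, and the use of two shuffle groups are assumptions made for illustration, and the two branches are stand-in callables rather than the actual local capture and MSP modules. Channels are represented as a plain Python list so that the sketch runs without any deep learning library.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Interleave channels drawn from `groups` equal-sized contiguous groups.

    This mirrors the usual reshape (groups, n) -> transpose -> flatten
    formulation of channel shuffle, applied to a list of channels.
    """
    if groups <= 0 or len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


def parallel_block(
    channels: Sequence[T],
    local_branch: Callable[[T], T],
    global_branch: Callable[[T], T],
) -> List[T]:
    """Split channels in half, process each half separately, then shuffle.

    Assumption: an even split, with the first half going to the local
    branch and the second half to the global branch.
    """
    if len(channels) % 2 != 0:
        raise ValueError("expected an even number of channels")
    half = len(channels) // 2
    local_out = [local_branch(c) for c in channels[:half]]
    global_out = [global_branch(c) for c in channels[half:]]
    # Concatenate both branches, then mix them so that later layers see
    # locally refined and globally modeled channels side by side.
    return channel_shuffle(local_out + global_out, groups=2)


if __name__ == "__main__":
    mixed = parallel_block(
        ["c0", "c1", "c2", "c3"],
        local_branch=lambda c: "L:" + c,   # placeholder for local refinement
        global_branch=lambda c: "G:" + c,  # placeholder for global modeling
    )
    print(mixed)  # ['L:c0', 'G:c2', 'L:c1', 'G:c3']
```

After the shuffle, outputs of the two branches alternate along the channel axis, which is one plausible reading of how the shuffle lets the network combine the strengths of both branches.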
Related papers
- NeuSD: Surface Completion with Multi-View Text-to-Image Diffusion [56.98287481620215]
We present a novel method for 3D surface reconstruction from multiple images where only a part of the object of interest is captured.
Our approach builds on two recent developments: surface reconstruction using neural radiance fields for the reconstruction of the visible parts of the surface, and guidance of pre-trained 2D diffusion models in the form of Score Distillation Sampling (SDS) to complete the shape in unobserved regions in a plausible manner.
arXiv Detail & Related papers (2023-12-07T19:30:55Z)
- Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image [85.91935485902708]
We show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models.
We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problems and can be effortlessly plugged into existing monocular models.
Our method enables the accurate recovery of metric 3D structures on randomly collected internet images.
arXiv Detail & Related papers (2023-07-20T16:14:23Z)
- ExposureDiffusion: Learning to Expose for Low-light Image Enhancement [87.08496758469835]
This work addresses the issue by seamlessly integrating a diffusion model with a physics-based exposure model.
Our method obtains significantly improved performance and reduced inference time compared with vanilla diffusion models.
The proposed framework can work with real-paired datasets, SOTA noise models, and different backbone networks.
arXiv Detail & Related papers (2023-07-15T04:48:35Z)
- Star-Net: Improving Single Image Desnowing Model With More Efficient Connection and Diverse Feature Interaction [0.8602553195689513]
We propose a novel single image desnowing network called Star-Net.
First, we design a Star type Skip Connection (SSC) to establish information channels for all different scale features.
Second, we present a Multi-Stage Interactive Transformer (MIT) as the base module of Star-Net.
Third, we propose a Degenerate Filter Module (DFM) to filter the snow particle and snow fog residual in the SSC.
arXiv Detail & Related papers (2023-03-17T14:03:49Z)
- Event-guided Multi-patch Network with Self-supervision for Non-uniform Motion Deblurring [113.96237446327795]
We present a novel self-supervised event-guided deep hierarchical Multi-patch Network to deal with blurry images and videos.
We also propose an event-guided architecture to exploit motion cues contained in videos to tackle complex blur in videos.
Our MPN achieves the state of the art on the GoPro and VideoDeblurring datasets with a 40x faster runtime compared to current multi-scale methods.
arXiv Detail & Related papers (2023-02-14T15:58:00Z)
- Diversity is Definitely Needed: Improving Model-Agnostic Zero-shot Classification via Stable Diffusion [22.237426507711362]
Model-Agnostic Zero-Shot Classification (MA-ZSC) refers to training non-specific classification architectures to classify real images without using any real images during training.
Recent research has demonstrated that generating synthetic training images using diffusion models provides a potential solution to address MA-ZSC.
We propose modifications to the text-to-image generation process using a pre-trained diffusion model to enhance diversity.
arXiv Detail & Related papers (2023-02-07T07:13:53Z)
- Cross-domain Self-supervised Framework for Photoacoustic Computed Tomography Image Reconstruction [4.769412124596113]
We propose a cross-domain unsupervised reconstruction (CDUR) strategy with a pure transformer model.
We implement a self-supervised reconstruction in a model-based form and leverage the self-supervision to enforce the measurement and image consistency.
Experimental results on in-vivo PACT dataset of mice demonstrate the potential of our unsupervised framework.
arXiv Detail & Related papers (2023-01-17T03:47:01Z)
- Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation [69.45176408639483]
We reform the conv layer by resorting to the scale-space theory.
We build a novel style named SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, the inference is more efficient than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z)
- SnowFormer: Scale-aware Transformer via Context Interaction for Single Image Desnowing [9.747362856056162]
We propose a powerful architecture dubbed SnowFormer for single image desnowing.
It performs Scale-aware Feature Aggregation in the encoder to capture rich snow information of various degradations.
It also uses a novel Context Interaction Transformer Block in the decoder, which conducts context interaction of local details and global information.
arXiv Detail & Related papers (2022-08-20T15:01:09Z)
- Towards Real-time High-Definition Image Snow Removal: Efficient Pyramid Network with Asymmetrical Encoder-decoder Architecture [6.682410871522934]
We develop a novel Efficient Pyramid Network with asymmetrical encoder-decoder architecture for real-time HD image desnowing.
Our approach achieves a better complexity-performance trade-off and effectively handles the processing difficulties of HD and Ultra-HD images.
arXiv Detail & Related papers (2022-07-12T15:18:41Z)
- DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time-varying surface details without needing pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.