Parallel Cross Strip Attention Network for Single Image Dehazing
- URL: http://arxiv.org/abs/2405.05811v1
- Date: Thu, 9 May 2024 14:50:07 GMT
- Title: Parallel Cross Strip Attention Network for Single Image Dehazing
- Authors: Lihan Tong, Yun Liu, Tian Ye, Weijia Li, Liyuan Chen, Erkang Chen
- Abstract summary: Single image dehazing aims to restore hazy images and produce clear, high-quality visuals.
Traditional convolutional models struggle with long-range dependencies due to limited receptive field size.
We introduce a novel dehazing network based on Parallel Stripe Cross Attention (PCSA) with a multi-scale strategy.
- Score: 15.246376325081973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The objective of single image dehazing is to restore hazy images and produce clear, high-quality visuals. Traditional convolutional models struggle with long-range dependencies due to their limited receptive field size. While Transformers excel at capturing such dependencies, their quadratic computational complexity in relation to feature map resolution makes them less suitable for pixel-to-pixel dense prediction tasks. Moreover, fixed kernels or tokens in most models do not adapt well to varying blur sizes, resulting in suboptimal dehazing performance. In this study, we introduce a novel dehazing network based on Parallel Stripe Cross Attention (PCSA) with a multi-scale strategy. PCSA efficiently integrates long-range dependencies by simultaneously capturing horizontal and vertical relationships, allowing each pixel to capture contextual cues from an expanded spatial domain. To handle different sizes and shapes of blurs flexibly, we employ a channel-wise design with varying convolutional kernel sizes and strip lengths in each PCSA to capture context information at different scales. Additionally, we incorporate a softmax-based adaptive weighting mechanism within PCSA to prioritize and leverage more critical features.
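The abstract's central idea, attention along horizontal and vertical strips computed in parallel and merged with softmax weights, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`strip_attention`, `parallel_strip_attention`) and the `branch_logits` parameter are hypothetical, the query/key/value projections are replaced by the identity, and the multi-scale kernels and varying strip lengths mentioned in the abstract are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def strip_attention(feat, axis):
    """Self-attention restricted to full-length strips of an (H, W, C) map.

    axis=1: each row is a strip (horizontal); axis=0: each column (vertical).
    Assumption: identity Q/K/V projections, one strip per row or column.
    """
    # Arrange as (num_strips, strip_length, C).
    x = feat if axis == 1 else feat.transpose(1, 0, 2)
    # Pixels attend only to pixels in the same strip: (S, L, L) scores.
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    out = softmax(scores, axis=-1) @ x
    return out if axis == 1 else out.transpose(1, 0, 2)

def parallel_strip_attention(feat, branch_logits):
    """Run both strip directions on the same input and blend the results.

    branch_logits: two scalars (learnable in a real network, hypothetical
    here) turned into blending weights by a softmax.
    """
    horizontal = strip_attention(feat, axis=1)
    vertical = strip_attention(feat, axis=0)
    w = softmax(np.asarray(branch_logits, dtype=feat.dtype))
    return w[0] * horizontal + w[1] * vertical

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((4, 6, 3))
    print(parallel_strip_attention(feat, [0.0, 0.0]).shape)
```

For an H x W feature map, the two branches compute H*W*W + W*H*H = H*W*(H + W) attention scores, compared with (H*W)^2 for full self-attention, which is the efficiency argument the abstract makes against standard Transformers.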
Related papers
- Parameter-Inverted Image Pyramid Networks [49.35689698870247]
We propose a novel network architecture, Parameter-Inverted Image Pyramid Networks (PIIP).
Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid.
PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification.
arXiv Detail & Related papers (2024-06-06T17:59:10Z)
- Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models [22.702352459581434]
Serpent is an efficient architecture for high-resolution image restoration.
We show that Serpent can achieve reconstruction quality on par with state-of-the-art techniques.
arXiv Detail & Related papers (2024-03-26T17:43:15Z)
- Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring [25.36888929483233]
We propose a multi-scale network based on single-input and multiple-outputs (SIMO) for motion deblurring.
We combine the characteristics of real-world trajectories with a learnable wavelet transform module to focus on the directional continuity and frequency features of the step-by-step transitions from blurred images to sharp images.
arXiv Detail & Related papers (2023-12-29T02:59:40Z)
- Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching [58.10418136917358]
Cross-modality registration between 2D images from cameras and 3D point clouds from LiDARs is a crucial task in computer vision and robotic training.
Previous methods estimate 2D-3D correspondences by matching point and pixel patterns learned by neural networks.
We learn a structured cross-modality matching solver to represent 3D features via a different latent pixel space.
arXiv Detail & Related papers (2023-12-07T05:46:10Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- From Coarse to Fine: Hierarchical Pixel Integration for Lightweight Image Super-Resolution [41.0555613285837]
Transformer-based models have achieved competitive performance in image super-resolution (SR).
We propose a new attention block whose insights come from the interpretation of the Local Attribution Map (LAM) for SR networks.
In the fine area, we use an Intra-Patch Self-Attention (IPSA) module to model long-range pixel dependencies in a local patch.
arXiv Detail & Related papers (2022-11-30T06:32:34Z)
- Lightweight Long-Range Generative Adversarial Networks [58.16484259508973]
We introduce novel lightweight generative adversarial networks that can effectively capture long-range dependencies in the image generation process.
The proposed long-range module can highlight negative relations between pixels, working as a regularization to stabilize training.
Our novel long-range module only introduces few additional parameters and is easily inserted into existing models to capture long-range dependencies.
arXiv Detail & Related papers (2022-09-08T13:05:01Z)
- Adaptive Single Image Deblurring [43.02281823557039]
We propose an efficient pixel adaptive and feature attentive design for handling large blur variations within and across different images.
We also propose an effective content-aware global-local filtering module that significantly improves the performance.
arXiv Detail & Related papers (2022-01-01T10:10:19Z)
- XCiT: Cross-Covariance Image Transformers [73.33400159139708]
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images.
arXiv Detail & Related papers (2021-06-17T17:33:35Z)
- LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution Homography Estimation [52.63874513999119]
Cross-resolution image alignment is a key problem in multiscale giga photography.
Existing deep homography methods neglect the explicit formulation of correspondences between the inputs, which leads to degraded accuracy in cross-resolution challenges.
We propose a local transformer network embedded within a multiscale structure to explicitly learn correspondences between the multimodal inputs.
arXiv Detail & Related papers (2021-06-08T02:51:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.