Accurate and lightweight dehazing via multi-receptive-field non-local
network and novel contrastive regularization
- URL: http://arxiv.org/abs/2309.16494v1
- Date: Thu, 28 Sep 2023 14:59:16 GMT
- Title: Accurate and lightweight dehazing via multi-receptive-field non-local
network and novel contrastive regularization
- Authors: Zewei He, Zixuan Chen, Ziqian Lu, Xuecheng Sun, Zhe-Ming Lu
- Abstract summary: This paper presents a multi-receptive-field non-local network (MRFNLN) for image dehazing.
It is designed as a multi-stream feature attention block (MSFAB) and cross non-local block (CNLB)
It outperforms recent state-of-the-art dehazing methods with less than 1.5 Million parameters.
- Score: 9.90146712189936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, deep learning-based methods have dominated image dehazing domain.
Although very competitive dehazing performance has been achieved with
sophisticated models, effective solutions for extracting useful features are
still under-explored. In addition, non-local network, which has made a
breakthrough in many vision tasks, has not been appropriately applied to image
dehazing. Thus, a multi-receptive-field non-local network (MRFNLN) consisting
of the multi-stream feature attention block (MSFAB) and cross non-local block
(CNLB) is presented in this paper. We start with extracting richer features for
dehazing. Specifically, we design a multi-stream feature extraction (MSFE)
sub-block, which contains three parallel convolutions with different receptive
fields (i.e., $1\times 1$, $3\times 3$, $5\times 5$) for extracting multi-scale
features. Following MSFE, we employ an attention sub-block to make the model
adaptively focus on important channels/regions. The MSFE and attention
sub-blocks constitute our MSFAB. Then, we design a cross non-local block
(CNLB), which can capture long-range dependencies beyond the query. Instead of
the same input source of query branch, the key and value branches are enhanced
by fusing more preceding features. CNLB is computation-friendly by leveraging a
spatial pyramid down-sampling (SPDS) strategy to reduce the computation and
memory consumption without sacrificing the performance. Last but not least, a
novel detail-focused contrastive regularization (DFCR) is presented by
emphasizing the low-level details and ignoring the high-level semantic
information in the representation space. Comprehensive experimental results
demonstrate that the proposed MRFNLN model outperforms recent state-of-the-art
dehazing methods with less than 1.5 Million parameters.
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN)
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing [25.016421338677816]
Current methods often process only two types of data, missing out on the rich information that additional modalities can provide.
We propose a novel textbfLightweight textbfMultimodal data textbfFusion textbfNetwork (LMFNet)
LMFNet accommodates various data types simultaneously, including RGB, NirRG, and DSM, through a weight-sharing, multi-branch vision transformer.
arXiv Detail & Related papers (2024-04-21T13:29:42Z) - A Lightweight Attention-based Deep Network via Multi-Scale Feature Fusion for Multi-View Facial Expression Recognition [2.9581436761331017]
We introduce a lightweight attentional network incorporating multi-scale feature fusion (LANMSFF) to tackle these issues.
We present two novel components, namely mass attention (MassAtt) and point wise feature selection (PWFS) blocks.
Our proposed approach achieved results comparable to state-of-the-art methods in terms of parameter counts and robustness to pose variation.
arXiv Detail & Related papers (2024-03-21T11:40:51Z) - Towards Compact 3D Representations via Point Feature Enhancement Masked
Autoencoders [52.66195794216989]
We propose Point Feature Enhancement Masked Autoencoders (Point-FEMAE) to learn compact 3D representations.
Point-FEMAE consists of a global branch and a local branch to capture latent semantic features.
Our method significantly improves the pre-training efficiency compared to cross-modal alternatives.
arXiv Detail & Related papers (2023-12-17T14:17:05Z) - MFPNet: Multi-scale Feature Propagation Network For Lightweight Semantic
Segmentation [5.58363644107113]
We propose a novel lightweight segmentation architecture, called Multi-scale Feature Propagation Network (Net)
We design a robust-Decoder structure featuring symmetrical residual blocks that consist of flexible bottleneck residual modules (BRMs)
Taking benefit of their capacity to model latent long-range contextual relationships, we leverage Graph Convolutional Networks (GCNs) to facilitate multiscale feature propagation between the BRM blocks.
arXiv Detail & Related papers (2023-09-10T02:02:29Z) - Mutual-Guided Dynamic Network for Image Fusion [51.615598671899335]
We propose a novel mutual-guided dynamic network (MGDN) for image fusion, which allows for effective information utilization across different locations and inputs.
Experimental results on five benchmark datasets demonstrate that our proposed method outperforms existing methods on four image fusion tasks.
arXiv Detail & Related papers (2023-08-24T03:50:37Z) - PARFormer: Transformer-based Multi-Task Network for Pedestrian Attribute
Recognition [23.814762073093153]
We propose a pure transformer-based multi-task PAR network named PARFormer, which includes four modules.
In the feature extraction module, we build a strong baseline for feature extraction, which achieves competitive results on several PAR benchmarks.
In the viewpoint perception module, we explore the impact of viewpoints on pedestrian attributes, and propose a multi-view contrastive loss.
In the attribute recognition module, we alleviate the negative-positive imbalance problem to generate the attribute predictions.
arXiv Detail & Related papers (2023-04-14T16:27:56Z) - Spatially-Adaptive Feature Modulation for Efficient Image
Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block.
Proposed method is $3times$ smaller than state-of-the-art efficient SR methods.
arXiv Detail & Related papers (2023-02-27T14:19:31Z) - Learning to Aggregate Multi-Scale Context for Instance Segmentation in
Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely dense feature pyramid network (DenseFPN), spatial context pyramid ( SCP), and hierarchical region of interest extractor (HRoIE)
arXiv Detail & Related papers (2021-11-22T08:55:25Z) - A^2-FPN: Attention Aggregation based Feature Pyramid Network for
Instance Segmentation [68.10621089649486]
We propose Attention Aggregation based Feature Pyramid Network (A2-FPN) to improve multi-scale feature learning.
A2-FPN achieves an improvement of 2.0% and 1.4% mask AP when integrated into the strong baselines such as Cascade Mask R-CNN and Hybrid Task Cascade.
arXiv Detail & Related papers (2021-05-07T11:51:08Z) - Spatial-Spectral Residual Network for Hyperspectral Image
Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet)
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and temporal separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.