Edge-aware Guidance Fusion Network for RGB Thermal Scene Parsing
- URL: http://arxiv.org/abs/2112.05144v1
- Date: Thu, 9 Dec 2021 01:12:47 GMT
- Title: Edge-aware Guidance Fusion Network for RGB Thermal Scene Parsing
- Authors: Wujie Zhou, Shaohua Dong, Caie Xu, Yaguan Qian
- Abstract summary: We propose an edge-aware guidance fusion network (EGFNet) for RGB thermal scene parsing.
To effectively fuse the RGB and thermal information, we propose a multimodal fusion module.
Considering the importance of high-level semantic information, we propose a global information module and a semantic information module.
- Score: 4.913013713982677
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: RGB thermal scene parsing has recently attracted increasing research interest
in the field of computer vision. However, most existing methods fail to extract
accurate boundaries in their prediction maps and cannot fully exploit high-level
features. In addition, these methods simply fuse the features from the RGB and
thermal modalities and are therefore unable to obtain comprehensive fused features. To
address these problems, we propose an edge-aware guidance fusion network
(EGFNet) for RGB thermal scene parsing. First, we introduce a prior edge map
generated using the RGB and thermal images to capture detailed information in
the prediction map and then embed the prior edge information in the feature
maps. To effectively fuse the RGB and thermal information, we propose a
multimodal fusion module that guarantees adequate cross-modal fusion.
Considering the importance of high-level semantic information, we propose a
global information module and a semantic information module to extract rich
semantic information from the high-level features. For decoding, we use simple
elementwise addition for cascaded feature fusion. Finally, to improve the
parsing accuracy, we apply multitask deep supervision to the semantic and
boundary maps. Extensive experiments were performed on benchmark datasets to
demonstrate the effectiveness of the proposed EGFNet and its superior
performance compared with state-of-the-art methods. The code and results can be
found at https://github.com/ShaohuaDong2021/EGFNet.
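The abstract describes the pipeline only at a high level. The following is a minimal PyTorch sketch of the ideas it names (an edge prior computed from the RGB and thermal inputs, embedding of that prior into the feature maps, elementwise-addition cascaded decoding, and joint supervision of the semantic and boundary maps). All module names, tensor shapes, and loss weights are illustrative assumptions, not the authors' implementation; refer to the linked repository for the actual code.

```python
# Illustrative sketch only -- module names, shapes, and loss weights are
# assumptions; they are NOT taken from the EGFNet paper or repository.
import torch
import torch.nn as nn
import torch.nn.functional as F


def prior_edge_map(rgb, thermal):
    """Hypothetical edge prior: Sobel-style gradients of an RGB/thermal average."""
    gray = 0.5 * rgb.mean(dim=1, keepdim=True) + 0.5 * thermal.mean(dim=1, keepdim=True)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(gray, kx.to(gray), padding=1)
    gy = F.conv2d(gray, ky.to(gray), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)  # B x 1 x H x W


class ToyEGFDecoder(nn.Module):
    """Cascaded decoding by elementwise addition, with semantic and boundary heads."""

    def __init__(self, channels=(64, 128, 256), num_classes=9):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, 64, 1) for c in channels])
        self.sem_head = nn.Conv2d(64, num_classes, 1)
        self.edge_head = nn.Conv2d(64, 1, 1)

    def forward(self, fused_feats, edge_prior):
        # fused_feats: list of fused RGB-thermal features, ordered shallow -> deep
        x = None
        for feat, lat in zip(reversed(fused_feats), reversed(self.lateral)):
            f = lat(feat)
            # embed the prior edge information into every feature map
            prior = F.interpolate(edge_prior, size=f.shape[-2:], mode="bilinear",
                                  align_corners=False)
            f = f * (1.0 + prior)
            # elementwise addition for cascaded feature fusion
            x = f if x is None else F.interpolate(x, size=f.shape[-2:], mode="bilinear",
                                                  align_corners=False) + f
        # outputs are at the finest feature resolution used here
        return self.sem_head(x), self.edge_head(x)


def multitask_loss(sem_logits, edge_logits, sem_gt, edge_gt, edge_weight=1.0):
    """Joint supervision of the semantic map and the boundary map
    (a single-stage stand-in for the paper's multitask deep supervision).
    sem_gt: B x H x W class indices; edge_gt: B x 1 x H x W in [0, 1]."""
    sem_loss = F.cross_entropy(sem_logits, sem_gt)
    edge_loss = F.binary_cross_entropy_with_logits(edge_logits, edge_gt)
    return sem_loss + edge_weight * edge_loss
```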
Related papers
- HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion [15.538174593176166]
In this study, we explore a feasible strategy to fully exploit vision foundation model (VFM) features for RGB-thermal scene parsing.
Specifically, we design a hybrid, asymmetric encoder that incorporates both a VFM and a convolutional neural network.
This design allows for more effective extraction of complementary heterogeneous features, which are subsequently fused in a dual-path, progressive manner.
arXiv Detail & Related papers (2024-04-04T15:31:11Z)
- Attentive Multimodal Fusion for Optical and Scene Flow [24.08052492109655]
Existing methods typically rely solely on RGB images or fuse the modalities at later stages.
We propose a novel deep neural network approach named FusionRAFT, which enables early-stage information fusion between sensor modalities.
Our approach exhibits improved robustness in the presence of noise and low-lighting conditions that affect the RGB images.
arXiv Detail & Related papers (2023-07-28T04:36:07Z)
- HODINet: High-Order Discrepant Interaction Network for RGB-D Salient Object Detection [4.007827908611563]
RGB-D salient object detection (SOD) aims to detect the prominent regions by jointly modeling RGB and depth information.
Most RGB-D SOD methods apply the same type of backbones and fusion modules to identically learn the multimodality and multistage features.
In this paper, we propose a high-order discrepant interaction network (HODINet) for RGB-D SOD.
arXiv Detail & Related papers (2023-07-03T11:56:21Z)
- Explicit Attention-Enhanced Fusion for RGB-Thermal Perception Tasks [13.742299383836256]
We propose a novel fusion method named Explicit Attention-Enhanced Fusion (EAEF) that fully takes advantage of each type of data.
The proposed fusion method outperforms state-of-the-art by 1.6% in mIoU on semantic segmentation, 3.1% in MAE on salient object detection, 2.3% in mAP on object detection, and 8.1% in MAE on crowd counting.
arXiv Detail & Related papers (2023-03-28T03:37:27Z)
- Spherical Space Feature Decomposition for Guided Depth Map Super-Resolution [123.04455334124188]
Guided depth map super-resolution (GDSR) aims to upsample low-resolution (LR) depth maps with additional information involved in high-resolution (HR) RGB images from the same scene.
In this paper, we propose the Spherical Space feature Decomposition Network (SSDNet) to solve the above issues.
Our method can achieve state-of-the-art results on four test datasets, as well as successfully generalize to real-world scenes.
arXiv Detail & Related papers (2023-03-15T21:22:21Z)
- Interactive Context-Aware Network for RGB-T Salient Object Detection [7.544240329265388]
We propose a novel network called Interactive Context-Aware Network (ICANet).
ICANet contains three modules that can effectively perform the cross-modal and cross-scale fusions.
Experiments prove that our network performs favorably against the state-of-the-art RGB-T SOD methods.
arXiv Detail & Related papers (2022-11-11T10:04:36Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation for many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which enables the network to capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Cross-Modal Weighting Network for RGB-D Salient Object Detection [76.0965123893641]
We propose a novel Cross-Modal Weighting (CMW) strategy to encourage comprehensive interactions between RGB and depth channels for RGB-D SOD.
Specifically, three RGB-depth interaction modules, named CMW-L, CMW-M and CMW-H, are developed to handle low-, middle- and high-level cross-modal information fusion, respectively.
CMWNet consistently outperforms 15 state-of-the-art RGB-D SOD methods on seven popular benchmarks.
arXiv Detail & Related papers (2020-07-09T16:01:44Z)
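Several entries above (for example, the residual-based fusion modules in EPMF and the CMW-L/M/H modules in CMWNet) follow the same basic pattern: one modality produces gates that recalibrate the other before the two streams are merged. A minimal, generic sketch of such a cross-modal weighting block, written as an illustrative assumption rather than any paper's actual module, could look like this:

```python
# Generic cross-modal weighting sketch -- an illustration of the pattern shared
# by several papers listed above; it is not any specific paper's implementation.
import torch
import torch.nn as nn


class CrossModalWeighting(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # each modality predicts a sigmoid gate for the other
        self.gate_rgb = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid())
        self.gate_aux = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid())
        self.merge = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, f_rgb, f_aux):
        # recalibrate each stream with a gate computed from the other modality
        r = f_rgb * self.gate_aux(f_aux) + f_rgb  # residual-style enhancement
        a = f_aux * self.gate_rgb(f_rgb) + f_aux
        return self.merge(torch.cat([r, a], dim=1))  # fused feature map


if __name__ == "__main__":
    block = CrossModalWeighting(64)
    rgb, depth = torch.randn(2, 64, 60, 80), torch.randn(2, 64, 60, 80)
    print(block(rgb, depth).shape)  # torch.Size([2, 64, 60, 80])
```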