DooDLeNet: Double DeepLab Enhanced Feature Fusion for Thermal-color
Semantic Segmentation
- URL: http://arxiv.org/abs/2204.10266v1
- Date: Thu, 21 Apr 2022 17:06:57 GMT
- Title: DooDLeNet: Double DeepLab Enhanced Feature Fusion for Thermal-color
Semantic Segmentation
- Authors: Oriel Frigo, Lucien Martin-Gaffé, Catherine Wacongne
- Abstract summary: We propose DooDLeNet, a double DeepLab architecture with specialized encoder-decoders for thermal and color modalities.
We combine two strategies for feature fusion: confidence weighting and correlation weighting.
We report state-of-the-art mean IoU results on the MF dataset.
- Score: 1.6758573326215689
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we present a new approach for feature fusion between RGB and
LWIR thermal images for semantic segmentation in driving perception. We propose
DooDLeNet, a double DeepLab architecture with
specialized encoder-decoders for thermal and color modalities and a shared
decoder for final segmentation. We combine two strategies for feature fusion:
confidence weighting and correlation weighting. We report state-of-the-art mean
IoU results on the MF dataset.
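As a rough illustration of the two fusion strategies named in the abstract, the sketch below combines per-pixel confidence weighting (a softmax over the two modalities) with channel-wise correlation weighting (cosine similarity between the modality features). All layer choices and shapes are assumptions for illustration, not the authors' implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs predicting a per-pixel confidence logit for each modality (assumed design)
        self.conf_rgb = nn.Conv2d(channels, 1, kernel_size=1)
        self.conf_thermal = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, f_rgb: torch.Tensor, f_th: torch.Tensor) -> torch.Tensor:
        # Confidence weighting: softmax over the two modalities at every pixel
        logits = torch.cat([self.conf_rgb(f_rgb), self.conf_thermal(f_th)], dim=1)
        w = torch.softmax(logits, dim=1)                     # (B, 2, H, W)
        fused = w[:, 0:1] * f_rgb + w[:, 1:2] * f_th

        # Correlation weighting: re-weight each channel by the cosine similarity
        # between the two modality features over spatial locations
        b, c, _, _ = f_rgb.shape
        corr = F.cosine_similarity(f_rgb.flatten(2), f_th.flatten(2), dim=2)  # (B, C)
        return fused * corr.sigmoid().view(b, c, 1, 1)

# Example: fuse two 128-channel feature maps
fusion = WeightedFusion(128)
out = fusion(torch.randn(2, 128, 32, 32), torch.randn(2, 128, 32, 32))
```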
Related papers
- SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval [82.51117533271517]
Previous works typically only encode RGB videos to obtain high-level semantic features.
Existing RGB-based sign retrieval works suffer from the huge memory cost of dense visual data embedding in end-to-end training.
We propose a novel sign language representation framework called Semantically Enhanced Dual-Stream Encoder (SEDS).
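The summary gives no architectural detail, so the following is only a generic dual-stream sketch, assuming (hypothetically) an RGB stream and a keypoint/pose stream whose clip-level embeddings are fused for retrieval; every layer name and size here is an assumption.
```python
import torch
import torch.nn as nn

class DualStreamEncoder(nn.Module):
    def __init__(self, rgb_dim=512, pose_dim=128, embed_dim=256):
        super().__init__()
        self.rgb_proj = nn.Linear(rgb_dim, embed_dim)    # per-frame RGB features
        self.pose_proj = nn.Linear(pose_dim, embed_dim)  # per-frame keypoint features
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, rgb_feats, pose_feats):
        # rgb_feats: (B, T, rgb_dim), pose_feats: (B, T, pose_dim)
        z = torch.cat([self.rgb_proj(rgb_feats), self.pose_proj(pose_feats)], dim=-1)
        return self.fuse(z).mean(dim=1)  # temporal average -> clip embedding

enc = DualStreamEncoder()
emb = enc(torch.randn(4, 16, 512), torch.randn(4, 16, 128))  # (4, 256)
```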
arXiv Detail & Related papers (2024-07-23T11:31:11Z)
- Optimizing rgb-d semantic segmentation through multi-modal interaction and pooling attention [5.518612382697244]
Multi-modal Interaction and Pooling Attention Network (MIPANet) is designed to harness the interactive synergy between RGB and depth modalities.
We introduce a Pooling Attention Module (PAM) at various stages of the encoder.
The module amplifies the features extracted by the network, and its output is integrated into the decoder.
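A minimal sketch of a pooling-attention block in the spirit of this summary, assuming an SE-style design in which globally pooled context gates the features handed to the decoder; this is an assumption, not MIPANet's exact module.
```python
import torch
import torch.nn as nn

class PoolingAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling
        gate = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * gate                      # amplified features for the decoder

pam = PoolingAttention(64)
y = pam(torch.randn(2, 64, 28, 28))
```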
arXiv Detail & Related papers (2023-11-19T12:25:59Z)
- Attention-based Dual Supervised Decoder for RGBD Semantic Segmentation [16.721758280029302]
We propose a novel attention-based dual supervised decoder for RGBD semantic segmentation.
In the encoder, we design a simple yet effective attention-based multimodal fusion module to extract and fuse deeply multi-level paired complementary information.
Our method achieves superior performance against the state-of-the-art methods.
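A minimal sketch of attention-based RGB-depth fusion, assuming per-pixel modality weights predicted from the concatenated pair; the design is a generic stand-in, not the paper's module.
```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1),
            nn.Softmax(dim=1),  # per-pixel weights for the two modalities
        )

    def forward(self, f_rgb, f_depth):
        w = self.attn(torch.cat([f_rgb, f_depth], dim=1))  # (B, 2, H, W)
        return w[:, 0:1] * f_rgb + w[:, 1:2] * f_depth

fuse = AttentionFusion(96)
out = fuse(torch.randn(1, 96, 40, 40), torch.randn(1, 96, 40, 40))
```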
arXiv Detail & Related papers (2022-01-05T03:12:27Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
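Based only on the summary, here is a hedged sketch of one plausible "discrepant" (asymmetric) interaction: each direction modulates the other modality through a learned residual gate. The direction assignments and operations are assumptions, not the paper's components.
```python
import torch
import torch.nn as nn

class DirectionalEnhance(nn.Module):
    """One-way enhancement: `guide` modulates `target` via a learned gate."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, target, guide):
        return target + target * self.gate(guide)  # residual modulation

rgb_to_depth = DirectionalEnhance(64)  # assumed early-stage direction: RGB -> depth
depth_to_rgb = DirectionalEnhance(64)  # assumed late-stage direction: depth -> RGB
d = rgb_to_depth(torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56))
```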
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation for many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
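A minimal sketch of residual-based fusion, assuming the second stream is injected into the first as a learned residual so an identity path preserves the original features; the layers are illustrative, not PMF's modules.
```python
import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_cam, f_lidar):
        residual = self.reduce(torch.cat([f_cam, f_lidar], dim=1))
        return f_cam + residual  # identity path keeps the camera features intact

fuse = ResidualFusion(128)
out = fuse(torch.randn(2, 128, 32, 64), torch.randn(2, 128, 32, 64))
```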
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Deep ensembles based on Stochastic Activation Selection for Polyp Segmentation [82.61182037130406]
This work deals with medical image segmentation and in particular with accurate polyp detection and segmentation during colonoscopy examinations.
The basic architecture in image segmentation consists of an encoder and a decoder.
We compare variants of the DeepLab architecture obtained by varying the decoder backbone.
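A minimal sketch of the general ensembling recipe implied here: run several segmentation networks (e.g., DeepLab variants with different decoder backbones) and average their per-pixel class probabilities. The stand-in models below are placeholders, not the paper's networks.
```python
import torch
import torch.nn as nn

def ensemble_predict(models, image: torch.Tensor) -> torch.Tensor:
    # Average softmax maps across models, then take the per-pixel argmax
    probs = [model(image).softmax(dim=1) for model in models]  # each (B, K, H, W)
    return torch.stack(probs).mean(dim=0).argmax(dim=1)        # (B, H, W) label map

# Stand-in "models": tiny conv heads mapping 3 input channels to 2 classes
models = [nn.Conv2d(3, 2, kernel_size=1) for _ in range(3)]
mask = ensemble_predict(models, torch.randn(1, 3, 64, 64))     # (1, 64, 64)
```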
arXiv Detail & Related papers (2021-04-02T02:07:37Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which makes the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
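The cross-modal auto-encoder pretext task can be sketched as predicting one modality from the other with a reconstruction loss and no labels; the tiny network and L1 loss below are assumptions, not the paper's setup.
```python
import torch
import torch.nn as nn

class CrossModalAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),  # 1-channel depth map
        )

    def forward(self, rgb):
        return self.decoder(self.encoder(rgb))

model = CrossModalAE()
rgb, depth = torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64)
loss = nn.functional.l1_loss(model(rgb), depth)  # self-supervised target: the paired depth
loss.backward()
```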
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- Global-Local Propagation Network for RGB-D Semantic Segmentation [12.710923449138434]
We propose the Global-Local Propagation Network (GLPNet) for RGB-D semantic segmentation.
Our GLPNet achieves new state-of-the-art performance on two challenging indoor scene segmentation datasets.
arXiv Detail & Related papers (2021-01-26T14:26:07Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Crossmodality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
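A hedged sketch of the separation-and-aggregation idea described here: each modality is first recalibrated by the other, then the two recalibrated maps are merged with per-pixel weights. All operations are illustrative assumptions rather than the paper's exact gate.
```python
import torch
import torch.nn as nn

class SeparationAggregationGate(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.recal_rgb = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.recal_depth = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.aggregate = nn.Sequential(
            nn.Conv2d(2 * channels, 2, kernel_size=1), nn.Softmax(dim=1)
        )

    def forward(self, f_rgb, f_depth):
        # Separation: cross-modal recalibration (depth gates RGB and vice versa)
        r = f_rgb * self.recal_rgb(f_depth)
        d = f_depth * self.recal_depth(f_rgb)
        # Aggregation: per-pixel soft selection between the recalibrated maps
        w = self.aggregate(torch.cat([r, d], dim=1))
        return w[:, 0:1] * r + w[:, 1:2] * d

gate = SeparationAggregationGate(64)
out = gate(torch.randn(1, 64, 30, 30), torch.randn(1, 64, 30, 30))
```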
arXiv Detail & Related papers (2020-07-17T18:35:24Z)