Complementary Random Masking for RGB-Thermal Semantic Segmentation
- URL: http://arxiv.org/abs/2303.17386v2
- Date: Mon, 4 Mar 2024 18:06:33 GMT
- Title: Complementary Random Masking for RGB-Thermal Semantic Segmentation
- Authors: Ukcheol Shin, Kyunghyun Lee, In So Kweon, Jean Oh
- Abstract summary: RGB-thermal semantic segmentation is a potential solution to achieve reliable semantic scene understanding in adverse weather and lighting conditions.
This paper proposes 1) a complementary random masking strategy of RGB-T images and 2) self-distillation loss between clean and masked input modalities.
We achieve state-of-the-art performance over three RGB-T semantic segmentation benchmarks.
- Score: 63.93784265195356
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: RGB-thermal semantic segmentation is one potential solution to achieve
reliable semantic scene understanding in adverse weather and lighting
conditions. However, previous studies have mostly focused on designing
multi-modal fusion modules without considering the nature of multi-modal
inputs. As a result, such networks easily become over-reliant on a single
modality, making it difficult to learn complementary and meaningful
representations for each modality. This paper proposes 1) a complementary
random masking strategy of RGB-T images and 2) self-distillation loss between
clean and masked input modalities. The proposed masking strategy prevents
over-reliance on a single modality. It also improves the accuracy and
robustness of the neural network by forcing the network to segment and classify
objects even when one modality is only partially available. In addition, the proposed
self-distillation loss encourages the network to extract complementary and
meaningful representations from a single modality or complementary masked
modalities. Based on the proposed method, we achieve state-of-the-art
performance over three RGB-T semantic segmentation benchmarks. Our source code
is available at https://github.com/UkcheolShin/CRM_RGBTSeg.
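To make the two ideas concrete, below is a minimal PyTorch sketch of complementary patch masking and the clean-to-masked self-distillation loss. The patch size, masking ratio, KL form of the distillation term, and the model(rgb, thermal) interface are illustrative assumptions rather than the paper's exact configuration; the linked repository contains the authors' implementation.

```python
import torch
import torch.nn.functional as F

def complementary_patch_masks(n, h, w, patch=16, ratio=0.5, device="cpu"):
    # Sample a patch-level binary mask, then upsample it to pixel resolution.
    gh, gw = h // patch, w // patch
    keep = (torch.rand(n, 1, gh, gw, device=device) > ratio).float()
    mask = F.interpolate(keep, size=(h, w), mode="nearest")
    # Complementary pair: a patch hidden from one modality stays visible
    # in the other, so every region is seen by exactly one input stream.
    return mask, 1.0 - mask

def crm_step(model, rgb, thermal, labels, distill_w=1.0):
    # Assumed interface: model(rgb, thermal) -> logits (n, classes, h, w);
    # labels: (n, h, w) long tensor of class indices.
    n, _, h, w = rgb.shape
    m_rgb, m_thr = complementary_patch_masks(n, h, w, device=rgb.device)

    # Clean (unmasked) prediction serves as the self-distillation teacher.
    with torch.no_grad():
        clean_logits = model(rgb, thermal)

    # Segment from complementarily masked inputs: the network must succeed
    # even when each modality is only partially available.
    masked_logits = model(rgb * m_rgb, thermal * m_thr)

    seg_loss = F.cross_entropy(masked_logits, labels)
    distill_loss = F.kl_div(
        F.log_softmax(masked_logits, dim=1),
        F.softmax(clean_logits, dim=1),
        reduction="batchmean",
    )
    return seg_loss + distill_w * distill_loss
```

Because the two masks are exact complements, every image region remains visible to exactly one modality, which is what forces the network to rely on both streams rather than collapsing onto the stronger one.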
Related papers
- Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation [54.96563068182733]
We propose Modality Adaptation with text-to-image Diffusion Models (MADM) for the semantic segmentation task.
MADM utilizes text-to-image diffusion models pre-trained on extensive image-text pairs to enhance the model's cross-modality capabilities.
We show that MADM achieves state-of-the-art adaptation performance across various modality tasks, including adaptation from images to depth, infrared, and event modalities.
arXiv Detail & Related papers (2024-10-29T03:49:40Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
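As a rough illustration of what such an attentive fusion step can look like, the hypothetical PyTorch sketch below reweights the concatenated modality features with channel and spatial attention; all names and layer choices are assumptions, not XMSNet's actual AF design.

```python
import torch
import torch.nn as nn

class AttentiveFusion(nn.Module):
    # Hypothetical fusion of two modality features with channel and spatial
    # attention; layer choices are assumptions, not XMSNet's actual module.
    def __init__(self, c):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(2 * c, c, 1), nn.Sigmoid()
        )
        self.spatial = nn.Sequential(
            nn.Conv2d(2 * c, 1, kernel_size=7, padding=3), nn.Sigmoid()
        )
        self.proj = nn.Conv2d(2 * c, c, 1)

    def forward(self, feat_a, feat_b):
        x = torch.cat([feat_a, feat_b], dim=1)   # (n, 2c, h, w)
        fused = self.proj(x)                     # (n, c, h, w)
        return fused * self.channel(x) * self.spatial(x)
```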
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- SpiderMesh: Spatial-aware Demand-guided Recursive Meshing for RGB-T Semantic Segmentation [13.125707028339292]
We propose a Spatial-aware Demand-guided Recursive Meshing (SpiderMesh) framework for practical RGB-T (thermal) segmentation.
SpiderMesh proactively compensates for inadequate contextual semantics in optically-impaired regions.
Experiments on MFNet and PST900 datasets demonstrate that SpiderMesh achieves state-of-the-art performance on standard RGB-T segmentation benchmarks.
arXiv Detail & Related papers (2023-03-15T15:24:01Z)
- Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction.
Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image.
Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
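The adaptive-dictionary mechanism can be sketched as a shared dictionary of kernel atoms (the prior learned from training data) mixed into per-image kernels by a small predictor. The hypothetical PyTorch sketch below illustrates only this pattern; module names and sizes are assumptions, not ACDNet's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveDictConv(nn.Module):
    # Hypothetical sketch: per-image convolution kernels are mixed from a
    # shared, learned dictionary of kernel atoms (the data-driven prior).
    def __init__(self, c_in, c_out, n_atoms=8, k=3):
        super().__init__()
        self.atoms = nn.Parameter(0.02 * torch.randn(n_atoms, c_out, c_in, k, k))
        self.coef = nn.Sequential(          # tiny predictor of mixing weights
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(c_in, n_atoms), nn.Softmax(dim=1),
        )
        self.c_out, self.k = c_out, k

    def forward(self, x):
        n, c, h, w = x.shape
        alpha = self.coef(x)                                   # (n, n_atoms)
        kernels = torch.einsum("na,aoihw->noihw", alpha, self.atoms)
        # Apply each sample's own kernels via one grouped convolution.
        out = F.conv2d(
            x.reshape(1, n * c, h, w),
            kernels.reshape(n * self.c_out, c, self.k, self.k),
            padding=self.k // 2, groups=n,
        )
        return out.reshape(n, self.c_out, h, w)
```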
arXiv Detail & Related papers (2022-05-16T06:49:36Z)
- Multi-Scale Cascading Network with Compact Feature Learning for RGB-Infrared Person Re-Identification [35.55895776505113]
A Multi-Scale Part-Aware Cascading framework (MSPAC) is formulated by aggregating multi-scale fine-grained features from the part level to the global level.
Cross-modality correlations can thus be efficiently explored on salient features for distinctive modality-invariant feature learning.
arXiv Detail & Related papers (2020-12-12T15:39:11Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply an attention mechanism to the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
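The symmetric gated fusion step can be sketched compactly: a jointly predicted gate weighs one modality and its complement weighs the other. The hypothetical PyTorch snippet below shows the general pattern, not ACMNet's exact module.

```python
import torch
import torch.nn as nn

class SymmetricGatedFusion(nn.Module):
    # Hypothetical symmetric gate: a jointly predicted map g weighs one
    # modality and (1 - g) the other, so neither branch can dominate.
    def __init__(self, c):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * c, c, 3, padding=1), nn.Sigmoid())

    def forward(self, feat_img, feat_depth):
        g = self.gate(torch.cat([feat_img, feat_depth], dim=1))
        return g * feat_img + (1.0 - g) * feat_depth
```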
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
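One common way to realize such cross-modality recalibration is squeeze-and-excitation-style channel attention driven by the opposite modality; the hypothetical PyTorch sketch below illustrates that pattern only and is not the paper's actual Separation-and-Aggregation Gate.

```python
import torch
import torch.nn as nn

def _channel_attention(c, r=4):
    # Squeeze-and-excitation-style attention vector from one modality.
    return nn.Sequential(
        nn.AdaptiveAvgPool2d(1),
        nn.Conv2d(c, c // r, 1), nn.ReLU(inplace=True),
        nn.Conv2d(c // r, c, 1), nn.Sigmoid(),
    )

class CrossModalRecalibration(nn.Module):
    # Hypothetical sketch: each modality recalibrates the other's channels,
    # then the two recalibrated feature maps are aggregated.
    def __init__(self, c):
        super().__init__()
        self.att_from_depth = _channel_attention(c)  # depth -> RGB channel weights
        self.att_from_rgb = _channel_attention(c)    # RGB -> depth channel weights

    def forward(self, f_rgb, f_depth):
        rgb_recal = f_rgb * self.att_from_depth(f_depth)
        depth_recal = f_depth * self.att_from_rgb(f_rgb)
        return rgb_recal + depth_recal
```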
arXiv Detail & Related papers (2020-07-17T18:35:24Z)