Contrastive Learning-Based Spectral Knowledge Distillation for
Multi-Modality and Missing Modality Scenarios in Semantic Segmentation
- URL: http://arxiv.org/abs/2312.02240v1
- Date: Mon, 4 Dec 2023 10:27:09 GMT
- Title: Contrastive Learning-Based Spectral Knowledge Distillation for
Multi-Modality and Missing Modality Scenarios in Semantic Segmentation
- Authors: Aniruddh Sikdar, Jayant Teotia, Suresh Sundaram
- Abstract summary: novel multi-modal fusion approach called CSK-Net is proposed.
It uses a contrastive learning-based spectral knowledge distillation technique.
Experiments show that CSK-Net surpasses state-of-the-art models in multi-modal tasks and for missing modalities.
- Score: 2.491548070992611
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Improving the performance of semantic segmentation models using multispectral
information is crucial, especially for environments with low-light and adverse
conditions. Multi-modal fusion techniques either learn cross-modality features
to generate a fused image or engage in knowledge distillation, but they treat
multi-modal and missing-modality scenarios as distinct issues, which is not
optimal for multi-sensor models. To address
this, a novel multi-modal fusion approach called CSK-Net is proposed, which
uses a contrastive learning-based spectral knowledge distillation technique
along with an automatic mixed feature exchange mechanism for semantic
segmentation in optical (EO) and infrared (IR) images. The distillation scheme
extracts detailed textures from the optical images and distills them into the
optical branch of CSK-Net. The model encoder consists of shared convolution
weights with separate batch norm (BN) layers for both modalities to capture
the multi-spectral information of the same objects across modalities. A novel
Gated Spectral Unit (GSU) and a mixed feature exchange strategy are
proposed to increase the correlation of modality-shared information and
decrease the modality-specific information during the distillation process.
Comprehensive experiments show that CSK-Net surpasses state-of-the-art models
in multi-modal tasks and for missing modalities when exclusively utilizing IR
data for inference across three public benchmarking datasets. For missing
modality scenarios, the performance increase is achieved without additional
computational costs compared to the baseline segmentation models.
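
The following is a minimal sketch, in PyTorch, of two of the ideas described in the abstract; layer sizes, names, and the loss formulation are assumptions for illustration and are not the paper's actual implementation. It shows a convolution block whose weights are shared between the EO and IR branches but whose batch-norm layers are modality-specific, and a generic InfoNCE-style contrastive loss standing in for the contrastive spectral knowledge distillation term.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedConvDualBN(nn.Module):
    """Convolution block with weights shared across modalities and a
    separate BatchNorm per modality (EO vs. IR). Illustrative sizes only."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn = nn.ModuleDict({
            "eo": nn.BatchNorm2d(out_ch),
            "ir": nn.BatchNorm2d(out_ch),
        })

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        # The convolution weights are identical for both branches; only
        # the normalization statistics are modality-specific.
        return F.relu(self.bn[modality](self.conv(x)))


def contrastive_distill_loss(f_student: torch.Tensor,
                             f_teacher: torch.Tensor,
                             temperature: float = 0.07) -> torch.Tensor:
    """Generic InfoNCE-style loss: matching EO/IR feature pairs in a batch
    are pulled together and non-matching pairs pushed apart. A hypothetical
    stand-in for the paper's contrastive spectral distillation objective."""
    s = F.normalize(f_student.flatten(1), dim=1)
    t = F.normalize(f_teacher.flatten(1), dim=1)
    logits = s @ t.t() / temperature                # (B, B) similarity matrix
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    block = SharedConvDualBN(3, 64)
    eo = torch.randn(4, 3, 128, 128)   # optical batch
    ir = torch.randn(4, 3, 128, 128)   # infrared batch
    f_eo = block(eo, "eo")
    f_ir = block(ir, "ir")
    loss = contrastive_distill_loss(f_ir, f_eo.detach())
    print(f_eo.shape, f_ir.shape, loss.item())

Under this sketch, inference with IR data alone reuses the same shared convolution weights, which is consistent with the abstract's claim that the missing-modality gains come without additional computational cost.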
Related papers
- Multiscale Color Guided Attention Ensemble Classifier for Age-Related Macular Degeneration using Concurrent Fundus and Optical Coherence Tomography Images [1.159256777373941]
This paper proposes a modality-specific multiscale color space embedding integrated with the attention mechanism based on transfer learning for classification.
To analyze the performance of the proposed MCGAEc method, a publicly available multi-modality dataset from Project Macula for AMD is utilized and compared with the existing models.
arXiv Detail & Related papers (2024-09-01T13:17:45Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - Learning transformer-based heterogeneously salient graph representation for multimodal remote sensing image classification [42.15709954199397]
A transformer-based heterogeneously salient graph representation (THSGR) approach is proposed in this paper.
First, a multimodal heterogeneous graph encoder is presented to encode distinctively non-Euclidean structural features from heterogeneous data.
A self-attention-free multi-convolutional modulator is designed for effective and efficient long-term dependency modeling.
arXiv Detail & Related papers (2023-11-17T04:06:20Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - Fusion of Infrared and Visible Images based on Spatial-Channel
Attentional Mechanism [3.388001684915793]
We present AMFusionNet, an innovative approach to infrared and visible image fusion (IVIF).
By assimilating thermal details from infrared images with texture features from visible sources, our method produces images enriched with comprehensive information.
Our method outperforms state-of-the-art algorithms both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-08-25T21:05:11Z) - RetiFluidNet: A Self-Adaptive and Multi-Attention Deep Convolutional
Network for Retinal OCT Fluid Segmentation [3.57686754209902]
Quantification of retinal fluids is necessary for OCT-guided treatment management.
A new convolutional neural architecture named RetiFluidNet is proposed for multi-class retinal fluid segmentation.
The model benefits from hierarchical representation learning of textural, contextual, and edge features.
arXiv Detail & Related papers (2022-09-26T07:18:00Z) - Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z) - Image-specific Convolutional Kernel Modulation for Single Image
Super-resolution [85.09413241502209]
To address this issue, we propose a novel image-specific convolutional kernel modulation (IKM) scheme.
We exploit the global contextual information of image or feature to generate an attention weight for adaptively modulating the convolutional kernels.
Experiments on single image super-resolution show that the proposed methods achieve superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2021-11-16T11:05:10Z) - Contrastive Multiview Coding with Electro-optics for SAR Semantic
Segmentation [0.6445605125467573]
We propose multi-modal representation learning for SAR semantic segmentation.
Unlike previous studies, our method jointly uses EO imagery, SAR imagery, and a label mask.
Several experiments show that our approach is superior to the existing methods in model performance, sample efficiency, and convergence speed.
arXiv Detail & Related papers (2021-08-31T23:55:41Z) - Attention Bottlenecks for Multimodal Fusion [90.75885715478054]
Machine perception models are typically modality-specific and optimised for unimodal benchmarks.
We introduce a novel transformer-based architecture that uses 'fusion bottlenecks' for modality fusion at multiple layers.
We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks.
arXiv Detail & Related papers (2021-06-30T22:44:12Z) - Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
arXiv Detail & Related papers (2020-01-06T20:03:17Z)