M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation
- URL: http://arxiv.org/abs/2503.19406v1
- Date: Tue, 25 Mar 2025 07:31:53 GMT
- Title: M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation
- Authors: Ziyuan Liu, Jiawei Zhang, Wenyu Wang, Yuantao Gu
- Abstract summary: In extreme scenarios such as disaster response, synthetic aperture radar (SAR) is more suitable for providing post-event data. This introduces new challenges for CD methods, as existing weight-sharing Siamese networks struggle to learn the cross-modal data distribution. We propose a unified MultiModal CD framework, M$^2$CD, to address this challenge.
- Score: 26.324664674025595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing change detection (CD) methods focus on optical images captured at different times, and deep learning (DL) has achieved remarkable success in this domain. However, in extreme scenarios such as disaster response, synthetic aperture radar (SAR), with its active imaging capability, is more suitable for providing post-event data. This introduces new challenges for CD methods, as existing weight-sharing Siamese networks struggle to effectively learn the cross-modal data distribution between optical and SAR images. To address this challenge, we propose a unified MultiModal CD framework, M$^2$CD. We integrate Mixture of Experts (MoE) modules into the backbone to explicitly handle diverse modalities, thereby enhancing the model's ability to learn multimodal data distributions. Additionally, we innovatively propose an Optical-to-SAR guided path (O2SP) and implement self-distillation during training to reduce the feature space discrepancy between different modalities, further alleviating the model's learning burden. We design multiple variants of M$^2$CD based on both CNN and Transformer backbones. Extensive experiments validate the effectiveness of the proposed framework, with the MiT-b1 version of M$^2$CD outperforming all state-of-the-art (SOTA) methods in optical-SAR CD tasks.
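The abstract names two mechanisms: Mixture of Experts (MoE) modules inside the backbone, and self-distillation that pulls the features of the two modalities closer together. The sketch below is only a toy illustration of those two general ideas, not the authors' implementation. The abstract does not describe the gating scheme, the expert design, or the distillation loss, so the soft gating, the linear experts, the mean-squared-error loss, and every name here (`ToyMoE`, `feature_distillation_loss`) are assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class ToyMoE:
    """Hypothetical soft-gated mixture of linear experts (illustrative only)."""

    def __init__(self, dim, num_experts, seed=0):
        rng = random.Random(seed)
        # One dim x dim linear map per expert.
        self.experts = [
            [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]
        # Gate maps the input feature to one score per expert.
        self.gate = [
            [rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(num_experts)
        ]

    def gate_weights(self, x):
        """Return one mixing weight per expert; the weights sum to 1."""
        return softmax(matvec(self.gate, x))

    def forward(self, x):
        """Blend the expert outputs using the input-dependent gate weights."""
        g = self.gate_weights(x)
        outs = [matvec(expert, x) for expert in self.experts]
        return [
            sum(g[k] * outs[k][i] for k in range(len(outs))) for i in range(len(x))
        ]


def feature_distillation_loss(student, teacher):
    """Mean squared error between two feature vectors (assumed loss form)."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


# Made-up feature vectors standing in for optical and SAR inputs.
moe = ToyMoE(dim=4, num_experts=2)
optical_feat = moe.forward([0.9, 0.1, 0.4, 0.3])
sar_feat = moe.forward([0.2, 0.8, 0.5, 0.1])
loss = feature_distillation_loss(sar_feat, optical_feat)
```

In a real training loop the teacher features would be treated as fixed targets (no gradient), so that only the student path is pulled toward them; this toy version has no gradients at all and only shows how the quantities relate.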
Related papers
- MICON-Bench: Benchmarking and Enhancing Multi-Image Context Image Generation in Unified Multimodal Models [89.89575486159795]
We introduce MICON-Bench, a benchmark for multi-image context generation.
We propose an MLLM-driven Evaluation-by-Checkpoint framework for automatic verification of semantic and visual consistency.
We also present Dynamic Attention Rebalancing (DAR), a training-free, plug-and-play mechanism that dynamically adjusts attention during inference to enhance coherence and reduce hallucinations.
arXiv Detail & Related papers (2026-02-23T04:32:52Z) - MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification [7.7794453452329]
Cross-modal ship re-identification (ReID) between optical and synthetic aperture radar (SAR) imagery has emerged as a critical yet underexplored task in maritime intelligence and surveillance.
We propose MOS, a novel framework designed to mitigate the optical-SAR modality gap and achieve modality-consistent feature learning for optical-SAR cross-modal ship ReID.
arXiv Detail & Related papers (2025-12-03T03:23:19Z) - S2C: Learning Noise-Resistant Differences for Unsupervised Change Detection in Multimodal Remote Sensing Images [24.75086641416994]
Unsupervised Change Detection (UCD) in multimodal Remote Sensing (RS) images remains a difficult challenge.
Inspired by recent advancements in Visual Foundation Models (VFMs) and Contrastive Learning (CL) methodologies, this research aims to develop CL methodologies to translate implicit knowledge in representations into change.
arXiv Detail & Related papers (2025-02-18T07:34:54Z) - MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching [54.740256498985026]
Keypoint detection and description methods often struggle with multimodal data.
We propose a modality-invariant feature learning network (MIFNet) to compute modality-invariant features for keypoint descriptions in multimodal image matching.
arXiv Detail & Related papers (2025-01-20T06:56:30Z) - Large Language Models for Multimodal Deformable Image Registration [50.91473745610945]
We propose a novel coarse-to-fine MDIR framework, LLM-Morph, for aligning the deep features from different modal medical images.
Specifically, we first utilize a CNN encoder to extract deep visual features from cross-modal image pairs, then we use the first adapter to adjust these tokens, and use LoRA in pre-trained LLMs to fine-tune their weights.
Third, for the alignment of tokens, we utilize four other adapters to transform the LLM-encoded tokens into multi-scale visual features, generating multi-scale deformation fields and facilitating the coarse-to-fine MDIR task.
arXiv Detail & Related papers (2024-08-20T09:58:30Z) - Cross-Domain Separable Translation Network for Multimodal Image Change Detection [11.25422609271201]
Multimodal change detection (MCD) is particularly critical in the remote sensing community.
This paper focuses on addressing the challenges of MCD, especially the difficulty in comparing images from different sensors.
A novel unsupervised cross-domain separable translation network (CSTN) is proposed to overcome these limitations.
arXiv Detail & Related papers (2024-07-23T03:56:02Z) - Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating advanced diffusion models (DMs).
Existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
arXiv Detail & Related papers (2024-06-09T10:30:25Z) - CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution [31.0941272076536]
Blind image Super-Resolution (BSR) methods focus on estimating either kernel or degradation information, but have long overlooked the essential content details.
We propose a novel BSR approach, Content-aware Degradation-driven Transformer (CDFormer), to capture both degradation and content representations.
arXiv Detail & Related papers (2024-05-13T11:13:17Z) - Segment Change Model (SCM) for Unsupervised Change detection in VHR Remote Sensing Images: a Case Study of Buildings [24.520190873711766]
We propose an unsupervised Change Detection (CD) method named Segment Change Model (SCM).
Our method recalibrates features extracted at different scales and integrates them in a top-down manner to enhance discriminative change edges.
arXiv Detail & Related papers (2023-12-27T04:47:03Z) - Contrastive Learning-Based Spectral Knowledge Distillation for Multi-Modality and Missing Modality Scenarios in Semantic Segmentation [2.491548070992611]
A novel multi-modal fusion approach called CSK-Net is proposed.
It uses a contrastive learning-based spectral knowledge distillation technique.
Experiments show that CSK-Net surpasses state-of-the-art models in multi-modal tasks and for missing modalities.
arXiv Detail & Related papers (2023-12-04T10:27:09Z) - Hybrid-Supervised Dual-Search: Leveraging Automatic Learning for Loss-free Multi-Exposure Image Fusion [60.221404321514086]
Multi-exposure image fusion (MEF) has emerged as a prominent solution to address the limitations of digital imaging in representing varied exposure levels.
This paper presents a Hybrid-Supervised Dual-Search approach for MEF, dubbed HSDS-MEF, which introduces a bi-level optimization search scheme for automatic design of both network structures and loss functions.
arXiv Detail & Related papers (2023-09-03T08:07:26Z) - GCD-DDPM: A Generative Change Detection Model Based on Difference-Feature Guided DDPM [7.922421805234563]
Deep learning methods have recently shown great promise in bitemporal change detection (CD).
This work proposes a generative change detection model called GCD-DDPM to directly generate CD maps.
Experiments on four high-resolution CD datasets confirm the superior performance of the proposed GCD-DDPM.
arXiv Detail & Related papers (2023-06-06T05:51:50Z) - Multi-scale Transformer Network with Edge-aware Pre-training for Cross-Modality MR Image Synthesis [52.41439725865149]
Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones.
Existing (supervised learning) methods often require a large number of paired multi-modal data to train an effective synthesis model.
We propose a Multi-scale Transformer Network (MT-Net) with edge-aware pre-training for cross-modality MR image synthesis.
arXiv Detail & Related papers (2022-12-02T11:40:40Z) - Revisiting Consistency Regularization for Semi-supervised Change Detection in Remote Sensing Images [60.89777029184023]
We propose a semi-supervised CD model in which we formulate an unsupervised CD loss in addition to the supervised Cross-Entropy (CE) loss.
Experiments conducted on two publicly available CD datasets show that the proposed semi-supervised CD method can reach closer to the performance of supervised CD.
arXiv Detail & Related papers (2022-04-18T17:59:01Z) - Image-specific Convolutional Kernel Modulation for Single Image Super-resolution [85.09413241502209]
We propose a novel image-specific convolutional kernel modulation (IKM).
We exploit the global contextual information of image or feature to generate an attention weight for adaptively modulating the convolutional kernels.
Experiments on single image super-resolution show that the proposed methods achieve superior performances over state-of-the-art methods.
arXiv Detail & Related papers (2021-11-16T11:05:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.