MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification
- URL: http://arxiv.org/abs/2512.03404v1
- Date: Wed, 03 Dec 2025 03:23:19 GMT
- Title: MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification
- Authors: Yujian Zhao, Hankun Liu, Guanglin Niu,
- Abstract summary: Cross-modal ship re-identification (ReID) between optical and synthetic aperture radar (SAR) imagery has emerged as a critical yet underexplored task in maritime intelligence and surveillance. We propose MOS, a novel framework designed to mitigate the optical-SAR modality gap and achieve modality-consistent feature learning for optical-SAR cross-modal ship ReID.
- Score: 7.7794453452329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-modal ship re-identification (ReID) between optical and synthetic aperture radar (SAR) imagery has recently emerged as a critical yet underexplored task in maritime intelligence and surveillance. However, the substantial modality gap between optical and SAR images poses a major challenge for robust identification. To address this issue, we propose MOS, a novel framework designed to mitigate the optical-SAR modality gap and achieve modality-consistent feature learning for optical-SAR cross-modal ship ReID. MOS consists of two core components. (1) Modality-Consistent Representation Learning (MCRL) applies SAR image denoising and a class-wise modality alignment loss to align intra-identity feature distributions across modalities. (2) Cross-modal Data Generation and Feature Fusion (CDGF) leverages a Brownian bridge diffusion model to synthesize cross-modal samples, which are subsequently fused with the original features during inference to enhance alignment and discriminability. Extensive experiments on the HOSS ReID dataset demonstrate that MOS significantly surpasses state-of-the-art methods across all evaluation protocols, achieving improvements of +3.0%, +6.2%, and +16.4% in R1 accuracy under the ALL to ALL, Optical to SAR, and SAR to Optical settings, respectively. The code and trained models will be released upon publication.
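The abstract names two mechanisms without giving their formulas: a class-wise modality alignment loss and the fusion of original features with features of synthesized cross-modal samples at inference. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes a center-based alignment term (squared distance between per-identity mean features of the two modalities) and a weighted-average fusion followed by L2 normalization. The function names and the `weight` parameter are hypothetical.

```python
import math
from collections import defaultdict


def class_means(features, labels):
    """Average the feature vectors belonging to each identity label."""
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {k: [s / counts[k] for s in total] for k, total in sums.items()}


def classwise_alignment_loss(opt_feats, opt_ids, sar_feats, sar_ids):
    """Mean squared distance between optical and SAR centers per shared identity.

    Assumption: a center-based stand-in, not the loss defined in the paper.
    """
    opt_centers = class_means(opt_feats, opt_ids)
    sar_centers = class_means(sar_feats, sar_ids)
    shared = set(opt_centers) & set(sar_centers)
    if not shared:
        return 0.0
    total = 0.0
    for ident in shared:
        total += sum(
            (a - b) ** 2 for a, b in zip(opt_centers[ident], sar_centers[ident])
        )
    return total / len(shared)


def fuse_features(original, generated, weight=0.5):
    """Blend an original feature with the feature of its synthesized
    cross-modal counterpart, then L2-normalize.

    Assumption: `weight` and the averaging rule are hypothetical choices.
    """
    mixed = [(1.0 - weight) * o + weight * g for o, g in zip(original, generated)]
    norm = math.sqrt(sum(x * x for x in mixed))
    return [x / norm for x in mixed] if norm > 0 else mixed


# Toy usage: two identities, 2-D features.
loss = classwise_alignment_loss(
    [[1.0, 0.0], [3.0, 0.0], [0.0, 1.0]], [0, 0, 1],
    [[2.0, 0.0], [0.0, 3.0]], [0, 1],
)
fused = fuse_features([3.0, 0.0], [0.0, 4.0], weight=0.5)
```

In this toy case identity 0 has matching centers in both modalities and identity 1 does not, so only identity 1 contributes to the loss. A real system would compute such terms on learned embeddings in an autodiff framework rather than on Python lists.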
Related papers
- Semi-supervised Multiscale Matching for SAR-Optical Image [5.25009884148204]
We propose a semi-supervised multiscale matching for SAR-optical image matching (S2M2-SAR). Specifically, we pseudo-label those unlabeled SAR-optical image pairs with pseudo ground-truth similarity heatmaps. We also introduce a cross-modal feature enhancement module trained using a cross-modality mutual independence loss.
arXiv Detail & Related papers (2025-08-11T09:55:39Z)
- Rotation Equivariant Arbitrary-scale Image Super-Resolution [62.41329042683779]
The arbitrary-scale image super-resolution (ASISR) aims to achieve arbitrary-scale high-resolution recoveries from a low-resolution input image. We make efforts to construct a rotation equivariant ASISR method in this study.
arXiv Detail & Related papers (2025-08-07T08:51:03Z)
- Decoupling Multi-Contrast Super-Resolution: Pairing Unpaired Synthesis with Implicit Representations [6.255537948555454]
Multi-Contrast Super-Resolution techniques can boost the quality of their low-resolution counterparts. Existing MCSR methods often assume fixed resolution settings and all require large, perfectly paired training datasets. We propose a novel Modular Multi-Contrast Super-Resolution framework that eliminates the need for paired training data and supports arbitrary upscaling.
arXiv Detail & Related papers (2025-05-09T07:48:52Z)
- FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [92.4205087439928]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability. We propose the Self-supervised Transfer (PST) and the Frequency-Decoupled Fusion module (FreDF). PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models, effectively mitigating data scarcity. FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches. This combined approach enables FUSE to construct a universal image-event encoder that only requires lightweight decoder adaptation for target datasets.
arXiv Detail & Related papers (2025-03-25T15:04:53Z)
- M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation [26.324664674025595]
In extreme scenarios such as disaster response, synthetic aperture radar (SAR) is more suitable for providing post-event data. This introduces new challenges for CD methods, as existing weight-sharing Siamese networks struggle to learn the cross-modal data distribution. We propose a unified MultiModal CD framework, M$^2$CD, to address this challenge.
arXiv Detail & Related papers (2025-03-25T07:31:53Z)
- DehazeMamba: SAR-guided Optical Remote Sensing Image Dehazing with Adaptive State Space Model [27.83437788159158]
We introduce DehazeMamba, a novel SAR-guided dehazing network built on a progressive haze decoupling fusion strategy. Our approach incorporates two key innovations: a Haze Perception and Decoupling Module (HPDM) that dynamically identifies haze-affected regions through optical-SAR difference analysis, and a Progressive Fusion Module (PFM) that mitigates domain shift through a two-stage fusion process based on feature quality assessment. Extensive experiments demonstrate that DehazeMamba significantly outperforms state-of-the-art methods, achieving a 0.73 dB improvement in PSNR and substantial enhancements in downstream tasks such as
arXiv Detail & Related papers (2025-03-17T11:25:05Z)
- PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model [83.35198885088093]
PolSAR data presents unique challenges due to its rich and complex characteristics. Existing data representations, such as complex-valued data, polarimetric features, and amplitude images, are widely used. Most feature extraction networks for PolSAR are small, limiting their ability to capture features effectively. We propose the Polarimetric Scattering Mechanism-Informed SAM (PolSAM), an enhanced Segment Anything Model (SAM) that integrates domain-specific scattering characteristics and a novel prompt generation strategy.
arXiv Detail & Related papers (2024-12-17T09:59:53Z)
- ECMamba: Consolidating Selective State Space Model with Retinex Guidance for Efficient Multiple Exposure Correction [48.77198487543991]
We introduce a novel framework based on Mamba for Exposure Correction (ECMamba) with dual pathways, each dedicated to the restoration of reflectance and illumination map.
Specifically, we derive the Retinex theory and we train a Retinex estimator capable of mapping inputs into two intermediary spaces.
We develop a novel 2D Selective State-space layer guided by Retinex information (Retinex-SS2D) as the core operator of ECMM.
arXiv Detail & Related papers (2024-10-28T21:02:46Z)
- Conditional Brownian Bridge Diffusion Model for VHR SAR to Optical Image Translation [5.578820789388206]
This letter introduces a conditional image-to-image translation approach based on Brownian Bridge Diffusion Model (BBDM). We conducted comprehensive experiments on the MSAW dataset, a paired SAR and optical images collection of 0.5m Very-High-Resolution (VHR)
arXiv Detail & Related papers (2024-08-15T05:43:46Z)
- Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496]
We develop the first attempt to integrate the Vision State Space Model (Mamba) for remote sensing image (RSI) super-resolution.
To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR.
Our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM).
arXiv Detail & Related papers (2024-05-08T11:09:24Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.