A Novel Self-Supervised Cross-Modal Image Retrieval Method In Remote
Sensing
- URL: http://arxiv.org/abs/2202.11429v1
- Date: Wed, 23 Feb 2022 11:20:24 GMT
- Title: A Novel Self-Supervised Cross-Modal Image Retrieval Method In Remote
Sensing
- Authors: Gencer Sumbul, Markus Müller, Begüm Demir
- Abstract summary: Cross-modal RS image retrieval methods search semantically similar images across different modalities.
Existing CM-RSIR methods require annotated training images and do not concurrently address intra- and inter-modal similarity preservation and inter-modal discrepancy elimination.
We introduce a novel cross-modal image retrieval method that models the mutual information between different modalities in a self-supervised manner.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the availability of multi-modal remote sensing (RS) image archives,
one of the most important research topics is the development of cross-modal RS
image retrieval (CM-RSIR) methods that search semantically similar images
across different modalities. Existing CM-RSIR methods require annotated
training images (which are time-consuming, costly and not feasible to gather in
large-scale applications) and do not concurrently address intra- and
inter-modal similarity preservation and inter-modal discrepancy elimination. In
this paper, we introduce a novel self-supervised cross-modal image retrieval
method that aims to: i) model mutual-information between different modalities
in a self-supervised manner; ii) keep the distributions of the modality-specific
feature spaces similar to each other; and iii) identify the most similar images within each
modality without requiring any annotated training images. To this end, we
propose a novel objective including three loss functions that simultaneously:
i) maximize mutual information of different modalities for inter-modal
similarity preservation; ii) minimize the angular distance of multi-modal image
tuples for the elimination of inter-modal discrepancies; and iii) increase
cosine similarity of the most similar images within each modality for the
characterization of intra-modal similarities. Experimental results show the
effectiveness of the proposed method compared to state-of-the-art methods. The
code of the proposed method is publicly available at
https://git.tu-berlin.de/rsim/SS-CM-RSIR.
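To make the structure of such a three-term objective concrete, the following is a minimal PyTorch-style sketch, not the authors' released implementation (that code is at the repository linked above): the mutual-information term is approximated with an InfoNCE-style contrastive loss over matching multi-modal tuples, the inter-modal discrepancy term with the mean angular distance between the two embeddings of each tuple, and the intra-modal term with the cosine similarity of each image to its in-batch nearest neighbour within the same modality. The temperature, the loss weights and the neighbour-selection strategy are illustrative assumptions only.

```python
import torch
import torch.nn.functional as F


def cross_modal_ssl_loss(z1, z2, temperature=0.1, weights=(1.0, 1.0, 1.0)):
    """z1, z2: (B, D) embeddings of the same B scenes in two modalities."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    b = z1.size(0)

    # i) Inter-modal similarity preservation: an InfoNCE-style contrastive
    #    loss, a common lower bound on the mutual information between the
    #    two modalities (matching tuples are the positives).
    logits = z1 @ z2.t() / temperature                    # (B, B)
    targets = torch.arange(b, device=z1.device)
    l_mi = 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

    # ii) Inter-modal discrepancy elimination: shrink the angular distance
    #     between the two embeddings of each multi-modal tuple.
    cos = (z1 * z2).sum(dim=1).clamp(-1 + 1e-7, 1 - 1e-7)
    l_ang = torch.acos(cos).mean()

    # iii) Intra-modal similarity characterization: raise the cosine
    #      similarity of each image to its most similar image (in-batch
    #      nearest neighbour) within the same modality.
    def intra(z):
        sim = z @ z.t()
        self_mask = torch.eye(b, dtype=torch.bool, device=z.device)
        sim = sim.masked_fill(self_mask, -2.0)            # exclude self-matches
        return (1.0 - sim.max(dim=1).values).mean()

    l_intra = 0.5 * (intra(z1) + intra(z2))

    w_mi, w_ang, w_intra = weights                        # illustrative weights
    return w_mi * l_mi + w_ang * l_ang + w_intra * l_intra


# Toy usage: after training, cross-modal retrieval ranks one modality's
# gallery by cosine similarity to a query embedded with the other
# modality's encoder (encoders themselves are omitted here).
if __name__ == "__main__":
    query = F.normalize(torch.randn(1, 128), dim=1)       # modality-1 query
    gallery = F.normalize(torch.randn(100, 128), dim=1)   # modality-2 gallery
    ranking = (query @ gallery.t()).argsort(dim=1, descending=True)
    print(ranking[0, :5])                                  # top-5 retrieved indices
    print(float(cross_modal_ssl_loss(torch.randn(8, 128), torch.randn(8, 128))))
```

Since the three terms can pull against each other, the relative weights would in practice need tuning on the target multi-modal archive.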
Related papers
- From Cross-Modal to Mixed-Modal Visible-Infrared Re-Identification [11.324518300593983]
Current VI-ReID methods focus on cross-modality matching, but real-world applications often involve mixed galleries containing both visible (V) and infrared (I) images.
This setting is challenging because gallery images from the same modality may have lower domain gaps but correspond to different identities.
This paper introduces a novel mixed-modal ReID setting, where galleries contain data from both modalities.
arXiv Detail & Related papers (2025-01-23T01:28:05Z) - MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching [54.740256498985026]
Keypoint detection and description methods often struggle with multimodal data.
We propose a modality-invariant feature learning network (MIFNet) to compute modality-invariant features for keypoint descriptions in multimodal image matching.
arXiv Detail & Related papers (2025-01-20T06:56:30Z) - MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training [62.843316348659165]
Deep learning-based image matching algorithms have dramatically outperformed humans in rapidly and accurately finding large numbers of correspondences.
We propose a large-scale pre-training framework that utilizes synthetic cross-modal training signals to train models to recognize and match fundamental structures across images.
Our key finding is that the matching model trained with our framework achieves remarkable generalizability across more than eight unseen cross-modality registration tasks.
arXiv Detail & Related papers (2025-01-13T18:37:36Z) - Cross-Modality Perturbation Synergy Attack for Person Re-identification [66.48494594909123]
The main challenge in cross-modality ReID lies in effectively dealing with visual differences between different modalities.
Existing attack methods have primarily focused on the characteristics of the visible image modality.
This study proposes a universal perturbation attack specifically designed for cross-modality ReID.
arXiv Detail & Related papers (2024-01-18T15:56:23Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - Cross-Modality Sub-Image Retrieval using Contrastive Multimodal Image
Representations [3.3754780158324564]
Cross-modality image retrieval is challenging, since images of similar (or even the same) content captured by different modalities might share few common structures.
We propose a new application-independent content-based image retrieval system for reverse (sub-)image search across modalities.
arXiv Detail & Related papers (2022-01-10T19:04:28Z) - Multi-Modal Mutual Information Maximization: A Novel Approach for
Unsupervised Deep Cross-Modal Hashing [73.29587731448345]
We propose a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH).
We learn informative representations that can preserve both intra- and inter-modal similarities.
The proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
arXiv Detail & Related papers (2021-12-13T08:58:03Z) - Cross-modal Image Retrieval with Deep Mutual Information Maximization [14.778158582349137]
We study cross-modal image retrieval, where the input is a source image plus text describing certain modifications to it, and the goal is to retrieve the desired modified image.
Our method narrows the modality gap between the text modality and the image modality by maximizing the mutual information between their representations, which are not exactly semantically identical.
arXiv Detail & Related papers (2021-03-10T13:08:09Z) - A Similarity Inference Metric for RGB-Infrared Cross-Modality Person
Re-identification [66.49212581685127]
Cross-modality person re-identification (re-ID) is a challenging task due to the large discrepancy between IR and RGB modalities.
Existing methods typically address this challenge by aligning feature distributions or image styles across modalities.
This paper presents a novel similarity inference metric (SIM) that exploits the intra-modality sample similarities to circumvent the cross-modality discrepancy.
arXiv Detail & Related papers (2020-07-03T05:28:13Z) - CoMIR: Contrastive Multimodal Image Representation for Registration [4.543268895439618]
We propose contrastive coding to learn shared, dense image representations, referred to as CoMIRs (Contrastive Multimodal Image Representations).
CoMIRs enable the registration of multimodal images where existing registration methods often fail due to a lack of sufficiently similar image structures.
arXiv Detail & Related papers (2020-06-11T10:51:33Z)