A Novel Self-Supervised Cross-Modal Image Retrieval Method In Remote
Sensing
- URL: http://arxiv.org/abs/2202.11429v1
- Date: Wed, 23 Feb 2022 11:20:24 GMT
- Title: A Novel Self-Supervised Cross-Modal Image Retrieval Method In Remote
Sensing
- Authors: Gencer Sumbul, Markus Müller, Begüm Demir
- Abstract summary: Cross-modal RS image retrieval methods search semantically similar images across different modalities.
Existing CM-RSIR methods require annotated training images and do not concurrently address intra- and inter-modal similarity preservation and inter-modal discrepancy elimination.
We introduce a novel self-supervised cross-modal image retrieval method that models mutual information between different modalities in a self-supervised manner.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the availability of multi-modal remote sensing (RS) image archives,
one of the most important research topics is the development of cross-modal RS
image retrieval (CM-RSIR) methods that search semantically similar images
across different modalities. Existing CM-RSIR methods require annotated
training images (which are time-consuming and costly to gather, and not
feasible at large scale) and do not concurrently address intra- and
inter-modal similarity preservation and inter-modal discrepancy elimination. In
this paper, we introduce a novel self-supervised cross-modal image retrieval
method that aims to: i) model mutual information between different modalities
in a self-supervised manner; ii) keep the distributions of the
modality-specific feature spaces similar; and iii) identify the most similar
images within each modality without requiring any annotated training images. To
this end, we propose a novel objective comprising three loss functions that
simultaneously: i) maximize the mutual information of different modalities for
inter-modal similarity preservation; ii) minimize the angular distance of
multi-modal image tuples to eliminate inter-modal discrepancies; and iii)
increase the cosine similarity of the most similar images within each modality
to characterize intra-modal similarities. Experimental results show the
effectiveness of the proposed method compared to state-of-the-art methods. The
code of the proposed method is publicly available at
https://git.tu-berlin.de/rsim/SS-CM-RSIR.
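The three-part objective maps naturally onto standard contrastive building blocks. The following PyTorch sketch illustrates one plausible reading of the abstract; the NT-Xent formulation of the mutual-information term, the temperature, the loss weights and the nearest-neighbour heuristic for picking the most similar intra-modal images are all illustrative assumptions, not the authors' released implementation (see the repository linked above for that).

```python
# A minimal sketch of the three-loss objective described in the abstract.
# Assumes batch-aligned embeddings z_a, z_b of shape (B, D), where row i of
# each tensor encodes the same scene in two different modalities.
import torch
import torch.nn.functional as F

def inter_modal_info_nce(z_a, z_b, temperature=0.1):
    """NT-Xent-style loss: a standard proxy for maximizing mutual
    information between matching images of two modalities."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Matching tuples sit on the diagonal; all other pairs are negatives.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def inter_modal_angular_loss(z_a, z_b):
    """Minimize the angular distance between representations of the same
    scene in different modalities (inter-modal discrepancy elimination)."""
    cos = F.cosine_similarity(z_a, z_b, dim=1).clamp(-1 + 1e-7, 1 - 1e-7)
    return torch.acos(cos).mean()

def intra_modal_similarity_loss(z):
    """Pull each image towards its current nearest neighbour within the
    same modality (intra-modal similarity, no labels required)."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t()
    sim.fill_diagonal_(-2.0)                  # exclude self-matches
    return (1.0 - sim.max(dim=1).values).mean()

def total_loss(z_a, z_b, w1=1.0, w2=1.0, w3=1.0):
    """Weighted sum of the three terms; the weights are placeholders."""
    return (w1 * inter_modal_info_nce(z_a, z_b)
            + w2 * inter_modal_angular_loss(z_a, z_b)
            + w3 * (intra_modal_similarity_loss(z_a)
                    + intra_modal_similarity_loss(z_b)))
```

Under this reading, the InfoNCE term handles inter-modal similarity preservation, the angular term closes the modality gap for matched tuples, and the nearest-neighbour term supplies intra-modal structure without labels.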
Related papers
- Cross-Modality Perturbation Synergy Attack for Person Re-identification [66.48494594909123]
The main challenge in cross-modality ReID lies in effectively dealing with visual differences between different modalities.
Existing attack methods have primarily focused on the characteristics of the visible image modality.
This study proposes a universal perturbation attack specifically designed for cross-modality ReID.
arXiv Detail & Related papers (2024-01-18T15:56:23Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- Instance-Variant Loss with Gaussian RBF Kernel for 3D Cross-modal Retrieval [52.41252219453429]
Existing methods treat all instances equally, applying the same penalty strength to instances with varying degrees of difficulty.
This can result in ambiguous convergence or local optima, severely compromising the separability of the feature space.
We propose an Instance-Variant loss to assign different penalty strengths to different instances, improving the space separability.
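As a rough sketch of that idea only (not the paper's exact formulation), a per-instance penalty weight can be derived from how hard each instance currently is, with a Gaussian RBF kernel as the similarity measure; the weighting scheme, margin and sigma below are assumptions:

```python
# Hedged illustration: harder instances (small gap between the positive
# match and the hardest negative) receive larger penalty weights.
import torch

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF similarity between two batches of embeddings."""
    return torch.exp(-torch.cdist(x, y).pow(2) / (2 * sigma ** 2))

def instance_variant_loss(anchors, positives, sigma=1.0, margin=0.5):
    sim = rbf_kernel(anchors, positives, sigma)       # (B, B)
    pos = sim.diagonal()                              # matched pairs
    diag = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    hardest_neg = sim.masked_fill(diag, float('-inf')).max(dim=1).values
    # Difficulty-dependent weights: a smaller positive-negative gap
    # (a harder instance) yields a larger weight; mean weight is ~1.
    gap = (pos - hardest_neg).detach()
    weights = torch.softmax(-gap, dim=0) * gap.numel()
    hinge = torch.relu(margin - pos + hardest_neg)    # triplet-style term
    return (weights * hinge).mean()
```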
arXiv Detail & Related papers (2023-05-07T10:12:14Z)
- Cross-Modality Sub-Image Retrieval using Contrastive Multimodal Image Representations [3.3754780158324564]
Cross-modality image retrieval is challenging, since images of similar (or even the same) content captured by different modalities might share few common structures.
We propose a new application-independent content-based image retrieval system for reverse (sub-)image search across modalities.
arXiv Detail & Related papers (2022-01-10T19:04:28Z)
- Multi-Modal Mutual Information Maximization: A Novel Approach for Unsupervised Deep Cross-Modal Hashing [73.29587731448345]
We propose a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH).
We learn informative representations that can preserve both intra- and inter-modal similarities.
The proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
arXiv Detail & Related papers (2021-12-13T08:58:03Z)
- Cross-modal Image Retrieval with Deep Mutual Information Maximization [14.778158582349137]
We study cross-modal image retrieval, where the input is a source image plus text describing modifications that characterize the desired image.
Our method narrows the gap between the text and image modalities by maximizing mutual information between their representations, which are not exactly semantically identical.
arXiv Detail & Related papers (2021-03-10T13:08:09Z)
- MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frame correspondences are the key sources of temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
arXiv Detail & Related papers (2020-07-23T05:41:27Z)
- A Similarity Inference Metric for RGB-Infrared Cross-Modality Person Re-identification [66.49212581685127]
Cross-modality person re-identification (re-ID) is a challenging task due to the large discrepancy between IR and RGB modalities.
Existing methods typically address this challenge by aligning feature distributions or image styles across modalities.
This paper presents a novel similarity inference metric (SIM) that exploits the intra-modality sample similarities to circumvent the cross-modality discrepancy.
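A minimal sketch of that general idea, assuming L2-normalized embeddings and an illustrative neighbourhood size k (the propagation rule below is a guess at the spirit of the approach, not the paper's exact metric):

```python
# Hedged illustration: infer RGB->IR similarity through reliable
# intra-modality neighbourhoods instead of trusting the noisy direct
# cross-modality scores alone. Requires k <= batch size.
import torch
import torch.nn.functional as F

def similarity_inference(rgb_feats, ir_feats, k=10):
    rgb = F.normalize(rgb_feats, dim=1)
    ir = F.normalize(ir_feats, dim=1)
    cross = rgb @ ir.t()          # direct cross-modality similarity (noisy)
    intra_rgb = rgb @ rgb.t()     # within-modality similarities (reliable)
    intra_ir = ir @ ir.t()

    def topk_mask(sim):
        # Keep only each sample's k strongest intra-modality affinities.
        idx = sim.topk(k, dim=1).indices
        return sim * torch.zeros_like(sim).scatter_(1, idx, 1.0)

    # An RGB query matches an IR sample when the query's RGB neighbours
    # also score highly against that sample's IR neighbours.
    return topk_mask(intra_rgb) @ cross @ topk_mask(intra_ir).t()
```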
arXiv Detail & Related papers (2020-07-03T05:28:13Z)
- CoMIR: Contrastive Multimodal Image Representation for Registration [4.543268895439618]
We propose contrastive coding to learn shared, dense image representations, referred to as CoMIRs (Contrastive Multimodal Image Representations).
CoMIRs enable the registration of multimodal images where existing registration methods often fail due to a lack of sufficiently similar image structures.
arXiv Detail & Related papers (2020-06-11T10:51:33Z)