Unsupervised Contrastive Hashing for Cross-Modal Retrieval in Remote
Sensing
- URL: http://arxiv.org/abs/2204.08707v1
- Date: Tue, 19 Apr 2022 07:25:25 GMT
- Title: Unsupervised Contrastive Hashing for Cross-Modal Retrieval in Remote
Sensing
- Authors: Georgii Mikriukov, Mahdyar Ravanbakhsh, Begüm Demir
- Abstract summary: Cross-modal text-image retrieval has attracted great attention in remote sensing.
We introduce a novel deep unsupervised cross-modal contrastive hashing (DUCH) method for text-image retrieval in RS.
Experimental results show that the proposed DUCH outperforms state-of-the-art methods.
- Score: 1.6758573326215689
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The development of cross-modal retrieval systems that can search and retrieve
semantically relevant data across different modalities based on a query in any
modality has attracted great attention in remote sensing (RS). In this paper,
we focus our attention on cross-modal text-image retrieval, where queries from
one modality (e.g., text) can be matched to archive entries from another (e.g.,
image). Most of the existing cross-modal text-image retrieval systems in RS
require a high number of labeled training samples and also do not allow fast
and memory-efficient retrieval. These issues limit the applicability of the
existing cross-modal retrieval systems for large-scale applications in RS. To
address this problem, in this paper we introduce a novel deep unsupervised
cross-modal contrastive hashing (DUCH) method for text-image retrieval in RS.
The proposed DUCH consists of two main modules: 1) a feature extraction module
that extracts deep representations of the two modalities; and 2) a hashing
module that learns to generate cross-modal binary hash codes from the extracted
representations. We introduce a novel multi-objective loss function including:
i) contrastive objectives that preserve intra- and inter-modal similarities;
ii) an adversarial objective enforced across the two modalities for cross-modal
representation consistency; and iii) binarization objectives for generating the
hash codes. Experimental results
show that the proposed DUCH outperforms state-of-the-art methods. Our code is
publicly available at https://git.tu-berlin.de/rsim/duch.
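As a rough illustration of how the three loss terms described in the abstract fit together, the sketch below combines a symmetric contrastive (NT-Xent) term, a generator-side adversarial term, and a binarization penalty in PyTorch. This is a minimal sketch, not the authors' implementation (available at https://git.tu-berlin.de/rsim/duch); the module names, the NT-Xent formulation, and the loss weights are assumptions.

```python
# Minimal PyTorch sketch of a DUCH-style multi-objective hashing loss.
# NOT the authors' implementation (see https://git.tu-berlin.de/rsim/duch);
# module names, the NT-Xent formulation, and loss weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def ntxent(a, b, tau=0.3):
    """Symmetric InfoNCE/NT-Xent loss between two batches of L2-normalised codes."""
    a, b = F.normalize(a, dim=1), F.normalize(b, dim=1)
    logits = a @ b.t() / tau                      # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


class DuchStyleLoss(nn.Module):
    def __init__(self, code_len=128, w_contrastive=1.0, w_adv=0.1, w_bin=0.01):
        super().__init__()
        # Small discriminator used for the cross-modal adversarial objective.
        self.disc = nn.Sequential(nn.Linear(code_len, 64), nn.ReLU(), nn.Linear(64, 1))
        self.w_c, self.w_a, self.w_b = w_contrastive, w_adv, w_bin

    def forward(self, h_img, h_txt):
        # i) contrastive objective: pull matching image/text codes together (inter-modal);
        #    intra-modal terms over augmented views would be added analogously.
        l_contrastive = ntxent(h_img, h_txt)

        # ii) adversarial objective (generator side shown): the hashing networks try to
        #     make the two modalities indistinguishable to the discriminator.
        logits = torch.cat([self.disc(h_img), self.disc(h_txt)], dim=0).squeeze(1)
        modality = torch.cat([torch.zeros(len(h_img)), torch.ones(len(h_txt))]).to(logits.device)
        l_adv = F.binary_cross_entropy_with_logits(logits, 1.0 - modality)

        # iii) binarization objective: push continuous codes toward {-1, +1}
        #      so that sign() at inference loses little information.
        l_bin = ((h_img.abs() - 1) ** 2).mean() + ((h_txt.abs() - 1) ** 2).mean()

        return self.w_c * l_contrastive + self.w_a * l_adv + self.w_b * l_bin
```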
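The abstract's claim of fast and memory-efficient retrieval rests on the codes being binary: once the hashing module's outputs are thresholded with sign(), a 128-bit code occupies 16 bytes and comparing two codes reduces to XOR plus popcount. The following NumPy illustration assumes a hypothetical code length, archive size, and variable names; a plain linear scan over packed codes is already cheap at this scale.

```python
# Illustration of why binary hash codes give fast, memory-efficient retrieval.
# Code length, archive size, and variable names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_db, n_bits = 100_000, 128

db_codes = rng.integers(0, 2, size=(n_db, n_bits), dtype=np.uint8)  # archive codes
query = rng.integers(0, 2, size=n_bits, dtype=np.uint8)             # query code

# Pack each 128-bit code into 16 bytes: ~1.6 MB for the whole 100k-entry archive.
db_packed = np.packbits(db_codes, axis=1)
q_packed = np.packbits(query)

# Hamming distance = popcount(XOR), vectorised over the whole archive.
dists = np.unpackbits(db_packed ^ q_packed, axis=1).sum(axis=1)
top10 = np.argsort(dists)[:10]                                      # nearest hash codes
print(top10, dists[top10])
```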
Related papers
- ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
We propose a pioneering generAtive Cross-modal rEtrieval framework (ACE) for end-to-end cross-modal retrieval.
ACE achieves state-of-the-art performance in cross-modal retrieval and outperforms the strong baselines on Recall@1 by 15.27% on average.
arXiv Detail & Related papers (2024-06-25T12:47:04Z)
- End-to-end Knowledge Retrieval with Multi-modal Queries [50.01264794081951]
ReMuQ requires a system to retrieve knowledge from a large corpus by integrating contents from both text and image queries.
We introduce a retriever model, ReViz, that can directly process input text and images to retrieve relevant knowledge in an end-to-end fashion.
We demonstrate superior performance in retrieval on two datasets under zero-shot settings.
arXiv Detail & Related papers (2023-06-01T08:04:12Z)
- EDIS: Entity-Driven Image Search over Multimodal Web Content [95.40238328527931]
We introduce Entity-Driven Image Search (EDIS), a dataset for cross-modal image search in the news domain.
EDIS consists of 1 million web images from actual search engine results and curated datasets, with each image paired with a textual description.
arXiv Detail & Related papers (2023-05-23T02:59:19Z)
- Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval [21.05804942940532]
Cross-modal text-image retrieval has attracted extensive attention for its advantages of flexible input and efficient query.
To cope with multi-scale scarcity and target redundancy in the RS multimodal retrieval task, we propose a novel asymmetric multimodal feature matching network (AMFMN).
Our model adapts to multi-scale feature inputs, favors multi-source retrieval methods, and can dynamically filter redundant features.
arXiv Detail & Related papers (2022-04-21T03:53:19Z)
- An Unsupervised Cross-Modal Hashing Method Robust to Noisy Training Image-Text Correspondences in Remote Sensing [1.6758573326215689]
Cross-modal image-text retrieval methods have attracted great attention in remote sensing.
Most of the existing methods assume that a reliable multi-modal training set with accurately matched text-image pairs exists.
We propose a novel unsupervised cross-modal hashing method robust to noisy image-text correspondences (CHNR).
Experimental results show that the proposed CHNR outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-02-26T11:22:24Z)
- Efficient Cross-Modal Retrieval via Deep Binary Hashing and Quantization [5.799838997511804]
Cross-modal retrieval aims to search for data with similar semantic meanings across different content modalities.
We propose a jointly learned deep hashing and quantization network (HQ) for cross-modal retrieval.
Experimental results on the NUS-WIDE, MIR-Flickr, and Amazon datasets demonstrate that HQ achieves precision gains of more than 7%.
arXiv Detail & Related papers (2022-02-15T22:00:04Z)
- Deep Unsupervised Contrastive Hashing for Large-Scale Cross-Modal Text-Image Retrieval in Remote Sensing [1.6758573326215689]
We introduce a novel deep unsupervised cross-modal contrastive hashing (DUCH) method for RS text-image retrieval.
Experimental results show that the proposed DUCH outperforms state-of-the-art unsupervised cross-modal hashing methods.
Our code is publicly available at https://git.tu-berlin.de/rsim/duch.
arXiv Detail & Related papers (2022-01-20T12:05:10Z)
- Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval [80.35589927511667]
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image.
We propose a novel fine-tuning framework which turns any pretrained text-image multi-modal model into an efficient retrieval model.
Our experiments on a series of standard cross-modal retrieval benchmarks in monolingual, multilingual, and zero-shot setups, demonstrate improved accuracy and huge efficiency benefits over the state-of-the-art cross-encoders.
arXiv Detail & Related papers (2021-03-22T15:08:06Z)
- Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
- Unsupervised Deep Cross-modality Spectral Hashing [65.3842441716661]
The framework is a two-step hashing approach which decouples the optimization into binary optimization and hashing function learning.
We propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations.
We leverage a powerful CNN for images and propose a CNN-based deep architecture to learn the text modality.
arXiv Detail & Related papers (2020-08-01T09:20:11Z)
- Task-adaptive Asymmetric Deep Cross-modal Hashing [20.399984971442]
Cross-modal hashing aims to embed semantic correlations of heterogeneous modality data into the binary hash codes with discriminative semantic labels.
We present a Task-adaptive Asymmetric Deep Cross-modal Hashing (TA-ADCMH) method in this paper.
It can learn task-adaptive hash functions for two sub-retrieval tasks via simultaneous modality representation and asymmetric hash learning.
arXiv Detail & Related papers (2020-04-01T02:09:20Z)