Related papers: Efficient Cross-Modal Retrieval via Deep Binary Hashing and Quantization

URL: http://arxiv.org/abs/2202.10232v1
Date: Tue, 15 Feb 2022 22:00:04 GMT
Title: Efficient Cross-Modal Retrieval via Deep Binary Hashing and Quantization
Authors: Yang Shi, Young-joo Chung
Abstract summary: Cross-modal retrieval aims to search for data with similar semantic meanings across different content modalities. We propose a jointly learned deep hashing and quantization network (HQ) for cross-modal retrieval. Experimental results on the NUS-WIDE, MIR-Flickr, and Amazon datasets demonstrate that HQ achieves boosts of more than 7% in precision.
Score: 5.799838997511804
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cross-modal retrieval aims to search for data with similar semantic meanings across different content modalities. However, cross-modal retrieval requires huge amounts of storage and retrieval time since it needs to process data in multiple modalities. Existing works focused on learning single-source compact features such as binary hash codes that preserve similarities between different modalities. In this work, we propose a jointly learned deep hashing and quantization network (HQ) for cross-modal retrieval. We simultaneously learn binary hash codes and quantization codes to preserve semantic information in multiple modalities by an end-to-end deep learning architecture. At the retrieval step, binary hashing is used to retrieve a subset of items from the search space, then quantization is used to re-rank the retrieved items. We theoretically and empirically show that this two-stage retrieval approach provides faster retrieval results while preserving accuracy. Experimental results on the NUS-WIDE, MIR-Flickr, and Amazon datasets demonstrate that HQ achieves boosts of more than 7% in precision compared to supervised neural network-based compact coding models.
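The two-stage retrieval step described in the abstract (binary hashing to shortlist candidates, then quantization to re-rank them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the HQ network itself: random-hyperplane hashing and codebooks sampled from the data stand in for the learned hash and quantization codes, and every name here (`hamming_candidates`, `build_lookup`, `rerank`, the toy sizes) is hypothetical.

```python
import numpy as np

def hamming_candidates(query_bits, db_bits, k):
    """Stage 1: indices of the k database items closest in Hamming distance."""
    dists = np.count_nonzero(db_bits != query_bits, axis=1)
    return np.argsort(dists, kind="stable")[:k]

def build_lookup(query_vec, codebooks):
    """Squared distance from each query sub-vector to every codeword.

    codebooks has shape (n_sub, n_words, d_sub); the result is (n_sub, n_words).
    """
    n_sub, _, d_sub = codebooks.shape
    sub_queries = query_vec.reshape(n_sub, 1, d_sub)
    return ((codebooks - sub_queries) ** 2).sum(axis=2)

def rerank(query_vec, cand_idx, db_codes, codebooks):
    """Stage 2: re-rank candidates by approximate (quantized) distance."""
    table = build_lookup(query_vec, codebooks)
    n_sub = codebooks.shape[0]
    codes = db_codes[cand_idx]                        # (n_cand, n_sub)
    approx = table[np.arange(n_sub), codes].sum(axis=1)
    return cand_idx[np.argsort(approx, kind="stable")]

# Toy data (assumption: random vectors stand in for learned embeddings).
rng = np.random.default_rng(0)
n_items, dim, n_bits, n_sub, n_words = 1000, 16, 32, 4, 8
d_sub = dim // n_sub
db_vecs = rng.normal(size=(n_items, dim))

# Assumption: random hyperplanes replace the learned hash function.
planes = rng.normal(size=(dim, n_bits))
db_bits = db_vecs @ planes > 0

# Assumption: codewords are sampled database sub-vectors, not learned ones.
sample = db_vecs[rng.choice(n_items, n_words, replace=False)]
codebooks = sample.reshape(n_words, n_sub, d_sub).transpose(1, 0, 2)

# Assign each item the nearest codeword in every subspace.
sub_vecs = db_vecs.reshape(n_items, n_sub, 1, d_sub)
db_codes = ((sub_vecs - codebooks[None]) ** 2).sum(axis=3).argmin(axis=2)

# Query: a slightly perturbed copy of item 42.
query_vec = db_vecs[42] + 0.01 * rng.normal(size=dim)
query_bits = query_vec @ planes > 0

candidates = hamming_candidates(query_bits, db_bits, k=50)
ranked = rerank(query_vec, candidates, db_codes, codebooks)
```

Stage 1 touches every item but only compares bits, while stage 2 computes the costlier quantized distance for the 50 shortlisted items only, which is where the speed-up in such a pipeline would come from.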

Related papers

CodeXEmbed: A Generalist Embedding Model Family for Multiligual and Multi-task Code Retrieval [103.116634967815]
We introduce CodeXEmbed, a family of large-scale code embedding models ranging from 400M to 7B parameters. Our novel training pipeline unifies multiple programming languages and transforms various code-related tasks into a common retrieval framework. Our 7B model sets a new state-of-the-art (SOTA) in code retrieval, outperforming the previous leading model, Voyage-Code, by over 20% on CoIR benchmark.
arXiv Detail & Related papers (2024-11-19T16:54:45Z)
GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse Function to capture powerful relationships across modalities for pair-wise similarity learning. The distance metric encapsulates two formats: diagonal and block-diagonal terms. Experiments on cross-modal and two extra uni-modal retrieval tasks have validated its superiority and flexibility.
arXiv Detail & Related papers (2024-10-20T03:45:50Z)
Unsupervised Contrastive Hashing for Cross-Modal Retrieval in Remote Sensing [1.6758573326215689]
Cross-modal text-image retrieval has attracted great attention in remote sensing. We introduce a novel unsupervised cross-modal contrastive hashing (DUCH) method for text-image retrieval in RS. Experimental results show that the proposed DUCH outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-04-19T07:25:25Z)
Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search. We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
Deep Unsupervised Contrastive Hashing for Large-Scale Cross-Modal Text-Image Retrieval in Remote Sensing [1.6758573326215689]
We introduce a novel deep unsupervised cross-modal contrastive hashing (DUCH) method for RS text-image retrieval. Experimental results show that the proposed DUCH outperforms state-of-the-art unsupervised cross-modal hashing methods. Our code is publicly available at https://git.tu-berlin.de/rsim/duch.
arXiv Detail & Related papers (2022-01-20T12:05:10Z)
Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval [80.35589927511667]
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image. We propose a novel fine-tuning framework which turns any pretrained text-image multi-modal model into an efficient retrieval model. Our experiments on a series of standard cross-modal retrieval benchmarks in monolingual, multilingual, and zero-shot setups, demonstrate improved accuracy and huge efficiency benefits over the state-of-the-art cross-encoders.
arXiv Detail & Related papers (2021-03-22T15:08:06Z)
Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network. There are two major challenges in the current one-step approaches. We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
Unsupervised Deep Cross-modality Spectral Hashing [65.3842441716661]
The framework is a two-step hashing approach which decouples the optimization into binary optimization and hashing function learning. We propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations. We leverage the powerful CNN for images and propose a CNN-based deep architecture to learn text modality.
arXiv Detail & Related papers (2020-08-01T09:20:11Z)
Task-adaptive Asymmetric Deep Cross-modal Hashing [20.399984971442]
Cross-modal hashing aims to embed semantic correlations of heterogeneous modality data into the binary hash codes with discriminative semantic labels. We present a Task-adaptive Asymmetric Deep Cross-modal Hashing (TA-ADCMH) method in this paper. It can learn task-adaptive hash functions for two sub-retrieval tasks via simultaneous modality representation and asymmetric hash learning.
arXiv Detail & Related papers (2020-04-01T02:09:20Z)
A Novel Incremental Cross-Modal Hashing Approach [21.99741793652628]
We propose a novel incremental cross-modal hashing algorithm termed "iCMH". The proposed approach consists of two sequential stages, namely, learning the hash codes and training the hash functions. Experiments across a variety of cross-modal datasets and comparisons with state-of-the-art cross-modal algorithms show the usefulness of our approach.
arXiv Detail & Related papers (2020-02-03T12:34:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.