Information-Theoretic Hashing for Zero-Shot Cross-Modal Retrieval
- URL: http://arxiv.org/abs/2209.12491v1
- Date: Mon, 26 Sep 2022 08:05:20 GMT
- Title: Information-Theoretic Hashing for Zero-Shot Cross-Modal Retrieval
- Authors: Yufeng Shi, Shujian Yu, Duanquan Xu, Xinge You
- Abstract summary: In this paper, we consider a fundamentally different way to construct (or learn) a common Hamming space from an information-theoretic perspective.
Specifically, our AIA module draws inspiration from the Principle of Relevant Information (PRI) to construct a common space that adaptively aggregates the intrinsic semantics of different modalities of data.
Our SPE module further generates the hash codes of different modalities by preserving the similarity of intrinsic semantics with the element-wise Kullback-Leibler (KL) divergence.
- Score: 19.97731329580582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot cross-modal retrieval (ZS-CMR) deals with the retrieval problem
among heterogeneous data from unseen classes. Typically, to guarantee
generalization, the pre-defined class embeddings from natural language
processing (NLP) models are used to build a common space. In this paper,
instead of using an extra NLP model to define a common space beforehand, we
consider a fundamentally different way to construct (or learn) a common Hamming space
from an information-theoretic perspective. We term our model the
Information-Theoretic Hashing (ITH), which is composed of two cascading
modules: an Adaptive Information Aggregation (AIA) module and a Semantic
Preserving Encoding (SPE) module. Specifically, our AIA module draws
inspiration from the Principle of Relevant Information (PRI) to construct a
common space that adaptively aggregates the intrinsic semantics of different
modalities of data and filters out redundant or irrelevant information. On the
other hand, our SPE module further generates the hash codes of different
modalities by preserving the similarity of intrinsic semantics with the
element-wise Kullback-Leibler (KL) divergence. A total correlation
regularization term is also imposed to reduce the redundancy amongst different
dimensions of hash codes. Extensive experiments on three benchmark datasets
demonstrate the superiority of the proposed ITH in ZS-CMR. Source code is
available in the supplementary material.
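To make the abstract's two information-theoretic ingredients concrete: the Principle of Relevant Information that inspires AIA is usually stated as finding a representation T of the data X that minimizes J(T) = H(T) + beta * D(T || X), where the entropy term H(T) discourages redundancy, the divergence term D(T || X) keeps T descriptive of X, and beta balances the two. The sketch below illustrates, in PyTorch-style Python, the two SPE-side terms named in the abstract: an element-wise KL divergence between per-bit code distributions of two modalities, and a total-correlation-style penalty that decorrelates hash dimensions. All names, tensor shapes, and the Gaussian covariance surrogate for total correlation are illustrative assumptions, not the authors' implementation.

    import torch

    def elementwise_kl(p_img: torch.Tensor, p_txt: torch.Tensor,
                       eps: float = 1e-8) -> torch.Tensor:
        # Per-bit KL divergence between Bernoulli bit distributions.
        # p_img, p_txt: (batch, n_bits) probabilities that each hash bit is 1,
        # e.g. sigmoid outputs of the image and text encoders (assumed).
        p = p_img.clamp(eps, 1 - eps)
        q = p_txt.clamp(eps, 1 - eps)
        kl = p * (p / q).log() + (1 - p) * ((1 - p) / (1 - q)).log()
        return kl.mean()

    def total_correlation_penalty(codes: torch.Tensor) -> torch.Tensor:
        # Gaussian surrogate for total correlation: drive the off-diagonal
        # entries of the bit-covariance matrix toward zero, so that different
        # hash dimensions carry non-redundant information.
        z = codes - codes.mean(dim=0, keepdim=True)
        cov = (z.t() @ z) / (codes.shape[0] - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return off_diag.pow(2).sum()

    if __name__ == "__main__":
        torch.manual_seed(0)
        p_img = torch.rand(32, 64)  # stand-ins for encoder outputs
        p_txt = torch.rand(32, 64)
        codes = 2 * p_img - 1       # relaxed hash codes in (-1, 1)
        loss = elementwise_kl(p_img, p_txt) \
               + 1e-3 * total_correlation_penalty(codes)
        print(float(loss))

In a full model these terms would be weighted and added to a PRI-based aggregation loss; the paper's actual parameterization of the bits and its estimator of total correlation may differ from this surrogate.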
Related papers
- Style Quantization for Data-Efficient GAN Training [18.40243591024141]
Under the limited data setting, GANs often struggle to navigate and effectively exploit the input latent space.
We propose SQ-GAN, a novel approach that enhances consistency regularization.
Experiments demonstrate significant improvements in both discriminator robustness and generation quality.
arXiv Detail & Related papers (2025-03-31T16:28:44Z)
- SEER-ZSL: Semantic Encoder-Enhanced Representations for Generalized Zero-Shot Learning [0.7420433640907689]
Generalized Zero-Shot Learning (GZSL) recognizes unseen classes by transferring knowledge from the seen classes.
This paper introduces a dual strategy to address the generalization gap.
arXiv Detail & Related papers (2023-12-20T15:18:51Z)
- Learning Invariant Molecular Representation in Latent Discrete Space [52.13724532622099]
We propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts.
Our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts.
arXiv Detail & Related papers (2023-10-22T04:06:44Z)
- RCMHA: Relative Convolutional Multi-Head Attention for Natural Language Modelling [0.0]
RCMHA attains superior accuracy, boasting a score of 0.572 in comparison to alternative attention modules.
In terms of memory, RMHA emerges as the most frugal, with an average consumption of 2.98 GB against the 3.5 GB that RCMHA necessitates.
arXiv Detail & Related papers (2023-08-07T09:24:24Z)
- Symmetric Equilibrium Learning of VAEs [56.56929742714685]
We view variational autoencoders (VAEs) as decoder-encoder pairs, which map distributions in the data space to distributions in the latent space and vice versa.
We propose a Nash equilibrium learning approach, which is symmetric with respect to the encoder and decoder and allows learning VAEs in situations where both the data and the latent distributions are accessible only by sampling.
arXiv Detail & Related papers (2023-07-19T10:27:34Z)
- DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z)
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- MIANet: Aggregating Unbiased Instance and General Information for Few-Shot Semantic Segmentation [6.053853367809978]
Existing few-shot segmentation methods are based on the meta-learning strategy and extract instance knowledge from a support set.
We propose a multi-information aggregation network (MIANet) that effectively leverages the general knowledge, i.e., semantic word embeddings, and instance information for accurate segmentation.
Experiments on PASCAL-5i and COCO-20i show that MIANet yields superior performance and sets a new state of the art.
arXiv Detail & Related papers (2023-05-23T09:36:27Z)
- Meta-Causal Feature Learning for Out-of-Distribution Generalization [71.38239243414091]
This paper presents a balanced meta-causal learner (BMCL), which includes a balanced task generation module (BTG) and a meta-causal feature learning module (MCFL).
BMCL effectively identifies the class-invariant visual regions for classification and may serve as a general framework to improve the performance of the state-of-the-art methods.
arXiv Detail & Related papers (2022-08-22T09:07:02Z)
- HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning [74.76431541169342]
Zero-shot learning (ZSL) tackles the unseen class recognition problem, transferring semantic knowledge from seen classes to unseen ones.
We propose a novel hierarchical semantic-visual adaptation (HSVA) framework to align semantic and visual domains.
Experiments on four benchmark datasets demonstrate that HSVA achieves superior performance on both conventional and generalized ZSL.
arXiv Detail & Related papers (2021-09-30T14:27:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.