Integrating Information Theory and Adversarial Learning for Cross-modal Retrieval
- URL: http://arxiv.org/abs/2104.04991v1
- Date: Sun, 11 Apr 2021 11:04:55 GMT
- Title: Integrating Information Theory and Adversarial Learning for Cross-modal Retrieval
- Authors: Wei Chen, Yu Liu, Erwin M. Bakker, Michael S. Lew
- Abstract summary: Accurately matching visual and textual data in cross-modal retrieval has been widely studied in the multimedia community.
We propose integrating Shannon information theory and adversarial learning.
In terms of the heterogeneity gap, we integrate modality classification and information entropy maximization adversarially.
- Score: 19.600581093189362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurately matching visual and textual data in cross-modal retrieval has been
widely studied in the multimedia community. To address the challenges posed
by the heterogeneity gap and the semantic gap, we propose integrating Shannon
information theory and adversarial learning. In terms of the heterogeneity gap,
we integrate modality classification and information entropy maximization
adversarially. For this purpose, a modality classifier (as a discriminator) is
built to distinguish the text and image modalities according to their different
statistical properties. This discriminator uses its output probabilities to
compute Shannon information entropy, which measures the uncertainty of the
modality classification it performs. Moreover, feature encoders (as a
generator) project uni-modal features into a commonly shared space and attempt
to fool the discriminator by maximizing its output information entropy. Thus,
maximizing information entropy gradually reduces the distribution discrepancy
of cross-modal features, thereby achieving a domain confusion state where the
discriminator cannot classify two modalities confidently. To reduce the
semantic gap, Kullback-Leibler (KL) divergence and bi-directional triplet loss
are used to associate the intra- and inter-modality similarity between features
in the shared space. Furthermore, a regularization term based on KL-divergence
with temperature scaling is used to calibrate the biased label classifier
caused by the data imbalance issue. Extensive experiments with four deep models
on four benchmarks are conducted to demonstrate the effectiveness of the
proposed approach.
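The adversarial mechanism above can be made concrete. Below is a minimal PyTorch sketch (not the authors' released code) of the two opposing objectives: the modality classifier minimizes a cross-entropy loss to tell image features from text features, while the shared-space encoders maximize the Shannon entropy $H(p) = -\sum_m p_m \log p_m$ of its output distribution. Layer sizes and the paired-batch layout are illustrative assumptions.

```python
# Minimal sketch of the entropy-based adversarial objective; layer sizes and
# the paired-batch layout are assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityDiscriminator(nn.Module):
    """Predicts whether a shared-space feature came from an image or a text."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 2))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)  # logits over {image, text}

def discriminator_loss(disc, z_img, z_txt):
    """Classify the two modalities; features are detached so this step
    updates only the discriminator."""
    logits = disc(torch.cat([z_img.detach(), z_txt.detach()], dim=0))
    labels = torch.cat([torch.zeros(len(z_img), dtype=torch.long),
                        torch.ones(len(z_txt), dtype=torch.long)]).to(logits.device)
    return F.cross_entropy(logits, labels)

def encoder_confusion_loss(disc, z_img, z_txt):
    """Encoders maximize the discriminator's output Shannon entropy, pushing
    it toward a 50/50 guess; the negative entropy is returned so a standard
    minimizer performs the maximization."""
    probs = F.softmax(disc(torch.cat([z_img, z_txt], dim=0)), dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    return -entropy
```

At the domain-confusion state the two-class entropy approaches its maximum $\log 2$, matching the description above: the discriminator can no longer classify the two modalities confidently.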
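The bi-directional triplet loss that addresses the semantic gap ranks matched image-text pairs above mismatched ones in both retrieval directions. A hedged sketch follows; cosine similarity, in-batch hardest-negative mining, and the margin value are common choices assumed here, not details stated in the abstract.

```python
import torch
import torch.nn.functional as F

def bidirectional_triplet_loss(z_img: torch.Tensor, z_txt: torch.Tensor,
                               margin: float = 0.2) -> torch.Tensor:
    """Image-to-text and text-to-image triplet ranking over a mini-batch in
    which z_img[i] and z_txt[i] are assumed to be a matched pair."""
    sim = F.normalize(z_img, dim=1) @ F.normalize(z_txt, dim=1).t()
    pos = sim.diag()                                    # matched-pair similarities
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    neg = sim.masked_fill(mask, float("-inf"))          # exclude the positives
    i2t = (margin + neg.max(dim=1).values - pos).clamp_min(0)  # image queries
    t2i = (margin + neg.max(dim=0).values - pos).clamp_min(0)  # text queries
    return (i2t + t2i).mean()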
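Finally, the calibration regularizer compares temperature-softened distributions with a KL divergence, flattening the over-confident predictions of a label classifier biased by class imbalance. The generic form of such a term is sketched below; the choice of reference distribution and the temperature value are assumptions, since the abstract does not spell them out.

```python
import torch
import torch.nn.functional as F

def kl_calibration_regularizer(logits: torch.Tensor, ref_logits: torch.Tensor,
                               T: float = 4.0) -> torch.Tensor:
    """Temperature-scaled KL regularizer: dividing logits by T > 1 softens
    both distributions before comparing them; the T*T factor keeps gradient
    magnitudes comparable across temperatures (a standard convention)."""
    log_p = F.log_softmax(logits / T, dim=1)   # softened predictions
    q = F.softmax(ref_logits / T, dim=1)       # softened reference (assumed)
    return F.kl_div(log_p, q, reduction="batchmean") * (T * T)
```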
Related papers
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
Diffusion models benefit from the instillation of task-specific information into the score function to steer sample generation towards desired properties.
This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
arXiv Detail & Related papers (2024-03-03T23:15:48Z)
- Generalizable Heterogeneous Federated Cross-Correlation and Instance Similarity Learning [60.058083574671834]
This paper presents FCCL+, a novel federated correlation and similarity learning method with non-target distillation.
For the heterogeneity issue, we leverage irrelevant unlabeled public data for communication.
For catastrophic forgetting in the local updating stage, FCCL+ introduces Federated Non Target Distillation.
arXiv Detail & Related papers (2023-09-28T09:32:27Z)
- Self-Supervised Learning with an Information Maximization Criterion [5.214806886230471]
We argue that a straightforward application of information maximization among alternative representations of the same input naturally solves the collapse problem.
We propose a self-supervised learning method, CorInfoMax, that uses a second-order statistics-based mutual information measure.
Numerical experiments demonstrate that CorInfoMax achieves better or competitive performance results relative to the state-of-the-art SSL approaches.
arXiv Detail & Related papers (2022-09-16T15:26:19Z)
- Discriminative Supervised Subspace Learning for Cross-modal Retrieval [16.035973055257642]
We propose a discriminative supervised subspace learning method for cross-modal retrieval (DS2L).
Specifically, we first construct a shared semantic graph to preserve the semantic structure within each modality.
We then introduce the Hilbert-Schmidt Independence Criterion (HSIC) to preserve the consistency between the feature similarity and the semantic similarity of samples.
arXiv Detail & Related papers (2022-01-26T14:27:39Z)
- Learning Conditional Invariance through Cycle Consistency [60.85059977904014]
We propose a novel approach to identify meaningful and independent factors of variation in a dataset.
Our method involves two separate latent subspaces for the target property and the remaining input information.
We demonstrate on synthetic and molecular data that our approach identifies more meaningful factors which lead to sparser and more interpretable models.
arXiv Detail & Related papers (2021-11-25T17:33:12Z)
- The Role of Mutual Information in Variational Classifiers [47.10478919049443]
We study the generalization error of classifiers relying on encodings trained on the cross-entropy loss.
We derive bounds on the generalization error, showing that there exists a regime where it is bounded by the mutual information.
arXiv Detail & Related papers (2020-10-22T12:27:57Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
The proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction [32.21975128854042]
We propose the principle of Maximal Coding Rate Reduction ($\text{MCR}^2$), an information-theoretic measure that maximizes the coding rate difference between the whole dataset and the sum of each individual class.
We clarify its relationships with most existing frameworks such as cross-entropy, information bottleneck, information gain, contractive and contrastive learning, and provide theoretical guarantees for learning diverse and discriminative features.
arXiv Detail & Related papers (2020-06-15T17:23:55Z)
- When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern of generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss that yields better generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss can provide significant improvement on various vision tasks.
arXiv Detail & Related papers (2020-02-24T11:35:28Z)