Creating Something from Nothing: Unsupervised Knowledge Distillation for
Cross-Modal Hashing
- URL: http://arxiv.org/abs/2004.00280v1
- Date: Wed, 1 Apr 2020 08:32:15 GMT
- Title: Creating Something from Nothing: Unsupervised Knowledge Distillation for
Cross-Modal Hashing
- Authors: Hengtong Hu, Lingxi Xie, Richang Hong, Qi Tian
- Abstract summary: Cross-modal hashing (CMH) can map contents from different modalities, especially in vision and language, into the same space.
There are two main frameworks for CMH, differing from each other in whether semantic supervision is required.
In this paper, we propose a novel approach that enables guiding a supervised method using outputs produced by an unsupervised method.
- Score: 132.22315429623575
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In recent years, cross-modal hashing (CMH) has attracted increasing
attention, mainly because of its potential to map contents from different
modalities, especially vision and language, into the same space, making
cross-modal data retrieval efficient. There are two main frameworks for CMH,
differing in whether semantic supervision is required. Compared to unsupervised
methods, supervised methods often enjoy more accurate results, but require much
heavier labor in data annotation. In this paper, we propose a novel approach
that enables guiding a supervised method using outputs produced by an
unsupervised method.
Specifically, we make use of teacher-student optimization for propagating
knowledge. Experiments are performed on two popular CMH benchmarks, i.e., the
MIRFlickr and NUS-WIDE datasets. Our approach outperforms all existing
unsupervised methods by a large margin.
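The teacher-student scheme sketched in the abstract, an unsupervised teacher whose outputs guide a supervised-style student, can be illustrated with a minimal NumPy sketch. This is not the paper's actual UKD implementation: the function names, the cosine-similarity thresholding, and the pairwise inner-product loss are all illustrative assumptions.

```python
import numpy as np

def pseudo_similarity(teacher_img, teacher_txt, threshold=0.5):
    """Binarize an unsupervised teacher's cross-modal cosine similarities
    into a pseudo similarity matrix with entries in {-1, +1}."""
    img = teacher_img / np.linalg.norm(teacher_img, axis=1, keepdims=True)
    txt = teacher_txt / np.linalg.norm(teacher_txt, axis=1, keepdims=True)
    sim = img @ txt.T                       # pairwise cosine similarities
    return np.where(sim > threshold, 1.0, -1.0)

def distillation_loss(student_img_codes, student_txt_codes, pseudo_sim):
    """Supervised-style pairwise loss driven by the teacher's pseudo-labels:
    inner products of the student's relaxed (tanh) hash codes should match
    k * pseudo_sim, where k is the code length."""
    k = student_img_codes.shape[1]
    inner = student_img_codes @ student_txt_codes.T
    return np.mean((inner - k * pseudo_sim) ** 2)
```

In this sketch the pseudo similarity matrix plays the role of the missing semantic labels: the student is trained exactly as a supervised CMH method would be, just against teacher-generated supervision.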
Related papers
- DANCE: Dual-View Distribution Alignment for Dataset Condensation [39.08022095906364]
We propose a new DM-based method named Dual-view distribution AligNment for dataset CondEnsation (DANCE).
Specifically, from the inner-class view, we construct multiple "middle encoders" to perform pseudo long-term distribution alignment.
While from the inter-class view, we use the expert models to perform distribution calibration.
arXiv Detail & Related papers (2024-06-03T07:22:17Z)
- A Dimensional Structure based Knowledge Distillation Method for Cross-Modal
Learning [15.544134849816528]
We discover the correlation between feature discriminability and dimensional structure (DS) by analyzing and observing features extracted from simple and hard tasks.
We propose a novel cross-modal knowledge distillation (CMKD) method for better supervised cross-modal learning (CML) performance.
The proposed method enforces output features to be channel-wise independent and intermediate ones to be uniformly distributed, thereby learning semantically irrelevant features from the hard task to boost its accuracy.
arXiv Detail & Related papers (2023-06-28T07:29:26Z)
- MA2CL: Masked Attentive Contrastive Learning for Multi-Agent
Reinforcement Learning [128.19212716007794]
We propose an effective framework called Multi-Agent Masked Attentive
Contrastive Learning (MA2CL).
MA2CL encourages learning representation to be both temporal and agent-level predictive by reconstructing the masked agent observation in latent space.
Our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
arXiv Detail & Related papers (2023-06-03T05:32:19Z)
- Deep Manifold Hashing: A Divide-and-Conquer Approach for Semi-Paired
Unsupervised Cross-Modal Retrieval [44.35575925811005]
Cross-modal hashing methods usually fail to bridge the modality gap when
fully paired data with abundant label information is unavailable.
We propose Deep Manifold Hashing (DMH), a novel method of dividing the problem of semi-paired unsupervised cross-modal retrieval into three sub-problems.
Experiments on three benchmarks demonstrate the superiority of our DMH compared with the state-of-the-art fully-paired and semi-paired unsupervised cross-modal hashing methods.
arXiv Detail & Related papers (2022-09-26T11:47:34Z)
- CMD: Self-supervised 3D Action Representation Learning with Cross-modal
Mutual Distillation [130.08432609780374]
In 3D action recognition, there exists rich complementary information between skeleton modalities.
We propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs.
Our approach outperforms existing self-supervised methods and sets a series of new records.
arXiv Detail & Related papers (2022-08-26T06:06:09Z)
- Semi-supervised Semantic Segmentation with Mutual Knowledge Distillation [20.741353967123366]
We propose a new consistency regularization framework, termed mutual knowledge distillation (MKD).
We use the pseudo-labels generated by a mean teacher to supervise the student network to achieve a mutual knowledge distillation between the two branches.
Our framework outperforms previous state-of-the-art (SOTA) methods under various semi-supervised settings.
arXiv Detail & Related papers (2022-08-24T12:47:58Z)
- Revisiting Contrastive Methods for Unsupervised Learning of Visual
Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z)
- CIMON: Towards High-quality Hash Codes [63.37321228830102]
We propose a new method named Comprehensive sImilarity Mining and cOnsistency
learNing (CIMON).
First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes.
arXiv Detail & Related papers (2020-10-15T14:47:14Z)
- Unsupervised Deep Cross-modality Spectral Hashing [65.3842441716661]
The framework is a two-step hashing approach which decouples the optimization into binary optimization and hashing function learning.
We propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations.
We leverage a powerful CNN for images and propose a CNN-based deep architecture to learn the text modality.
arXiv Detail & Related papers (2020-08-01T09:20:11Z)
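The two-step decoupling described in the last entry, binary-code optimization followed by hash-function learning, can be sketched generically. This is a simplified stand-in, not the paper's spectral algorithm: the eigendecomposition of a linear-kernel similarity matrix and the least-squares hash function are assumptions made for illustration only.

```python
import numpy as np

def learn_binary_codes(features, n_bits):
    """Step 1 (binary optimization): a spectral-style embedding from the top
    eigenvectors of a centered similarity matrix, thresholded by sign."""
    X = features - features.mean(axis=0)
    sim = X @ X.T                        # simple linear-kernel affinity
    _, vecs = np.linalg.eigh(sim)        # eigenvalues in ascending order
    emb = vecs[:, -n_bits:]              # keep the top n_bits eigenvectors
    return np.where(emb >= 0, 1, -1)

def learn_hash_function(features, codes):
    """Step 2 (hash function learning): least-squares projection W so that
    sign(features @ W) approximates the fixed binary codes."""
    W, *_ = np.linalg.lstsq(features, codes.astype(float), rcond=None)
    return W

def hash_query(features, W):
    """Hash unseen samples with the learned projection."""
    return np.where(features @ W >= 0, 1, -1)
```

Freezing the codes from step 1 is what makes step 2 an ordinary regression problem; at query time only the cheap projection and sign operation are needed, which is the efficiency argument behind hashing-based retrieval.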
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.