Dynamic Contrastive Distillation for Image-Text Retrieval
- URL: http://arxiv.org/abs/2207.01426v1
- Date: Mon, 4 Jul 2022 14:08:59 GMT
- Title: Dynamic Contrastive Distillation for Image-Text Retrieval
- Authors: Jun Rao, Liang Ding, Shuhan Qi, Meng Fang, Yang Liu, Li Shen, Dacheng Tao
- Abstract summary: We present a novel plug-in dynamic contrastive distillation (DCD) framework to compress image-text retrieval models.
We successfully apply our proposed DCD strategy to two state-of-the-art vision-language pretrained models, i.e. ViLT and METER.
Experiments on MS-COCO and Flickr30K benchmarks show the effectiveness and efficiency of our DCD framework.
- Score: 90.05345397400144
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Although vision-and-language pretraining (VLP) equipped cross-modal
image-text retrieval (ITR) has achieved remarkable progress in the past two
years, it suffers from a major drawback: the ever-increasing size of VLP models
restricts their deployment in real-world search scenarios, where high latency
is unacceptable. To alleviate this problem, we present a novel plug-in dynamic
contrastive distillation (DCD) framework to compress large VLP models for the
ITR task. Technically, we face two challenges: 1) the typical uni-modal metric
learning approach is difficult to apply directly to cross-modal tasks, because
limited GPU memory cannot hold enough negative samples when optimizing over
cross-modal fusion features; 2) it is inefficient to statically optimize the
student network over hard samples of different difficulties, since such samples
affect distillation learning and student optimization differently. We address
these challenges from two directions. First, to achieve multi-modal contrastive
learning while balancing training cost and effectiveness, we use the teacher
network to estimate which samples are difficult for the student, so that the
student absorbs the knowledge of the pre-trained teacher and masters the
knowledge carried by hard samples. Second, to learn dynamically from hard
sample pairs, we propose dynamic distillation, which adapts to samples of
different difficulties and thereby better balances the difficulty of the
transferred knowledge against the student's own learning ability. We
successfully apply our proposed DCD strategy to two state-of-the-art
vision-language pretrained models, i.e., ViLT and METER. Extensive experiments
on the MS-COCO and Flickr30K benchmarks show the effectiveness and efficiency
of our DCD framework. Encouragingly, we can speed up inference by at least
129$\times$ compared to existing ITR models.
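The two ideas in the abstract, teacher-estimated sample difficulty and dynamically weighted distillation, can be sketched together in a few lines. The sketch below is illustrative only and not the paper's exact formulation: the difficulty measure (teacher margin between the positive and the hardest negative), the weighting scheme, and all function names are assumptions.

```python
import numpy as np

def contrastive_distillation_loss(student_sim, teacher_sim, temperature=0.1):
    """Illustrative sketch of a dynamic contrastive distillation step.

    student_sim, teacher_sim: (N, N) image-text similarity matrices where
    row i scores image i against all N texts; diagonal entries are the
    matched (positive) pairs. The teacher's similarities serve both as the
    soft distillation target and as a difficulty estimate: rows where the
    teacher's margin over the best negative is small count as hard samples
    and receive larger weights. Hypothetical formulation, not the paper's.
    """
    n = student_sim.shape[0]

    def softmax(x):
        e = np.exp((x - x.max(axis=1, keepdims=True)) / temperature)
        return e / e.sum(axis=1, keepdims=True)

    p_student = softmax(student_sim)   # student's retrieval distribution
    p_teacher = softmax(teacher_sim)   # teacher's soft target distribution

    # Difficulty estimate: teacher's margin between the positive pair and
    # the hardest negative in each row; a small margin means a hard sample.
    pos = np.diag(teacher_sim)
    neg = np.where(np.eye(n, dtype=bool), -np.inf, teacher_sim).max(axis=1)
    margin = pos - neg
    weights = np.exp(-margin)              # hard samples get larger weight
    weights = weights / weights.sum() * n  # normalize to mean 1

    # Per-row KL divergence from teacher to student, dynamically weighted.
    kl = (p_teacher * (np.log(p_teacher + 1e-12)
                       - np.log(p_student + 1e-12))).sum(axis=1)
    return float((weights * kl).mean())
```

When the student already matches the teacher the loss is zero, and samples the teacher itself finds hard contribute more gradient, which is the intuition behind letting the teacher pace the student's exposure to hard pairs.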
Related papers
- CFTS-GAN: Continual Few-Shot Teacher Student for Generative Adversarial Networks [0.5024983453990064]
Few-shot and continual learning face two well-known challenges in GANs: overfitting and catastrophic forgetting.
This paper proposes a Continual Few-shot Teacher-Student technique for the generative adversarial network (CFTS-GAN) that considers both challenges together.
arXiv Detail & Related papers (2024-10-17T20:49:08Z)
- Adv-KD: Adversarial Knowledge Distillation for Faster Diffusion Sampling [2.91204440475204]
Diffusion Probabilistic Models (DPMs) have emerged as a powerful class of deep generative models.
They rely on sequential denoising steps during sample generation.
We propose a novel method that integrates denoising phases directly into the model's architecture.
arXiv Detail & Related papers (2024-05-31T08:19:44Z)
- DisCo: Distilled Student Models Co-training for Semi-supervised Text Mining [23.418419374791107]
DisCo is a semi-supervised learning framework for fine-tuning a cohort of small student models generated from a large PLM.
We show that DisCo can produce student models that are 7.6 times smaller and 4.8 times faster in inference than the baseline PLMs.
arXiv Detail & Related papers (2023-05-20T03:23:16Z)
- Planning for Sample Efficient Imitation Learning [52.44953015011569]
Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously.
We propose EfficientImitate, a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.
Experimental results show that EI achieves state-of-the-art results in performance and sample efficiency.
arXiv Detail & Related papers (2022-10-18T05:19:26Z)
- EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model [46.99510778097286]
Unsupervised reinforcement learning (URL) poses a promising paradigm to learn useful behaviors in a task-agnostic environment.
We introduce a novel model-fused paradigm to jointly pre-train the dynamics model and unsupervised exploration policy in the pre-training phase.
We show that EUCLID achieves state-of-the-art performance with high sample efficiency.
arXiv Detail & Related papers (2022-10-02T12:11:44Z)
- Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework reports significant performance compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)
- A Practical Contrastive Learning Framework for Single-Image Super-Resolution [51.422185656787285]
We investigate contrastive learning-based single image super-resolution from two perspectives.
We propose a practical contrastive learning framework for SISR, named PCL-SR.
Re-training existing benchmark methods with our proposed PCL-SR framework yields superior performance over their original versions.
arXiv Detail & Related papers (2021-11-27T15:42:12Z) - Multi-Scale Aligned Distillation for Low-Resolution Detection [68.96325141432078]
This paper focuses on boosting the performance of low-resolution models by distilling knowledge from a high- or multi-resolution model.
On several instance-level detection tasks and datasets, the low-resolution models trained via our approach perform competitively with high-resolution models trained via conventional multi-scale training.
arXiv Detail & Related papers (2021-09-14T12:53:35Z) - Self-Damaging Contrastive Learning [92.34124578823977]
Real-world unlabeled data is commonly imbalanced and follows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning to automatically balance the representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.