Context Unaware Knowledge Distillation for Image Retrieval
- URL: http://arxiv.org/abs/2207.09070v1
- Date: Tue, 19 Jul 2022 04:51:39 GMT
- Title: Context Unaware Knowledge Distillation for Image Retrieval
- Authors: Bytasandram Yaswanth Reddy, Shiv Ram Dubey, Rakesh Kumar Sanodiya,
Ravi Ranjan Prasad Karn
- Abstract summary: Existing knowledge distillation methods use logits and other features of the deep (teacher) model.
We propose context unaware knowledge distillation that uses the knowledge of the teacher model without fine-tuning it on the target context.
- Score: 11.38957822323395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing data-dependent hashing methods use large backbone networks with
millions of parameters and are computationally complex. Existing knowledge
distillation methods use logits and other features of the deep (teacher) model
as knowledge for the compact (student) model, which requires the teacher's
network to be fine-tuned on the target context in parallel with the student
model. Training the teacher on the target context requires additional time and
computational resources. In this paper, we propose context unaware knowledge
distillation that uses the knowledge of the teacher model without fine-tuning
it on the target context. We also propose a new efficient student model
architecture for knowledge distillation. The proposed approach follows a
two-step process. The first step involves pre-training the student model with
the help of context unaware knowledge distillation from the teacher model. The
second step involves fine-tuning the student model on the context of image
retrieval. In order to show the efficacy of the proposed approach, we compare
the retrieval results, the number of parameters, and the number of operations of the student
models with the teacher models under different retrieval frameworks, including
Deep Cauchy Hashing (DCH) and Central Similarity Quantization (CSQ). The
experimental results confirm that the proposed approach provides a promising
trade-off between the retrieval results and efficiency. The code used in this
paper is released publicly at https://github.com/satoru2001/CUKDFIR.
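To make the two-step recipe concrete, the PyTorch sketch below first pre-trains a compact student to match the features of a frozen, off-the-shelf teacher (step one, context unaware), and then fine-tunes the student with a hashing head on the target retrieval context (step two). The chosen backbones (a ResNet-50 teacher, a MobileNetV3-Small student), the MSE feature-matching loss, and the simplified pairwise hashing loss are illustrative assumptions only; the paper's actual student architecture and the exact DCH/CSQ objectives differ in detail.
```python
# Minimal sketch of context-unaware distillation followed by retrieval fine-tuning.
# Backbones and losses are stand-ins, not the paper's exact choices.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

HASH_BITS = 64

# Teacher: a pretrained backbone that is never fine-tuned on the target context.
teacher = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
teacher.fc = nn.Identity()              # expose the 2048-d feature vector
teacher.eval()
for p in teacher.parameters():
    p.requires_grad = False

# Student: a much smaller network projected to the teacher's feature dimension.
student = nn.Sequential(
    models.mobilenet_v3_small(weights=None).features,
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(576, 2048),               # match the teacher's feature size
)
hash_head = nn.Linear(2048, HASH_BITS)  # used only in step two

# ---- Step 1: context-unaware knowledge distillation (no labels needed) ----
def distill_step(images, optimizer):
    with torch.no_grad():
        t_feat = teacher(images)        # teacher knowledge, no fine-tuning
    s_feat = student(images)
    loss = F.mse_loss(s_feat, t_feat)   # feature-matching distillation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# ---- Step 2: fine-tune the student on the image-retrieval context ----
def retrieval_step(images, pair_sim, optimizer):
    # pair_sim[i, j] = 1.0 if images i and j share a label, else 0.0
    codes = torch.tanh(hash_head(student(images)))      # relaxed hash codes
    cos = F.normalize(codes) @ F.normalize(codes).t()   # pairwise similarity
    # A simple pairwise loss stands in here for DCH's Cauchy loss or CSQ's
    # central-similarity objective.
    loss = F.binary_cross_entropy_with_logits(5.0 * cos, pair_sim)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
A single optimizer over the student parameters (plus the hashing head in step two) is assumed; whether the distilled backbone is frozen or further tuned during step two is left here as a design choice.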
Related papers
- ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-training Model [49.587821411012705]
We propose ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-training Model.
It distills the knowledge from a large teacher CLIP model into a smaller student model, ensuring comparable performance with significantly reduced parameters.
EduAttention explores the cross-relationships between text features extracted by the teacher model and image features extracted by the student model.
arXiv Detail & Related papers (2024-08-08T01:12:21Z) - Let All be Whitened: Multi-teacher Distillation for Efficient Visual
Retrieval [57.17075479691486]
We propose a multi-teacher distillation framework Whiten-MTD, which is able to transfer knowledge from off-the-shelf pre-trained retrieval models to a lightweight student model for efficient visual retrieval.
Our source code is released at https://github.com/Maryeon/whiten_mtd.
arXiv Detail & Related papers (2023-12-15T11:43:56Z) - Improved knowledge distillation by utilizing backward pass knowledge in
neural networks [17.437510399431606]
Knowledge distillation (KD) is one of the prominent techniques for model compression.
In this work, we generate new auxiliary training samples by extracting knowledge from the backward pass of the teacher.
We show how this technique can be used successfully in applications of natural language processing (NLP) and language understanding.
arXiv Detail & Related papers (2023-01-27T22:07:38Z) - EmbedDistill: A Geometric Knowledge Distillation for Information
Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model (a minimal code sketch of this idea appears after the related-papers list below).
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z) - Teaching What You Should Teach: A Data-Based Distillation Method [20.595460553747163]
We introduce the "Teaching what you Should Teach" strategy into a knowledge distillation framework.
We propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in distilling more efficiently and rationally.
To be specific, we design a neural network-based data augmentation module with a priori bias, which assists in finding samples that play to the teacher's strengths but expose the student's weaknesses.
arXiv Detail & Related papers (2022-12-11T06:22:14Z) - Knowledge Distillation Meets Open-Set Semi-Supervised Learning [69.21139647218456]
We propose a novel method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student.
At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL).
Our method significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine face recognition tasks.
arXiv Detail & Related papers (2022-05-13T15:15:27Z) - Weakly Supervised Semantic Segmentation via Alternative Self-Dual
Teaching [82.71578668091914]
This paper establishes a compact learning framework that embeds the classification and mask-refinement components into a unified deep model.
We propose a novel alternative self-dual teaching (ASDT) mechanism to encourage high-quality knowledge interaction.
arXiv Detail & Related papers (2021-12-17T11:56:56Z) - Oracle Teacher: Leveraging Target Information for Better Knowledge
Distillation of CTC Models [10.941519846908697]
We introduce a new type of teacher model for connectionist temporal classification (CTC)-based sequence models, namely Oracle Teacher.
Since the Oracle Teacher learns a more accurate CTC alignment by referring to the target information, it can provide the student with more optimal guidance.
Based on a many-to-one mapping property of the CTC algorithm, we present a training strategy that can effectively prevent the trivial solution.
arXiv Detail & Related papers (2021-11-05T14:14:05Z) - Data-Efficient Ranking Distillation for Image Retrieval [15.88955427198763]
Recent approaches tackle this issue using knowledge distillation to transfer knowledge from a deeper and heavier architecture to a much smaller network.
In this paper we address knowledge distillation for metric learning problems.
Unlike previous approaches, our proposed method jointly addresses the following constraints: i) limited queries to the teacher model, ii) a black-box teacher model with access to the final output representation, and iii) a small fraction of the original training data without any ground-truth labels.
arXiv Detail & Related papers (2020-07-10T10:59:16Z) - Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.
arXiv Detail & Related papers (2020-07-09T09:06:31Z)
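As noted in the EmbedDistill entry above, the sketch below illustrates one common way to distil "relative geometry": the student is trained so that each query's similarity distribution over documents matches the teacher's. The dual-encoder embeddings, cosine scoring, temperature, and KL objective are assumptions made for illustration, not that paper's exact formulation.
```python
# Hypothetical relational distillation loss: align the student's query-document
# similarity structure with the teacher's (an illustration of the "relative
# geometry" idea, not EmbedDistill's exact objective).
import torch
import torch.nn.functional as F

def geometric_distill_loss(t_q, t_d, s_q, s_d, tau=1.0):
    # t_q / t_d: teacher query/document embeddings, shapes (B, Dt) and (N, Dt)
    # s_q / s_d: student query/document embeddings, shapes (B, Ds) and (N, Ds)
    t_scores = F.normalize(t_q) @ F.normalize(t_d).t() / tau   # teacher geometry
    s_scores = F.normalize(s_q) @ F.normalize(s_d).t() / tau   # student geometry
    # KL divergence between the teacher's and student's per-query distributions.
    return F.kl_div(F.log_softmax(s_scores, dim=-1),
                    F.softmax(t_scores, dim=-1),
                    reduction="batchmean")
```
Because only relative similarities are matched in this sketch, the student's embedding dimension can differ from the teacher's, which is one reason much smaller, asymmetric students are feasible.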
This list is automatically generated from the titles and abstracts of the papers on this site.