Let All be Whitened: Multi-teacher Distillation for Efficient Visual
Retrieval
- URL: http://arxiv.org/abs/2312.09716v1
- Date: Fri, 15 Dec 2023 11:43:56 GMT
- Title: Let All be Whitened: Multi-teacher Distillation for Efficient Visual
Retrieval
- Authors: Zhe Ma, Jianfeng Dong, Shouling Ji, Zhenguang Liu, Xuhong Zhang,
Zonghui Wang, Sifeng He, Feng Qian, Xiaobo Zhang, Lei Yang
- Abstract summary: We propose a multi-teacher distillation framework Whiten-MTD, which is able to transfer knowledge from off-the-shelf pre-trained retrieval models to a lightweight student model for efficient visual retrieval.
Our source code is released at https://github.com/Maryeon/whiten_mtd.
- Score: 57.17075479691486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual retrieval aims to search for the most relevant visual items, e.g.,
images and videos, from a candidate gallery with a given query item. Accuracy
and efficiency are two competing objectives in retrieval tasks. Instead of
crafting a new method that pursues further improvements in accuracy, in this paper
we propose a multi-teacher distillation framework, Whiten-MTD, which is able to
transfer knowledge from off-the-shelf pre-trained retrieval models to a
lightweight student model for efficient visual retrieval. Furthermore, we
discover that the similarities obtained by different retrieval models are
diversified and incommensurable, which makes it challenging to jointly distill
knowledge from multiple models. Therefore, we propose to whiten the output of
teacher models before fusion, which enables effective multi-teacher
distillation for retrieval models. Whiten-MTD is conceptually simple and
practically effective. Extensive experiments on two landmark image retrieval
datasets and one video retrieval dataset demonstrate the effectiveness of our
proposed method, and its good balance of retrieval performance and efficiency.
Our source code is released at https://github.com/Maryeon/whiten_mtd.
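
How the whitening-then-fusion step might look in practice can be sketched as follows. The snippet below is only an illustrative NumPy sketch under assumptions of its own: it uses PCA whitening estimated from a sample of each teacher's gallery embeddings and a simple average as the fusion rule, and the function names are hypothetical; the exact Whiten-MTD formulation is in the released code linked above.

    import numpy as np

    def pca_whiten(feats, eps=1e-6):
        """Estimate a PCA-whitening transform from sample teacher embeddings.

        feats: (n, d) array of embeddings from one teacher.
        Returns (mean, W) such that (x - mean) @ W has identity covariance.
        """
        mean = feats.mean(axis=0)
        cov = np.cov(feats - mean, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        W = eigvecs / np.sqrt(eigvals + eps)  # columns scaled by 1/sqrt(eigenvalue)
        return mean, W

    def fused_teacher_similarity(query, gallery, teachers, whiteners):
        """Average cosine similarities computed from whitened teacher embeddings.

        teachers: list of callables mapping a batch of items to (n, d_t) embeddings.
        whiteners: list of (mean, W) pairs, one per teacher, from pca_whiten.
        """
        sims = []
        for encode, (mean, W) in zip(teachers, whiteners):
            q = (encode(query) - mean) @ W    # whiten query embeddings
            g = (encode(gallery) - mean) @ W  # whiten gallery embeddings
            q /= np.linalg.norm(q, axis=1, keepdims=True)
            g /= np.linalg.norm(g, axis=1, keepdims=True)
            sims.append(q @ g.T)              # cosine similarity matrix
        # After whitening, the teachers' similarity scales are commensurable,
        # so a simple average yields a single distillation target for the student.
        return np.mean(sims, axis=0)

The student can then be trained to reproduce the fused similarity matrix, for example with a mean-squared-error or listwise ranking loss between its own query-gallery similarities and the fused target.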
Related papers
- Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models [7.632217365130212]
Large Language Models (LLMs) have demonstrated exceptional capabilities across various machine learning (ML) tasks.
These models can produce hallucinations, particularly in domains with incomplete knowledge.
We introduce DualChecker, an innovative framework designed to mitigate hallucinations and improve the performance of both teacher and student models.
arXiv Detail & Related papers (2024-08-22T12:04:04Z)
- Exploring Effective Factors for Improving Visual In-Context Learning [56.14208975380607]
In-Context Learning (ICL) aims to understand a new task from a few demonstrations (i.e., a prompt) and to predict on new inputs without tuning the model.
This paper shows that prompt selection and prompt fusion are two major factors that directly affect the inference performance of visual in-context learning.
We propose a simple framework, prompt-SelF, for visual in-context learning.
arXiv Detail & Related papers (2023-04-10T17:59:04Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model (a minimal sketch of this idea is given after the list below).
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
- Teaching What You Should Teach: A Data-Based Distillation Method [20.595460553747163]
We introduce the "Teaching what you Should Teach" strategy into a knowledge distillation framework.
We propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in distilling more efficiently and rationally.
To be specific, we design a neural network-based data augmentation module with prior bias, which helps find samples that match the teacher's strengths but the student's weaknesses.
arXiv Detail & Related papers (2022-12-11T06:22:14Z)
- Context Unaware Knowledge Distillation for Image Retrieval [11.38957822323395]
Existing knowledge distillation methods use logits and other features of the deep (teacher) model.
We propose context unaware knowledge distillation that uses the knowledge of the teacher model without fine-tuning it on the target context.
arXiv Detail & Related papers (2022-07-19T04:51:39Z)
- Curriculum Learning for Dense Retrieval Distillation [20.25741148622744]
We propose a generic curriculum learning-based optimization framework called CL-DRD.
CL-DRD controls the difficulty level of training data produced by the re-ranking (teacher) model.
Experiments on three public passage retrieval datasets demonstrate the effectiveness of our proposed framework.
arXiv Detail & Related papers (2022-04-28T17:42:21Z)
- Reinforced Multi-Teacher Selection for Knowledge Distillation [54.72886763796232]
Knowledge distillation is a popular method for model compression.
Current methods assign a fixed weight to a teacher model throughout the whole distillation, and most existing methods allocate an equal weight to every teacher model.
In this paper, we observe that, due to the complexity of training examples and the differences in student model capability, learning differentially from teacher models can lead to better performance of the distilled student models.
arXiv Detail & Related papers (2020-12-11T08:56:39Z)
- Differentiable Feature Aggregation Search for Knowledge Distillation [47.94874193183427]
We introduce feature aggregation to imitate multi-teacher distillation within a single-teacher distillation framework.
DFA is a two-stage Differentiable Feature Aggregation search method motivated by DARTS in neural architecture search.
Experimental results show that DFA outperforms existing methods on CIFAR-100 and CINIC-10 datasets.
arXiv Detail & Related papers (2020-08-02T15:42:29Z)
- Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model [57.41841346459995]
We study how to train a student deep neural network for visual recognition by distilling knowledge from a blackbox teacher model in a data-efficient manner.
We propose an approach that blends mixup and active learning.
arXiv Detail & Related papers (2020-03-31T05:44:55Z)
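
Returning to the EmbedDistill entry above, which transfers the relative geometry among queries and documents from a large teacher to a much smaller student: one common way to realize such geometry matching is to align the two models' query-document similarity distributions. The sketch below is a hypothetical PyTorch illustration of that general idea rather than the actual EmbedDistill objective; the function name, the temperature tau, and the KL-based formulation are assumptions of this sketch.

    import torch
    import torch.nn.functional as F

    def geometry_distill_loss(student_q, student_d, teacher_q, teacher_d, tau=1.0):
        """Match the student's query-document similarity structure to the teacher's.

        student_q, student_d: (B, d_s) student query / document embeddings.
        teacher_q, teacher_d: (B, d_t) teacher query / document embeddings.
        Returns a KL divergence between row-wise softened similarity distributions.
        """
        s_sim = F.normalize(student_q, dim=-1) @ F.normalize(student_d, dim=-1).T
        t_sim = F.normalize(teacher_q, dim=-1) @ F.normalize(teacher_d, dim=-1).T
        # Soften both similarity matrices row-wise and penalize the divergence,
        # so the student preserves the teacher's relative ordering of documents.
        s_logprob = F.log_softmax(s_sim / tau, dim=-1)
        t_prob = F.softmax(t_sim / tau, dim=-1)
        return F.kl_div(s_logprob, t_prob, reduction="batchmean")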