Data-Efficient Ranking Distillation for Image Retrieval
- URL: http://arxiv.org/abs/2007.05299v2
- Date: Mon, 13 Jul 2020 10:51:04 GMT
- Title: Data-Efficient Ranking Distillation for Image Retrieval
- Authors: Zakaria Laskar, Juho Kannala
- Abstract summary: Recent approaches tackle this issue using knowledge distillation to transfer knowledge from a deeper and heavier architecture to a much smaller network.
In this paper we address knowledge distillation for metric learning problems.
Unlike previous approaches, our proposed method jointly addresses the following constraints: i) limited queries to the teacher model, ii) a black-box teacher model with access only to the final output representation, and iii) a small fraction of the original training data without any ground-truth labels.
- Score: 15.88955427198763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in deep learning have led to rapid developments in the field
of image retrieval. However, the best performing architectures incur
significant computational cost. Recent approaches tackle this issue using
knowledge distillation to transfer knowledge from a deeper and heavier
architecture to a much smaller network. In this paper we address knowledge
distillation for metric learning problems. Unlike previous approaches, our
proposed method jointly addresses the following constraints: i) limited queries
to the teacher model, ii) a black-box teacher model with access only to the final
output representation, and iii) a small fraction of the original training data
without any ground-truth labels. In addition, the distillation method does not
require the student and teacher to have the same output dimensionality.
Addressing these constraints reduces computational requirements and dependency
on large-scale training datasets, and covers practical scenarios of limited or
partial access to private assets such as the teacher model or the corresponding
training data and labels. The key idea
is to augment the original training set with additional samples by performing
linear interpolation in the final output representation space. Distillation is
then performed in the joint space of original and augmented teacher-student
sample representations. Results demonstrate that our approach can match
baseline models trained with full supervision. In low-training-sample settings,
our approach outperforms the fully supervised approach on two challenging image
retrieval datasets, ROxford5k and RParis6k \cite{Roxf}, with the least possible
teacher supervision.
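To make the key idea concrete, below is a minimal PyTorch-style sketch of how the interpolation-based augmentation and the distillation step could look. The tensor shapes, the uniform mixing weights, and the similarity-matrix matching loss are illustrative assumptions standing in for the paper's ranking objective, not details taken from the paper.
```python
import torch
import torch.nn.functional as F

def interpolate_reps(reps, num_aug):
    """Create extra samples by linear interpolation between random pairs of
    output representations (uniform mixing weights are an assumption)."""
    n = reps.size(0)
    i = torch.randint(0, n, (num_aug,))
    j = torch.randint(0, n, (num_aug,))
    lam = torch.rand(num_aug, 1, device=reps.device)
    return lam * reps[i] + (1.0 - lam) * reps[j], (i, j, lam)

def distillation_loss(student_reps, teacher_reps, num_aug=64):
    """Distill in the joint space of original and interpolated representations.
    Matching pairwise similarity matrices is a stand-in for the paper's ranking
    objective; it also tolerates different teacher/student dimensionalities."""
    aug_t, (i, j, lam) = interpolate_reps(teacher_reps, num_aug)
    # Reuse the same pairs and weights on the student side so the two joint
    # spaces stay in one-to-one correspondence.
    aug_s = lam * student_reps[i] + (1.0 - lam) * student_reps[j]
    joint_t = F.normalize(torch.cat([teacher_reps, aug_t]), dim=1)
    joint_s = F.normalize(torch.cat([student_reps, aug_s]), dim=1)
    sim_t = joint_t @ joint_t.t()  # teacher-side similarity structure
    sim_s = joint_s @ joint_s.t()  # student-side similarity structure
    return F.mse_loss(sim_s, sim_t)
```
In such a setup the teacher representations would be cached from the limited number of black-box queries, so the teacher is never called inside the training loop and only the student is updated.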
Related papers
- Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback.
Specifically, we utilize a zero-shot-based method to extract the most downstream task-related subset of the pre-training data.
We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
arXiv Detail & Related papers (2024-07-11T18:01:58Z)
- Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data [1.9521342770943706]
We propose a novel approach based on reverse knowledge distillation to train large models with limited data.
We train a computationally heavier model based on a vision transformer encoder using the lighter CNN-based model.
Our experiments suggest that high-dimensional fitting in representation space may prevent overfitting, unlike training directly to match the final output.
arXiv Detail & Related papers (2023-07-20T08:39:20Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present BOOT, a novel technique that overcomes the limitations of existing approaches with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Learning to Generate Synthetic Training Data using Gradient Matching and Implicit Differentiation [77.34726150561087]
This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks.
Inspired by recent ideas, we suggest new data distillation techniques based on generative teaching networks, gradient matching, and the Implicit Function Theorem.
arXiv Detail & Related papers (2022-03-16T11:45:32Z)
- Generative Adversarial Simulator [2.3986080077861787]
We introduce a simulator-free approach to knowledge distillation in the context of reinforcement learning.
A key challenge is having the student learn the multiplicity of cases that correspond to a given action.
This is the first demonstration of simulator-free knowledge distillation between a teacher and a student policy.
arXiv Detail & Related papers (2020-11-23T15:31:12Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To keep training on this enlarged dataset tractable, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model [57.41841346459995]
We study how to train a student deep neural network for visual recognition by distilling knowledge from a blackbox teacher model in a data-efficient manner.
We propose an approach that blends mixup and active learning; a rough sketch of this idea appears after this list.
arXiv Detail & Related papers (2020-03-31T05:44:55Z)
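Since the last entry above is closest in spirit to the main paper, here is a rough sketch of what blending mixup with active learning for black-box distillation could look like. The candidate-pool size, the entropy-based selection criterion, and the KL objective are illustrative assumptions, not details taken from that paper.
```python
import torch
import torch.nn.functional as F

def mixup_candidates(images, num_candidates):
    """Synthesize candidate inputs by convexly mixing random image pairs."""
    n = images.size(0)
    i = torch.randint(0, n, (num_candidates,))
    j = torch.randint(0, n, (num_candidates,))
    lam = torch.rand(num_candidates, 1, 1, 1, device=images.device)
    return lam * images[i] + (1.0 - lam) * images[j]

def active_query_step(student, black_box_teacher, images, budget=32):
    """Spend a small teacher-query budget on the mixup candidates the student
    is least certain about (prediction entropy is an assumed criterion)."""
    candidates = mixup_candidates(images, num_candidates=4 * budget)
    with torch.no_grad():
        probs = F.softmax(student(candidates), dim=1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    picked = candidates[entropy.topk(budget).indices]
    teacher_probs = black_box_teacher(picked)             # the only teacher calls
    student_log_probs = F.log_softmax(student(picked), dim=1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```
Every call to black_box_teacher counts against the query budget, which is the quantity both this related work and the main paper aim to keep small.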
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.