Descriptor Distillation: a Teacher-Student-Regularized Framework for
Learning Local Descriptors
- URL: http://arxiv.org/abs/2209.11795v2
- Date: Fri, 8 Mar 2024 06:45:47 GMT
- Title: Descriptor Distillation: a Teacher-Student-Regularized Framework for
Learning Local Descriptors
- Authors: Yuzhen Liu and Qiulei Dong
- Abstract summary: We propose a Descriptor Distillation framework for local descriptor learning, called DesDis.
A student model gains knowledge from a pre-trained teacher model, and it is further enhanced via a designed teacher-student regularizer.
Experimental results on 3 public datasets demonstrate that the equal-weight student models could achieve significantly better performances than their teachers.
- Score: 17.386735294534738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning a fast and discriminative patch descriptor is a challenging topic in
computer vision. Recently, many existing works focus on training various
descriptor learning networks by minimizing a triplet loss (or its variants),
which is expected to decrease the distance between each positive pair and
increase the distance between each negative pair. However, such an expectation
has to be lowered because the network optimizer does not converge perfectly and
only reaches a local solution. To address this problem, as well as the open
problem of computational speed, we propose a Descriptor Distillation framework for local descriptor
learning, called DesDis, where a student model gains knowledge from a
pre-trained teacher model, and it is further enhanced via a designed
teacher-student regularizer. This regularizer constrains the difference between
the positive (and negative) pair similarities produced by the teacher model and
those produced by the student model, and we theoretically prove that a student
model trained by minimizing a weighted combination of the triplet loss and this
regularizer is more effective than its teacher, which is trained by minimizing
the triplet loss alone. Under the proposed DesDis, many
existing descriptor networks could be embedded as the teacher model, and
accordingly, both equal-weight and light-weight student models could be
derived, which outperform their teacher in either accuracy or speed.
Experimental results on 3 public datasets demonstrate that the equal-weight
student models, derived from the proposed DesDis framework by utilizing three
typical descriptor learning networks as teacher models, could achieve
significantly better performances than their teachers and several other
comparative methods. In addition, the derived light-weight models could run 8
times or more faster than the comparative methods while achieving similar patch
verification performance.
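
The abstract only describes the student objective at a high level. Below is a minimal PyTorch-style sketch of how such a training objective could look; the use of Euclidean pair distances, the squared-difference form of the regularizer, and names such as `margin` and `reg_weight` are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch of a DesDis-style training objective (not the authors' code).
# Assumptions: "pair similarity" is measured by the Euclidean distance between
# descriptors, and the teacher-student regularizer penalizes the squared
# difference between the teacher's and the student's pair distances.
import torch
import torch.nn.functional as F

def pair_distances(anchor, positive, negative):
    """Euclidean distances of the positive and negative descriptor pairs."""
    d_pos = F.pairwise_distance(anchor, positive)   # positive-pair distance
    d_neg = F.pairwise_distance(anchor, negative)   # negative-pair distance
    return d_pos, d_neg

def desdis_loss(student, teacher, patches_a, patches_p, patches_n,
                margin=1.0, reg_weight=1.0):
    # Student descriptors (trainable).
    s_a, s_p, s_n = student(patches_a), student(patches_p), student(patches_n)
    # Teacher descriptors (pre-trained, kept frozen).
    with torch.no_grad():
        t_a, t_p, t_n = teacher(patches_a), teacher(patches_p), teacher(patches_n)

    s_dpos, s_dneg = pair_distances(s_a, s_p, s_n)
    t_dpos, t_dneg = pair_distances(t_a, t_p, t_n)

    # Standard triplet loss on the student.
    triplet = F.relu(margin + s_dpos - s_dneg).mean()

    # Teacher-student regularizer: keep the student's positive/negative pair
    # distances close to those of the teacher.
    regularizer = ((s_dpos - t_dpos) ** 2 + (s_dneg - t_dneg) ** 2).mean()

    # Weighted combination minimized to train the student.
    return triplet + reg_weight * regularizer
```

In the equal-weight setting the student could share the teacher's architecture, while a light-weight student would use a smaller network; in both cases only the student's parameters are updated and the pre-trained teacher stays frozen.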
Related papers
- Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods across various model architectures and sizes, while reducing training time by up to a factor of four.
arXiv Detail & Related papers (2024-09-19T07:05:26Z)
- UnLearning from Experience to Avoid Spurious Correlations [3.283369870504872]
We propose a new approach that addresses the issue of spurious correlations: UnLearning from Experience (ULE).
Our method is based on using two classification models trained in parallel: student and teacher models.
We show that our method is effective on the Waterbirds, CelebA, Spawrious and UrbanCars datasets.
arXiv Detail & Related papers (2024-09-04T15:06:44Z)
- Establishing a stronger baseline for lightweight contrastive models [10.63129923292905]
Recent research has reported a performance degradation in self-supervised contrastive learning for specially designed efficient networks.
A common practice is to introduce a pretrained contrastive teacher model and train the lightweight networks with distillation signals generated by the teacher.
In this work, we aim to establish a stronger baseline for lightweight contrastive models without using a pretrained teacher model.
arXiv Detail & Related papers (2022-12-14T11:20:24Z)
- Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning comprised of two teacher-student networks.
The fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which makes predictions on each model fragment more consistent against noise (a moving-average sketch is given after this list).
arXiv Detail & Related papers (2022-12-13T12:14:09Z)
- Knowledge Distillation via Weighted Ensemble of Teaching Assistants [18.593268785143426]
Knowledge distillation is the process of transferring knowledge from a large model called the teacher to a smaller model called the student.
When the network size gap between the teacher and student increases, the performance of the student network decreases.
We show that, by using multiple teaching assistant models, the student model (the smaller model) can be further improved.
arXiv Detail & Related papers (2022-06-23T22:50:05Z)
- Solving Inefficiency of Self-supervised Representation Learning [87.30876679780532]
Existing contrastive learning methods suffer from very low learning efficiency.
Under-clustering and over-clustering problems are major obstacles to learning efficiency.
We propose a novel self-supervised learning framework using a median triplet loss.
arXiv Detail & Related papers (2021-04-18T07:47:10Z)
- Understanding Robustness in Teacher-Student Setting: A New Perspective [42.746182547068265]
Adversarial examples are inputs to machine learning models in which a bounded adversarial perturbation can mislead the models into making arbitrarily incorrect predictions.
Extensive studies try to explain the existence of adversarial examples and provide ways to improve model robustness.
Our studies could shed light on future exploration of adversarial examples and on enhancing model robustness via principled data augmentation.
arXiv Detail & Related papers (2021-02-25T20:54:24Z)
- Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.
arXiv Detail & Related papers (2020-07-09T09:06:31Z)
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy on low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
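
For the fine-grained student ensemble referenced in the list above, a temporal moving average of student fragments into the teacher could be sketched as follows; treating each parameter tensor as a "fragment" and the decay value of 0.999 are illustrative assumptions, not details from that paper.

```python
# Minimal sketch of a temporal (exponential) moving-average update in which
# each fragment of the teacher is refreshed from the corresponding fragment of
# the student. Here each parameter tensor stands in for a "fragment"; the
# decay value is an illustrative assumption.
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        # teacher <- decay * teacher + (1 - decay) * student
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)
```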
This list is automatically generated from the titles and abstracts of the papers in this site.