Distilling from Similar Tasks for Transfer Learning on a Budget
- URL: http://arxiv.org/abs/2304.12314v1
- Date: Mon, 24 Apr 2023 17:59:01 GMT
- Title: Distilling from Similar Tasks for Transfer Learning on a Budget
- Authors: Kenneth Borup, Cheng Perng Phoo and Bharath Hariharan
- Abstract summary: Transfer learning is an effective solution for training with few labels, though often at the expense of computationally costly fine-tuning of large base models.
We propose to mitigate this unpleasant trade-off between compute and accuracy via semi-supervised cross-domain distillation.
Our methods need no access to source data, and merely need features and pseudo-labels of the source models.
- Score: 38.998980344852846
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We address the challenge of getting efficient yet accurate recognition
systems with limited labels. While recognition models improve with model size
and amount of data, many specialized applications of computer vision have
severe resource constraints both during training and inference. Transfer
learning is an effective solution for training with few labels, though often
at the expense of computationally costly fine-tuning of large base models. We
propose to mitigate this unpleasant trade-off between compute and accuracy via
semi-supervised cross-domain distillation from a set of diverse source models.
Initially, we show how to use task similarity metrics to select a single
suitable source model to distill from, and that a good selection process is
imperative for good downstream performance of a target model. We dub this
approach DistillNearest. Though effective, DistillNearest assumes a single
source model matches the target task, which is not always the case. To
alleviate this, we propose a weighted multi-source distillation method to
distill multiple source models trained on different domains weighted by their
relevance for the target task into a single efficient model (named
DistillWeighted). Our methods need no access to source data, and merely need
features and pseudo-labels of the source models. When the goal is accurate
recognition under computational constraints, both DistillNearest and
DistillWeighted approaches outperform transfer learning from strong
ImageNet initializations as well as state-of-the-art semi-supervised techniques
such as FixMatch. Averaged over 8 diverse target tasks, our multi-source method
outperforms the baselines by 5.6%-points and 4.5%-points, respectively.
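To make the recipe concrete, here is a minimal PyTorch-style sketch of relevance-weighted multi-source distillation. It rests on assumptions not stated in the abstract: the student returns (logits, features), each source model's features are precomputed on the unlabeled target images (so no source data is touched), `relevance` holds per-source task-similarity weights, and `proj_heads` are small per-source projection layers. The exact losses and similarity metric of DistillWeighted are not reproduced here.

```python
import torch.nn.functional as F

def distill_weighted_step(student, proj_heads, optimizer,
                          x_lab, y_lab, x_unlab, source_feats, relevance,
                          alpha=1.0):
    """One step: supervised CE + relevance-weighted feature distillation (illustrative only)."""
    optimizer.zero_grad()

    # Supervised cross-entropy on the few labeled target examples.
    logits, _ = student(x_lab)
    sup_loss = F.cross_entropy(logits, y_lab)

    # Distillation: push the student's features (through one small projection
    # head per source) toward each source's precomputed features, weighted by
    # that source's estimated relevance to the target task.
    _, feats = student(x_unlab)
    distill_loss = sum(
        w * F.mse_loss(head(feats), f_src)
        for w, head, f_src in zip(relevance, proj_heads, source_feats)
    )

    loss = sup_loss + alpha * distill_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, DistillNearest corresponds to the special case where all of the relevance mass sits on the single most similar source model.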
Related papers
- NegMerge: Consensual Weight Negation for Strong Machine Unlearning [21.081262106431506]
Machine unlearning aims to selectively remove specific knowledge from a model.
Current methods rely on fine-tuning models on the forget set, generating a task vector, and subtracting it from the original model.
We propose a novel method that leverages all given fine-tuned models rather than selecting a single one.
arXiv Detail & Related papers (2024-10-08T00:50:54Z)
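The task-vector arithmetic behind the NegMerge summary can be sketched briefly. In the sketch below, `original` and `finetuned_models` are plain parameter dictionaries (state dicts), and the element-wise sign-consensus merge is only one plausible reading of "consensual weight negation", not necessarily the paper's exact rule.

```python
import torch

def task_vector(original, finetuned):
    """Task vector = fine-tuned weights minus original weights, per parameter."""
    return {k: finetuned[k] - original[k] for k in original}

def negmerge_style_unlearn(original, finetuned_models, lam=1.0):
    """Subtract a merged task vector, keeping only sign-consistent elements (assumed rule)."""
    vectors = [task_vector(original, ft) for ft in finetuned_models]
    unlearned = {}
    for k in original:
        stacked = torch.stack([v[k] for v in vectors])      # (num_models, ...)
        signs = torch.sign(stacked)
        consensus = (signs == signs[0]).all(dim=0)          # same sign across all models
        merged = stacked.mean(dim=0) * consensus.to(stacked.dtype)
        unlearned[k] = original[k] - lam * merged           # negate the merged task vector
    return unlearned
```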
- Cross-Domain Transfer Learning with CoRTe: Consistent and Reliable Transfer from Black-Box to Lightweight Segmentation Model [25.3403116022412]
CoRTe is a pseudo-labelling function that extracts reliable knowledge from a black-box source model.
We benchmark CoRTe on two synthetic-to-real settings, demonstrating remarkable results when using black-box models to transfer knowledge to lightweight models for a target data distribution.
arXiv Detail & Related papers (2024-02-20T16:35:14Z)
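As a rough illustration of pseudo-labelling from a black-box source (not CoRTe's actual reliability criterion or refinement steps), the sketch below keeps only pixels where the black-box model is confident and trains the lightweight segmentation model on those labels; `black_box_predict` is a hypothetical callable returning per-pixel softmax probabilities.

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = 255  # conventional "ignore" label for segmentation losses

@torch.no_grad()
def make_pseudo_labels(black_box_predict, images, conf_threshold=0.9):
    """Keep only pixels where the black-box source is sufficiently confident."""
    probs = black_box_predict(images)             # (B, C, H, W) softmax output
    conf, labels = probs.max(dim=1)               # per-pixel confidence and class
    labels[conf < conf_threshold] = IGNORE_INDEX  # drop unreliable pixels
    return labels

def train_step(lightweight_model, optimizer, black_box_predict, images):
    """Train the lightweight target model on the filtered pseudo-labels."""
    pseudo = make_pseudo_labels(black_box_predict, images)
    optimizer.zero_grad()
    loss = F.cross_entropy(lightweight_model(images), pseudo,
                           ignore_index=IGNORE_INDEX)
    loss.backward()
    optimizer.step()
    return loss.item()
```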
- Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach [20.86345962679122]
Estimating the transferability of publicly available pretrained models to a target task has become an important problem in transfer learning.
We propose a novel Optimal tranSport-based suBmOdular tRaNsferability metric (OSBORN) to estimate the transferability of an ensemble of models to a downstream task.
arXiv Detail & Related papers (2023-09-05T17:57:31Z)
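Submodular ensemble selection of the kind OSBORN performs can be illustrated with a generic greedy procedure; the `transferability_score` set function below is a placeholder, not the paper's optimal-transport-based metric.

```python
def greedy_ensemble_selection(candidates, transferability_score, budget):
    """Greedily build an ensemble by maximizing the marginal gain of the score.

    For a monotone submodular score, this greedy procedure enjoys the classic
    (1 - 1/e) approximation guarantee under a cardinality budget.
    """
    selected = []
    for _ in range(budget):
        remaining = [m for m in candidates if m not in selected]
        if not remaining:
            break
        base = transferability_score(selected)
        best = max(remaining,
                   key=lambda m: transferability_score(selected + [m]) - base)
        selected.append(best)
    return selected
```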
- Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
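A small sketch of the ATC idea, assuming max-softmax as the confidence score (the paper also considers other scores): calibrate a threshold on labeled source data so that the fraction of examples above it matches source accuracy, then report the corresponding fraction on unlabeled target data.

```python
import numpy as np

def learn_threshold(source_probs, source_labels):
    """Pick t so the fraction of source examples with confidence above t matches source accuracy."""
    conf = source_probs.max(axis=1)
    acc = (source_probs.argmax(axis=1) == source_labels).mean()
    # The (1 - acc)-quantile of the confidences leaves roughly a fraction `acc` above t.
    return np.quantile(conf, 1.0 - acc)

def predict_target_accuracy(target_probs, threshold):
    """Estimated accuracy = fraction of unlabeled target examples above the threshold."""
    return float((target_probs.max(axis=1) > threshold).mean())
```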
- Unsupervised Multi-source Domain Adaptation Without Access to Source Data [58.551861130011886]
Unsupervised Domain Adaptation (UDA) aims to learn a predictor model for an unlabeled domain by transferring knowledge from a separate labeled source domain.
We propose a novel and efficient algorithm which automatically combines the source models with suitable weights in such a way that it performs at least as well as the best source model.
arXiv Detail & Related papers (2021-04-05T10:45:12Z)
- Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation [55.34995029082051]
We propose a method to learn to augment for data-scarce domain BERT knowledge distillation.
We show that the proposed method significantly outperforms state-of-the-art baselines on four different tasks.
arXiv Detail & Related papers (2021-01-20T13:07:39Z)
- Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident.
In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
arXiv Detail & Related papers (2020-11-16T06:51:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.