Distilling from Similar Tasks for Transfer Learning on a Budget
- URL: http://arxiv.org/abs/2304.12314v1
- Date: Mon, 24 Apr 2023 17:59:01 GMT
- Title: Distilling from Similar Tasks for Transfer Learning on a Budget
- Authors: Kenneth Borup, Cheng Perng Phoo and Bharath Hariharan
- Abstract summary: Transfer learning is an effective solution for training with few labels, though often at the expense of computationally costly fine-tuning of large base models.
We propose to mitigate this unpleasant trade-off between compute and accuracy via semi-supervised cross-domain distillation.
Our methods need no access to source data, and merely need features and pseudo-labels of the source models.
- Score: 38.998980344852846
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We address the challenge of getting efficient yet accurate recognition
systems with limited labels. While recognition models improve with model size
and amount of data, many specialized applications of computer vision have
severe resource constraints both during training and inference. Transfer
learning is an effective solution for training with few labels, though often
at the expense of computationally costly fine-tuning of large base models. We
propose to mitigate this unpleasant trade-off between compute and accuracy via
semi-supervised cross-domain distillation from a set of diverse source models.
Initially, we show how to use task similarity metrics to select a single
suitable source model to distill from, and that a good selection process is
imperative for good downstream performance of a target model. We dub this
approach DistillNearest. Though effective, DistillNearest assumes a single
source model matches the target task, which is not always the case. To
alleviate this, we propose a weighted multi-source distillation method to
distill multiple source models trained on different domains weighted by their
relevance for the target task into a single efficient model (named
DistillWeighted). Our methods need no access to source data, and merely need
features and pseudo-labels of the source models. When the goal is accurate
recognition under computational constraints, both DistillNearest and
DistillWeighted approaches outperform transfer learning from strong
ImageNet initializations as well as state-of-the-art semi-supervised techniques
such as FixMatch. Averaged over 8 diverse target tasks, our multi-source method
outperforms the baselines by 5.6%-points and 4.5%-points, respectively.
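To make the recipe concrete, here is a minimal PyTorch-style sketch of relevance-weighted multi-source distillation. It rests on assumptions not stated in the abstract: the student returns (logits, features), each source model's features are precomputed on the unlabeled target images (so no source data is touched), `relevance` holds per-source task-similarity weights, and `proj_heads` are small per-source projection layers. The exact losses and similarity metric of DistillWeighted are not reproduced here.

```python
import torch.nn.functional as F

def distill_weighted_step(student, proj_heads, optimizer,
                          x_lab, y_lab, x_unlab, source_feats, relevance,
                          alpha=1.0):
    """One step: supervised CE + relevance-weighted feature distillation (illustrative only)."""
    optimizer.zero_grad()

    # Supervised cross-entropy on the few labeled target examples.
    logits, _ = student(x_lab)
    sup_loss = F.cross_entropy(logits, y_lab)

    # Distillation: push the student's features (through one small projection
    # head per source) toward each source's precomputed features, weighted by
    # that source's estimated relevance to the target task.
    _, feats = student(x_unlab)
    distill_loss = sum(
        w * F.mse_loss(head(feats), f_src)
        for w, head, f_src in zip(relevance, proj_heads, source_feats)
    )

    loss = sup_loss + alpha * distill_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, DistillNearest corresponds to the special case where all of the relevance mass sits on the single most similar source model.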
Related papers
- NegMerge: Consensual Weight Negation for Strong Machine Unlearning [21.081262106431506]
Machine unlearning aims to selectively remove specific knowledge from a model.
Current methods rely on fine-tuning models on the forget set, generating a task vector, and subtracting it from the original model.
We propose a novel method that leverages all given fine-tuned models rather than selecting a single one.
arXiv Detail & Related papers (2024-10-08T00:50:54Z)
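The task-vector arithmetic behind the NegMerge summary can be sketched briefly. In the sketch below, `original` and `finetuned_models` are plain parameter dictionaries (state dicts), and the element-wise sign-consensus merge is only one plausible reading of "consensual weight negation", not necessarily the paper's exact rule.

```python
import torch

def task_vector(original, finetuned):
    """Task vector = fine-tuned weights minus original weights, per parameter."""
    return {k: finetuned[k] - original[k] for k in original}

def negmerge_style_unlearn(original, finetuned_models, lam=1.0):
    """Subtract a merged task vector, keeping only sign-consistent elements (assumed rule)."""
    vectors = [task_vector(original, ft) for ft in finetuned_models]
    unlearned = {}
    for k in original:
        stacked = torch.stack([v[k] for v in vectors])      # (num_models, ...)
        signs = torch.sign(stacked)
        consensus = (signs == signs[0]).all(dim=0)          # same sign across all models
        merged = stacked.mean(dim=0) * consensus.to(stacked.dtype)
        unlearned[k] = original[k] - lam * merged           # negate the merged task vector
    return unlearned
```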
- Cross-Domain Transfer Learning with CoRTe: Consistent and Reliable Transfer from Black-Box to Lightweight Segmentation Model [25.3403116022412]
CoRTe is a pseudo-labelling function that extracts reliable knowledge from a black-box source model.
We benchmark CoRTe on two synthetic-to-real settings, demonstrating remarkable results when using black-box models to transfer knowledge to lightweight models for a target data distribution.
arXiv Detail & Related papers (2024-02-20T16:35:14Z)
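As a rough illustration of pseudo-labelling from a black-box source (not CoRTe's actual reliability criterion or refinement steps), the sketch below keeps only pixels where the black-box model is confident and trains the lightweight segmentation model on those labels; `black_box_predict` is a hypothetical callable returning per-pixel softmax probabilities.

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = 255  # conventional "ignore" label for segmentation losses

@torch.no_grad()
def make_pseudo_labels(black_box_predict, images, conf_threshold=0.9):
    """Keep only pixels where the black-box source is sufficiently confident."""
    probs = black_box_predict(images)             # (B, C, H, W) softmax output
    conf, labels = probs.max(dim=1)               # per-pixel confidence and class
    labels[conf < conf_threshold] = IGNORE_INDEX  # drop unreliable pixels
    return labels

def train_step(lightweight_model, optimizer, black_box_predict, images):
    """Train the lightweight target model on the filtered pseudo-labels."""
    pseudo = make_pseudo_labels(black_box_predict, images)
    optimizer.zero_grad()
    loss = F.cross_entropy(lightweight_model(images), pseudo,
                           ignore_index=IGNORE_INDEX)
    loss.backward()
    optimizer.step()
    return loss.item()
```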
- Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach [20.86345962679122]
Estimating the transferability of publicly available pretrained models to a target task has become an important problem in transfer learning.
We propose a novel Optimal tranSport-based suBmOdular tRaNsferability metric (OSBORN) to estimate the transferability of an ensemble of models to a downstream task.
arXiv Detail & Related papers (2023-09-05T17:57:31Z)
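Submodular ensemble selection of the kind OSBORN performs can be illustrated with a generic greedy procedure; the `transferability_score` set function below is a placeholder, not the paper's optimal-transport-based metric.

```python
def greedy_ensemble_selection(candidates, transferability_score, budget):
    """Greedily build an ensemble by maximizing the marginal gain of the score.

    For a monotone submodular score, this greedy procedure enjoys the classic
    (1 - 1/e) approximation guarantee under a cardinality budget.
    """
    selected = []
    for _ in range(budget):
        remaining = [m for m in candidates if m not in selected]
        if not remaining:
            break
        base = transferability_score(selected)
        best = max(remaining,
                   key=lambda m: transferability_score(selected + [m]) - base)
        selected.append(best)
    return selected
```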
- Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
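A small sketch of the ATC idea, assuming max-softmax as the confidence score (the paper also considers other scores): calibrate a threshold on labeled source data so that the fraction of examples above it matches source accuracy, then report the corresponding fraction on unlabeled target data.

```python
import numpy as np

def learn_threshold(source_probs, source_labels):
    """Pick t so the fraction of source examples with confidence above t matches source accuracy."""
    conf = source_probs.max(axis=1)
    acc = (source_probs.argmax(axis=1) == source_labels).mean()
    # The (1 - acc)-quantile of the confidences leaves roughly a fraction `acc` above t.
    return np.quantile(conf, 1.0 - acc)

def predict_target_accuracy(target_probs, threshold):
    """Estimated accuracy = fraction of unlabeled target examples above the threshold."""
    return float((target_probs.max(axis=1) > threshold).mean())
```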
- Unsupervised Multi-source Domain Adaptation Without Access to Source Data [58.551861130011886]
Unsupervised Domain Adaptation (UDA) aims to learn a predictor model for an unlabeled domain by transferring knowledge from a separate labeled source domain.
We propose a novel and efficient algorithm which automatically combines the source models with suitable weights in such a way that it performs at least as well as the best source model.
arXiv Detail & Related papers (2021-04-05T10:45:12Z)
- Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation [55.34995029082051]
We propose a method to learn to augment for data-scarce domain BERT knowledge distillation.
We show that the proposed method significantly outperforms state-of-the-art baselines on four different tasks.
arXiv Detail & Related papers (2021-01-20T13:07:39Z)
- Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident.
In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
arXiv Detail & Related papers (2020-11-16T06:51:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.