How stable are Transferability Metrics evaluations?
- URL: http://arxiv.org/abs/2204.01403v1
- Date: Mon, 4 Apr 2022 11:38:40 GMT
- Title: How stable are Transferability Metrics evaluations?
- Authors: Andrea Agostinelli, Michal Pándy, Jasper Uijlings, Thomas Mensink, and Vittorio Ferrari
- Abstract summary: We conduct a large-scale study by systematically constructing a broad range of 715k experimental setup variations.
We discover that even small variations to an experimental setup lead to different conclusions about which transferability metric is superior.
- Score: 32.24673254834567
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transferability metrics form a maturing research area of increasing
interest, aiming to provide heuristics for selecting the most suitable source
models to transfer to a given target dataset without fine-tuning them all. However,
existing works rely on custom experimental setups which differ across papers,
leading to inconsistent conclusions about which transferability metrics work
best. In this paper we conduct a large-scale study by systematically
constructing a broad range of 715k experimental setup variations. We discover
that even small variations to an experimental setup lead to different
conclusions about which transferability metric is superior.
Then we propose better evaluations by aggregating across many experiments,
which enables more stable conclusions to be reached. As a result, we reveal the
superiority of LogME at selecting good source datasets to transfer from in a
semantic segmentation scenario, NLEEP at selecting good source architectures in
an image classification scenario, and GBC at determining which target task
benefits most from a given source model. Yet, no single transferability metric
works best in all scenarios.
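For readers unfamiliar with how such metrics are scored, the evaluation protocol implied by the abstract can be sketched in a few lines: compute a transferability score for every candidate source model on the target data without fine-tuning, compare the resulting ranking against actual fine-tuning accuracy via a rank correlation, and, as the paper argues, aggregate that correlation over many experimental setups before drawing conclusions. The snippet below is a minimal illustration of this workflow using the LEEP score (a simpler relative of the NLEEP metric named in the abstract) and SciPy's Kendall tau; it is a sketch under these assumptions, not the authors' evaluation code, and the aggregation is a plain mean rather than the paper's full 715k-setup analysis.

```python
import numpy as np
from scipy.stats import kendalltau


def leep_score(source_probs: np.ndarray, target_labels: np.ndarray) -> float:
    """LEEP transferability score computed from the source model's softmax
    outputs on the target set (no fine-tuning needed).

    source_probs: (n, num_source_classes) softmax outputs on target examples.
    target_labels: (n,) integer target labels.
    """
    n, _ = source_probs.shape
    num_target_classes = int(target_labels.max()) + 1
    # Empirical joint distribution P(y, z) over target labels y and source classes z.
    joint = np.zeros((num_target_classes, source_probs.shape[1]))
    for y in range(num_target_classes):
        joint[y] = source_probs[target_labels == y].sum(axis=0) / n
    # Conditional P(y | z), guarding against empty source classes.
    cond = joint / joint.sum(axis=0, keepdims=True).clip(min=1e-12)
    # Expected empirical prediction per example, then the average log-likelihood.
    eep = (source_probs * cond[target_labels]).sum(axis=1)
    return float(np.log(eep.clip(min=1e-12)).mean())


def setup_correlation(metric_scores: dict, finetune_acc: dict) -> float:
    """Kendall rank correlation between metric scores and actual transfer
    accuracy for one experimental setup (one pool of source models)."""
    models = sorted(metric_scores)
    tau, _ = kendalltau([metric_scores[m] for m in models],
                        [finetune_acc[m] for m in models])
    return float(tau)


# The paper's point: a single setup can flip the ranking of metrics, so
# correlations should be aggregated over many setups before comparing them.
def aggregate_setups(taus: list) -> float:
    return float(np.mean(taus))
```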
Related papers
- Benchmarking Transferability: A Framework for Fair and Robust Evaluation [6.9052557953336295]
Transferability scores aim to quantify how well a model trained on one domain generalizes to a target domain.
Despite the numerous methods proposed for measuring transferability, their reliability and practical usefulness remain unclear.
We introduce a comprehensive benchmarking framework designed to systematically evaluate transferability scores across diverse settings.
arXiv Detail & Related papers (2025-04-28T11:01:43Z)
- Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric [48.81957145701228]
We propose NovelSum, a new diversity metric based on sample-level "novelty".
We show that NovelSum accurately captures diversity variations and achieves a 0.97 correlation with instruction-tuned model performance.
arXiv Detail & Related papers (2025-02-24T14:20:22Z)
- Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models [36.22392593103493]
Data selection for fine-tuning large language models (LLMs) aims to choose a high-quality subset from existing datasets.
Existing surveys overlook an in-depth exploration of the fine-tuning phase.
We introduce a novel three-stage scheme - comprising feature extraction, criteria design, and selector evaluation - to systematically categorize and evaluate these methods.
arXiv Detail & Related papers (2024-06-20T08:58:58Z)
- Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods independently score and select data in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z)
- LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
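The LESS entry above only names the idea of low-rank gradient similarity search; a minimal, hypothetical sketch of gradient-similarity selection is given below. It scores each candidate training example by the cosine similarity between a randomly projected gradient feature and the mean gradient of a small target validation set, then keeps the top 5%. The random projection, function name, and parameters are illustrative assumptions, not the LESS implementation.

```python
import numpy as np


def select_by_gradient_similarity(train_grads: np.ndarray,
                                  val_grads: np.ndarray,
                                  proj_dim: int = 1024,
                                  keep_frac: float = 0.05,
                                  seed: int = 0) -> np.ndarray:
    """Hypothetical gradient-similarity selection in the spirit of LESS.

    train_grads: (n_train, d) per-example gradients of the candidate pool.
    val_grads:   (n_val, d)   per-example gradients on the target validation set.
    Returns indices of the top `keep_frac` fraction of training examples.
    """
    rng = np.random.default_rng(seed)
    # Random projection as a cheap stand-in for a learned low-rank factorization.
    proj = rng.standard_normal((train_grads.shape[1], proj_dim)) / np.sqrt(proj_dim)
    tr = train_grads @ proj
    va = (val_grads @ proj).mean(axis=0)
    # Cosine similarity between each training gradient and the mean target gradient.
    scores = (tr @ va) / (np.linalg.norm(tr, axis=1) * np.linalg.norm(va) + 1e-12)
    k = max(1, int(keep_frac * len(scores)))
    return np.argsort(scores)[-k:]
```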
- Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach [20.86345962679122]
Estimating the transferability of publicly available pretrained models to a target task has become an important step in transfer learning.
We propose a novel Optimal tranSport-based suBmOdular tRaNsferability metric (OSBORN) to estimate the transferability of an ensemble of models to a downstream task.
arXiv Detail & Related papers (2023-09-05T17:57:31Z)
- Transferability Metrics for Object Detection [0.0]
Transfer learning aims to make the most of existing pre-trained models to achieve better performance on a new task in limited data scenarios.
We extend transferability metrics to object detection using ROI-Align and TLogME.
We show that TLogME provides a robust correlation with transfer performance and outperforms other transferability metrics on local and global level features.
arXiv Detail & Related papers (2023-06-27T08:49:31Z)
- On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training [72.8087629914444]
We study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes with a balance on the intra-/inter-class diversity.
arXiv Detail & Related papers (2023-05-20T16:23:50Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- Towards Estimating Transferability using Hard Subsets [25.86053764521497]
We propose HASTE, a new strategy to estimate the transferability of a source model to a particular target task using only a harder subset of target data.
We show that HASTE can be used with any existing transferability metric to improve its reliability.
Our experimental results across multiple source model architectures, target datasets, and transfer learning tasks show that HASTE-modified metrics are consistently better than or on par with state-of-the-art transferability metrics.
arXiv Detail & Related papers (2023-01-17T14:50:18Z)
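Since HASTE is described above as a wrapper that can be combined with any existing transferability metric, a hypothetical sketch of that pattern is shown below: it restricts the target data to a "hard" subset, approximated here by low source-model confidence, before handing it to an arbitrary metric function. The hardness criterion and names are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np
from typing import Callable


def hard_subset_metric(source_probs: np.ndarray,
                       target_labels: np.ndarray,
                       metric_fn: Callable[[np.ndarray, np.ndarray], float],
                       hard_frac: float = 0.5) -> float:
    """Hypothetical HASTE-style wrapper: apply `metric_fn` (any transferability
    metric taking source-model probabilities and target labels) to the hardest
    fraction of the target data, where hardness is approximated here by low
    source-model confidence (max softmax)."""
    confidence = source_probs.max(axis=1)
    k = max(1, int(hard_frac * len(confidence)))
    hard_idx = np.argsort(confidence)[:k]  # least-confident target examples
    return metric_fn(source_probs[hard_idx], target_labels[hard_idx])
```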
- Optimal Condition Training for Target Source Separation [56.86138859538063]
We propose a new optimal condition training method for single-channel target source separation.
We show that the complementary information carried by the diverse semantic concepts significantly helps to disentangle and isolate sources of interest.
arXiv Detail & Related papers (2022-11-11T00:04:55Z)
- Source data selection for out-of-domain generalization [0.76146285961466]
Poor selection of a source dataset can lead to poor performance on the target.
We propose two source selection methods, based on multi-armed bandit theory and on random search, respectively.
Our proposals can be viewed as diagnostics for the existence of reweighted source subsamples that perform better than a random selection of the available samples.
arXiv Detail & Related papers (2022-02-04T14:37:31Z)
- Client Selection in Federated Learning based on Gradients Importance [5.263296985310379]
Federated learning (FL) enables multiple devices to collaboratively learn a global model without sharing their personal data.
In this paper, we investigate and design a device selection strategy based on the importance of the gradient norms.
arXiv Detail & Related papers (2021-11-19T11:53:23Z)
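The client-selection entry above gives only the high-level idea of ranking devices by the importance of their gradient norms. A minimal hypothetical sketch, not the paper's exact strategy, is to keep the k clients whose latest local updates have the largest norms:

```python
import numpy as np


def select_clients_by_gradient_norm(client_updates: dict, k: int) -> list:
    """Hypothetical gradient-norm client selection for federated averaging.

    client_updates: maps client id -> flattened model update (np.ndarray).
    Returns the ids of the k clients with the largest update norms.
    """
    norms = {cid: float(np.linalg.norm(u)) for cid, u in client_updates.items()}
    return sorted(norms, key=norms.get, reverse=True)[:k]
```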
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.