Benchmarking Transferability: A Framework for Fair and Robust Evaluation
- URL: http://arxiv.org/abs/2504.20121v1
- Date: Mon, 28 Apr 2025 11:01:43 GMT
- Title: Benchmarking Transferability: A Framework for Fair and Robust Evaluation
- Authors: Alireza Kazemi, Helia Rezvani, Mahsa Baktashmotlagh
- Abstract summary: Transferability scores aim to quantify how well a model trained on one domain generalizes to a target domain. Despite numerous methods proposed for measuring transferability, their reliability and practical usefulness remain inconclusive. We introduce a comprehensive benchmarking framework designed to systematically evaluate transferability scores across diverse settings.
- Score: 6.9052557953336295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transferability scores aim to quantify how well a model trained on one domain generalizes to a target domain. Despite numerous methods proposed for measuring transferability, their reliability and practical usefulness remain inconclusive, often due to differing experimental setups, datasets, and assumptions. In this paper, we introduce a comprehensive benchmarking framework designed to systematically evaluate transferability scores across diverse settings. Through extensive experiments, we observe variations in how different metrics perform under various scenarios, suggesting that current evaluation practices may not fully capture each method's strengths and limitations. Our findings underscore the value of standardized assessment protocols, paving the way for more reliable transferability measures and better-informed model selection in cross-domain applications. Additionally, we achieved a 3.5% improvement using our proposed metric for the head-training fine-tuning experimental setup. Our code is available in this repository: https://github.com/alizkzm/pert_robust_platform.
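The linked repository contains the authors' implementation. As a rough, hypothetical illustration of the protocol such a benchmark standardizes, the sketch below scores a pool of pre-trained backbones with a simple class-separability proxy (a stand-in for real transferability metrics such as LogME or LEEP, not the paper's proposed metric) and checks how well that ranking agrees with fine-tuned accuracy via Kendall's tau. All model names, features, and accuracy values are synthetic placeholders.

```python
# Hypothetical sketch (not the paper's code): rank candidate pre-trained models
# by a transferability proxy computed on target features, then measure how well
# that ranking agrees with the accuracy obtained after actually fine-tuning.
import numpy as np
from scipy.stats import kendalltau


def class_separability_score(features, labels):
    """Toy transferability proxy: between-class vs. within-class scatter of
    target features (higher suggests easier transfer)."""
    overall_mean = features.mean(axis=0)
    between, within = 0.0, 0.0
    for c in np.unique(labels):
        cls = features[labels == c]
        between += len(cls) * np.sum((cls.mean(axis=0) - overall_mean) ** 2)
        within += np.sum((cls - cls.mean(axis=0)) ** 2)
    return between / (within + 1e-12)


rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=500)

# Synthetic "backbones": stronger class signal stands in for better features;
# the fine-tuned accuracies are placeholder numbers, not measured results.
signal_strength = {"backbone_a": 0.5, "backbone_b": 1.5, "backbone_c": 3.0}
finetuned_accuracy = {"backbone_a": 0.61, "backbone_b": 0.72, "backbone_c": 0.80}

scores, accs = [], []
for name, signal in signal_strength.items():
    class_means = rng.normal(size=(5, 64))
    feats = rng.normal(size=(500, 64)) + signal * class_means[labels]
    scores.append(class_separability_score(feats, labels))
    accs.append(finetuned_accuracy[name])

tau, _ = kendalltau(scores, accs)
print(f"Kendall's tau between transferability score and fine-tuned accuracy: {tau:.2f}")
```

In the framework described above, this kind of rank-correlation analysis would be repeated across datasets, model pools, and experimental setups (e.g. the head-training fine-tuning setup mentioned in the abstract), which is what allows different transferability scores to be compared fairly.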
Related papers
- Rectifying Conformity Scores for Better Conditional Coverage [75.73184036344908]
We present a new method for generating confidence sets within the split conformal prediction framework. Our method performs a trainable transformation of any given conformity score to improve conditional coverage while ensuring exact marginal coverage.
arXiv Detail & Related papers (2025-02-22T19:54:14Z) - Beyond Models! Explainable Data Valuation and Metric Adaption for Recommendation [10.964035199849125]
Current methods employ data valuation to discern high-quality data from low-quality data. We propose DVR, an explainable and versatile framework that improves the efficiency of data utilization and can be tailored to arbitrary requirements. Our framework achieves improvements of up to 34.7% over existing methods on the representative NDCG metric.
arXiv Detail & Related papers (2025-02-12T12:01:08Z) - PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines [86.36060279469304]
We introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks.
This benchmark integrates 12 widely adopted methods with diverse datasets across multiple application domains.
Its multi-dimensional evaluation framework broadens the analysis with a comprehensive set of metrics.
arXiv Detail & Related papers (2024-07-11T11:51:36Z) - Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling [14.668634411361307]
We introduce a benchmark that evaluates sampling methods using a standardized task suite and a broad range of performance criteria.
We study existing metrics for quantifying mode collapse and introduce novel metrics for this purpose.
arXiv Detail & Related papers (2024-06-11T16:23:33Z) - Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z) - Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z) - Accounting for multiplicity in machine learning benchmark performance [0.0]
Using the highest-ranked result as the estimate of state-of-the-art (SOTA) performance is a biased estimator that gives overly optimistic results (see the simulation sketch after this list).
In this article, we provide a probability distribution for the case of multiple classifiers, so that known analysis methods can be applied and a better SOTA estimate can be provided.
arXiv Detail & Related papers (2023-03-10T10:32:18Z) - Towards Better Understanding Attribution Methods [77.1487219861185]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We also propose a post-processing smoothing step that significantly improves the performance of some attribution methods.
arXiv Detail & Related papers (2022-05-20T20:50:17Z) - How stable are Transferability Metrics evaluations? [32.24673254834567]
We conduct a large-scale study by systematically constructing a broad range of 715k experimental setup variations.
We discover that even small variations to an experimental setup lead to different conclusions about the superiority of a transferability metric over another.
arXiv Detail & Related papers (2022-04-04T11:38:40Z) - SQE: a Self Quality Evaluation Metric for Parameters Optimization in Multi-Object Tracking [25.723436561224297]
We present SQE, a novel self-quality evaluation metric for parameter optimization in the challenging yet critical multi-object tracking task.
Unlike conventional metrics that require ground-truth annotations, ours reflects the internal characteristics of trajectory hypotheses and measures tracking performance without ground truth.
arXiv Detail & Related papers (2020-04-16T06:07:29Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence of each query sample in order to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
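As a concrete illustration of the multiplicity issue raised in the "Accounting for multiplicity in machine learning benchmark performance" entry above, the short simulation below draws test-set accuracies for several classifiers that are all equally good and shows that reporting the best observed accuracy overstates the true accuracy. This is only a hypothetical sketch, not the paper's analysis (which derives the distribution of the maximum analytically); the true accuracy, test-set size, and classifier count are illustrative values.

```python
# Hypothetical simulation of the multiplicity effect: several classifiers share
# the same true accuracy, each is measured on a finite test set, and the best
# measured accuracy is reported as "SOTA". The reported value is biased upward.
import numpy as np

rng = np.random.default_rng(42)
true_acc, n_test, n_classifiers, n_trials = 0.80, 1000, 20, 5000

best_observed = np.empty(n_trials)
for t in range(n_trials):
    # Each classifier's measured accuracy is a binomial draw around the same true value.
    measured = rng.binomial(n_test, true_acc, size=n_classifiers) / n_test
    best_observed[t] = measured.max()

print(f"true accuracy:                {true_acc:.3f}")
print(f"average best-of-{n_classifiers} accuracy:   {best_observed.mean():.3f}")  # noticeably above 0.800
```

Averaged over many simulated test sets, the best-of-20 accuracy comes out noticeably above 0.80 even though every classifier is a true 0.80 model, which is exactly the optimism that a distributional correction for multiple classifiers is meant to remove.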