Where to start? Analyzing the potential value of intermediate models
- URL: http://arxiv.org/abs/2211.00107v2
- Date: Wed, 2 Nov 2022 08:49:45 GMT
- Title: Where to start? Analyzing the potential value of intermediate models
- Authors: Leshem Choshen, Elad Venezian, Shachar Don-Yehia, Noam Slonim, Yoav Katz
- Abstract summary: We perform a systematic analysis of the \emph{intertraining} scheme over a wide range of English classification tasks.
Surprisingly, our analysis suggests that the potential intertraining gain can be analyzed \emph{independently} for the target dataset and for the candidate base model.
We leverage our analysis to propose a practical and efficient approach to determine if and how to select a base model in real-world settings.
- Score: 16.32982010228009
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Previous studies observed that finetuned models may be better base models
than the vanilla pretrained model. Such a model, finetuned on some source
dataset, may provide a better starting point for a new finetuning process on a
desired target dataset. Here, we perform a systematic analysis of this
\emph{intertraining} scheme, over a wide range of English classification tasks.
Surprisingly, our analysis suggests that the potential intertraining gain can
be analyzed \emph{independently} for the target dataset under consideration,
and for a base model being considered as a starting point. This is in contrast
to current perception that the alignment between the target dataset and the
source dataset used to generate the base model is a major factor in determining
intertraining success. We analyze different aspects that contribute to each.
Furthermore, we leverage our analysis to propose a practical and efficient
approach to determine if and how to select a base model in real-world settings.
Last, we release a continuously updated ranking of the best models on the
HuggingFace hub, per architecture, at https://ibm.github.io/model-recycling/.
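As a concrete illustration of the intertraining scheme described above, the sketch below finetunes a target classifier starting from an intermediate model rather than the vanilla pretrained checkpoint. The checkpoint name, target dataset, and hyperparameters are illustrative placeholders, not the paper's experimental setup.

```python
# A minimal sketch of intertraining, assuming illustrative checkpoint and
# dataset names: start the target finetuning from a model already finetuned
# on a source task (here, an MNLI model) instead of the vanilla checkpoint.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

intermediate = "textattack/roberta-base-MNLI"  # illustrative base model

tokenizer = AutoTokenizer.from_pretrained(intermediate)
target = load_dataset("glue", "sst2")  # stand-in target task
target = target.map(lambda b: tokenizer(b["sentence"], truncation=True),
                    batched=True)

# ignore_mismatched_sizes swaps the source task's classifier head for a
# freshly initialized one sized for the target label set.
model = AutoModelForSequenceClassification.from_pretrained(
    intermediate, num_labels=2, ignore_mismatched_sizes=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3),
    train_dataset=target["train"],
    eval_dataset=target["validation"],
    tokenizer=tokenizer,  # enables dynamic padding during batching
)
trainer.train()
```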
Related papers
- Data Complexity-aware Deep Model Performance Forecasting [0.5735035463793009]
We propose a lightweight, two-stage framework that can estimate model performance before training.
The setup allows the framework to generalize across datasets and model types.
We find that some of the underlying features used for prediction can offer practical guidance for model selection.
arXiv Detail & Related papers (2026-01-04T05:31:04Z)
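The entry above predicts performance from properties of the data alone; a runnable toy sketch of that general idea follows, regressing observed accuracy on cheap dataset meta-features. The features, the regressor, and the simulated training history are all assumptions for illustration, not the paper's two-stage framework.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def meta_features(X, y):
    # size, dimensionality, class count, class imbalance, feature spread
    n, d = X.shape
    _, counts = np.unique(y, return_counts=True)
    return [n, d, len(counts), counts.min() / counts.max(), float(np.std(X))]

rng = np.random.default_rng(0)
history, observed_acc = [], []
for _ in range(50):  # simulated past runs: dataset -> achieved accuracy
    n, d, k = rng.integers(100, 5000), rng.integers(5, 200), rng.integers(2, 10)
    X, y = rng.normal(size=(n, d)), rng.integers(0, k, size=n)
    history.append(meta_features(X, y))
    observed_acc.append(1.0 / k + 0.3 * rng.random())  # fake outcome

forecaster = GradientBoostingRegressor().fit(history, observed_acc)

X_new, y_new = rng.normal(size=(800, 40)), rng.integers(0, 3, size=800)
print("forecast accuracy:",
      forecaster.predict([meta_features(X_new, y_new)])[0])
```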
- Estimating Time Series Foundation Model Transferability via In-Context Learning [74.65355820906355]
Time series foundation models (TSFMs) offer strong zero-shot forecasting via large-scale pre-training.
Fine-tuning remains critical for boosting performance in domains with limited public data.
We introduce TimeTic, a transferability estimation framework that recasts model selection as an in-context-learning problem.
arXiv Detail & Related papers (2025-09-28T07:07:13Z)
- Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
Existing approaches require re-training models on different data subsets, which is computationally intensive.
This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
arXiv Detail & Related papers (2024-06-16T17:09:24Z)
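For contrast with the in-run approach above, here is a sketch of the classic retraining-based Data Shapley that it avoids: Monte Carlo sampling over permutations, refitting a small model on every prefix. This is the computationally intensive baseline, not the paper's method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=60, random_state=0)
X_tr, y_tr, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

def utility(idx):
    if len(np.unique(y_tr[idx])) < 2:  # need both classes to fit a model
        return 0.0
    clf = LogisticRegression(max_iter=200).fit(X_tr[idx], y_tr[idx])
    return clf.score(X_val, y_val)

rng = np.random.default_rng(0)
n, T = len(X_tr), 25  # T Monte Carlo permutations
shapley = np.zeros(n)
for _ in range(T):
    perm, prev = rng.permutation(n), 0.0
    for i, j in enumerate(perm):
        cur = utility(perm[: i + 1])
        shapley[j] += (cur - prev) / T  # marginal contribution of point j
        prev = cur

print("most valuable training points:", np.argsort(shapley)[-5:])
```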
- Policy Trees for Prediction: Interpretable and Adaptive Model Selection for Machine Learning [5.877778007271621]
We introduce a tree-based approach, Optimal Predictive-Policy Trees (OP2T), that yields interpretable policies for adaptively selecting a predictive model or ensemble.
Our approach enables interpretable and adaptive model selection and rejection while only assuming access to model outputs.
We evaluate our approach on real-world datasets, including regression and classification tasks with both structured and unstructured data.
arXiv Detail & Related papers (2024-05-30T21:21:33Z)
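A rough sketch of the routing idea above: fit an interpretable tree on the candidate models' outputs to decide which model to trust per input. A greedy CART tree stands in for the paper's optimal policy trees; all models and data are toy placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_informative=6, random_state=0)
X_tr, y_tr = X[:300], y[:300]          # train the candidate models
X_pol, y_pol = X[300:450], y[300:450]  # fit the routing policy
X_te, y_te = X[450:], y[450:]

models = [LogisticRegression(max_iter=500).fit(X_tr, y_tr),
          KNeighborsClassifier().fit(X_tr, y_tr)]

def outputs(A):
    # policy features: every candidate model's predicted class probabilities
    return np.hstack([m.predict_proba(A) for m in models])

# policy label: index of a model that classifies the point correctly
# (argmax over booleans falls back to model 0 when none are correct)
correct = np.stack([m.predict(X_pol) == y_pol for m in models])
policy = DecisionTreeClassifier(max_depth=3).fit(outputs(X_pol),
                                                 np.argmax(correct, axis=0))

route = policy.predict(outputs(X_te))
preds = np.array([models[r].predict(X_te[i:i + 1])[0]
                  for i, r in enumerate(route)])
print("routed accuracy:", (preds == y_te).mean())
```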
- A Two-Phase Recall-and-Select Framework for Fast Model Selection [13.385915962994806]
We propose a two-phase (coarse-recall and fine-selection) model selection framework.
It aims to enhance the efficiency of selecting a robust model by leveraging the models' training performances on benchmark datasets.
The proposed methodology is shown to select a high-performing model about 3x faster than conventional baseline methods.
arXiv Detail & Related papers (2024-03-28T14:44:44Z)
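A compact toy version of the two-phase loop above, under stated assumptions: coarse recall ranks a model zoo by stored benchmark scores, then fine selection trains only the shortlisted models on a small probe slice of the target data. The zoo, the scores, and the splits are invented for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=0)
X_probe, y_probe = X[:100], y[:100]  # cheap probe slice of the target data
X_val, y_val = X[800:], y[800:]

# Phase 1 (coarse recall): rank the zoo by recorded benchmark performance.
zoo = {"logreg": (LogisticRegression(max_iter=500), 0.78),
       "rf": (RandomForestClassifier(random_state=0), 0.84),
       "svc": (SVC(), 0.81)}
shortlist = sorted(zoo, key=lambda name: zoo[name][1], reverse=True)[:2]

# Phase 2 (fine selection): actually train only the shortlisted models.
scores = {name: clone(zoo[name][0]).fit(X_probe, y_probe).score(X_val, y_val)
          for name in shortlist}
print("selected:", max(scores, key=scores.get), scores)
```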
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
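To make the setting concrete, below is a toy federated-averaging loop over non-IID clients, with a FedProx-style proximal term as a crude stand-in for trajectory regularization; FedPTR's actual projected-trajectory update is more involved and is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_clients = 20, 5
w_true = rng.normal(size=d)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def make_client(shift):  # non-IID via a per-client input shift
    X = rng.normal(loc=shift, size=(200, d))
    return X, (X @ w_true > 0).astype(float)

clients = [make_client(s) for s in np.linspace(-1.0, 1.0, n_clients)]

w_global, mu, lr = np.zeros(d), 0.1, 0.1  # proximal weight, step size
for _ in range(30):  # communication rounds
    updates = []
    for X, y in clients:
        w = w_global.copy()
        for _ in range(5):  # local steps of regularized logistic regression
            grad = X.T @ (sigmoid(X @ w) - y) / len(y) + mu * (w - w_global)
            w -= lr * grad
        updates.append(w)
    w_global = np.mean(updates, axis=0)  # FedAvg aggregation

X, y = clients[0]
print("client-0 accuracy:", ((sigmoid(X @ w_global) > 0.5) == y).mean())
```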
- Anchor Points: Benchmarking Models with Much Fewer Examples [88.02417913161356]
In six popular language classification benchmarks, model confidence in the correct class on many pairs of points is strongly correlated across models.
We propose Anchor Point Selection, a technique to select small subsets of datasets that capture model behavior across the entire dataset.
Just several anchor points can be used to estimate model per-class predictions on all other points in a dataset with low mean absolute error.
arXiv Detail & Related papers (2023-09-14T17:45:51Z)
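A simplified sketch of the anchor-point idea above: cluster examples by their across-model confidence profiles, keep one medoid per cluster, and score a new model on the medoids alone. The confidence matrix is simulated for the sake of a runnable demo; the paper's selection procedure is more refined.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_models, n_points = 12, 500
# conf[m, i]: model m's confidence in the correct class on example i
skill = rng.random(n_models)[:, None]
difficulty = rng.random(n_points)[None, :]
conf = np.clip(skill - difficulty
               + 0.1 * rng.normal(size=(n_models, n_points)), 0, 1)

k = 10  # number of anchor points
profiles = conf.T  # one row of per-model confidences per example
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(profiles)

# medoid: the real example closest to each cluster centre
anchors = [int(np.argmin(((profiles - c) ** 2).sum(axis=1)))
           for c in km.cluster_centers_]

# estimate a new model's benchmark-wide mean confidence from anchors only
new_model = np.clip(0.7 - difficulty[0]
                    + 0.1 * rng.normal(size=n_points), 0, 1)
print(f"anchor estimate {new_model[anchors].mean():.3f} "
      f"vs true mean {new_model.mean():.3f}")
```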
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We conduct empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce \textit{CLIP distillation}, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
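One plausible, heavily simplified reading of parameter-free distillation from CLIP is zero-shot pseudo-labeling of target-domain data, sketched below with Hugging Face's CLIP classes. The class names and the blank image are placeholders; this is not the paper's exact procedure.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

class_names = ["dog", "cat", "car"]  # illustrative target label set
prompts = [f"a photo of a {c}" for c in class_names]
images = [Image.new("RGB", (224, 224))]  # stand-in for target-domain images

inputs = processor(text=prompts, images=images,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # (num_images, num_classes)
pseudo_labels = logits.softmax(dim=-1).argmax(dim=-1)
print([class_names[i] for i in pseudo_labels])
```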
- Comparing Foundation Models using Data Kernels [13.099029073152257]
We present a methodology for directly comparing the embedding space geometry of foundation models.
Our methodology is grounded in random graph theory and enables valid hypothesis testing of embedding similarity.
We show how our framework can induce a manifold of models equipped with a distance function that correlates strongly with several downstream metrics.
arXiv Detail & Related papers (2023-05-09T02:01:07Z)
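As a point of reference for comparing embedding geometries, the sketch below computes linear CKA between two embedding matrices of the same inputs; note this is a common similarity index, not the paper's random-graph-based hypothesis test.

```python
import numpy as np

def linear_cka(A, B):
    """A, B: (n_points, dim) embeddings; higher means more similar geometry."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    num = np.linalg.norm(A.T @ B, "fro") ** 2
    return num / (np.linalg.norm(A.T @ A, "fro")
                  * np.linalg.norm(B.T @ B, "fro"))

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(500, 64))  # embeddings from "model A"
# "model B": a linear transform of A plus noise, so geometry is mostly shared
emb_b = emb_a @ rng.normal(size=(64, 32)) + 0.1 * rng.normal(size=(500, 32))
print("CKA(A, B) =", round(linear_cka(emb_a, emb_b), 3))
```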
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
Because the data behind such finetuned models is often unavailable, this creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
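A minimal sketch of merging in parameter space: uniformly average the weights of models finetuned from the same pretrained checkpoint. The checkpoint names are illustrative and must share one architecture and label count; uniform averaging is a baseline, and the paper's merger weights the average more carefully.

```python
import torch
from transformers import AutoModelForSequenceClassification

names = ["textattack/bert-base-uncased-SST-2",
         "textattack/bert-base-uncased-QQP"]
models = [AutoModelForSequenceClassification.from_pretrained(n) for n in names]

merged = models[0]  # reuse the first model as the container for the average
with torch.no_grad():
    for key, param in merged.state_dict().items():
        stacked = torch.stack([m.state_dict()[key].float() for m in models])
        param.copy_(stacked.mean(dim=0))  # in-place uniform average

merged.save_pretrained("merged-model")
```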
- Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis [96.53859361560505]
Aspect-based Sentiment Analysis (ABSA) aims to determine the sentiment polarity towards an aspect.
There always exists severe domain shift between the pretraining and downstream ABSA datasets.
We introduce a unified alignment pretraining framework into the vanilla pretrain-finetune pipeline.
arXiv Detail & Related papers (2021-10-26T04:03:45Z)
- Data Summarization via Bilevel Optimization [48.89977988203108]
A simple yet powerful approach is to operate on small subsets of data.
In this work, we propose a generic coreset framework that formulates the coreset selection as a cardinality-constrained bilevel optimization problem.
arXiv Detail & Related papers (2021-09-26T09:08:38Z)
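The bilevel formulation itself is beyond a short snippet, so as a lightweight stand-in the sketch below builds a coreset with greedy k-center selection, a common baseline for summarizing data with a small representative subset; the paper's cardinality-constrained bilevel optimization is not implemented here.

```python
import numpy as np

def k_center_greedy(X, k):
    chosen = [0]  # arbitrary first centre
    dist = np.linalg.norm(X - X[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))  # farthest point from current centres
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return chosen

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))
coreset = k_center_greedy(X, 20)
print("coreset indices:", coreset)
```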
- A Topological-Framework to Improve Analysis of Machine Learning Model Performance [5.3893373617126565]
We propose a framework for evaluating machine learning models in which a dataset is treated as a "space" on which a model operates.
We describe a topological data structure, presheaves, which offer a convenient way to store and analyze model performance between different subpopulations.
arXiv Detail & Related papers (2021-07-09T23:11:13Z)