Consensus-Driven Active Model Selection
- URL: http://arxiv.org/abs/2507.23771v1
- Date: Thu, 31 Jul 2025 17:56:28 GMT
- Title: Consensus-Driven Active Model Selection
- Authors: Justin Kay, Grant Van Horn, Subhransu Maji, Daniel Sheldon, Sara Beery
- Abstract summary: We propose a method for active model selection, using predictions from candidate models to prioritize the labeling of test data points. Our method, CODA, performs consensus-driven active model selection by modeling relationships between classifiers, categories, and data points. We validate our approach by curating a collection of 26 benchmark tasks capturing a range of model selection scenarios.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The widespread availability of off-the-shelf machine learning models poses a challenge: which model, of the many available candidates, should be chosen for a given data analysis task? This question of model selection is traditionally answered by collecting and annotating a validation dataset -- a costly and time-intensive process. We propose a method for active model selection, using predictions from candidate models to prioritize the labeling of test data points that efficiently differentiate the best candidate. Our method, CODA, performs consensus-driven active model selection by modeling relationships between classifiers, categories, and data points within a probabilistic framework. The framework uses the consensus and disagreement between models in the candidate pool to guide the label acquisition process, and Bayesian inference to update beliefs about which model is best as more information is collected. We validate our approach by curating a collection of 26 benchmark tasks capturing a range of model selection scenarios. CODA outperforms existing methods for active model selection significantly, reducing the annotation effort required to discover the best model by upwards of 70% compared to the previous state-of-the-art. Code and data are available at https://github.com/justinkay/coda.
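To picture the idea in the abstract, here is a minimal sketch of a consensus-driven selection loop: maintain a Beta posterior over each candidate's accuracy, label the point where the belief-weighted predictions disagree most, and update beliefs Bayesianly. This is not the authors' implementation (see the linked repository); the function name, the Beta-Bernoulli accuracy model, and the entropy-based acquisition rule are illustrative assumptions.

```python
# Hedged sketch of consensus-driven active model selection, assuming a
# Beta-Bernoulli accuracy model and an entropy-based disagreement score.
# Not the authors' implementation; see https://github.com/justinkay/coda.
import numpy as np

def select_model(preds, oracle, budget, n_mc=2000, seed=0):
    """preds: (n_models, n_points) array of each model's predicted class.
    oracle: callable i -> true label of point i (the costly annotation).
    Returns the index of the model believed best after `budget` labels."""
    rng = np.random.default_rng(seed)
    n_models, n_points = preds.shape
    alpha = np.ones(n_models)  # Beta(1, 1) prior on each model's accuracy
    beta = np.ones(n_models)
    unlabeled = set(range(n_points))
    for _ in range(min(budget, n_points)):
        # Belief that each model is best: Monte Carlo over the posteriors.
        draws = rng.beta(alpha[:, None], beta[:, None], (n_models, n_mc))
        p_best = np.bincount(draws.argmax(axis=0), minlength=n_models) / n_mc
        # Acquisition: label the point whose belief-weighted consensus vote
        # has the highest entropy, i.e. where plausible models disagree most.
        best_i, best_h = None, -1.0
        for i in unlabeled:
            vote = np.bincount(preds[:, i], weights=p_best)
            vote = vote[vote > 0]
            h = -(vote * np.log(vote)).sum()
            if h > best_h:
                best_i, best_h = i, h
        unlabeled.discard(best_i)
        y = oracle(best_i)                # pay for one annotation
        correct = preds[:, best_i] == y
        alpha += correct                  # Bayesian update of each
        beta += ~correct                  # model's accuracy posterior
    return int((alpha / (alpha + beta)).argmax())
```

Per the abstract, the paper's probabilistic framework additionally models category-level structure (relationships between classifiers, categories, and data points), which this per-model accuracy sketch omits.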
Related papers
- All models are wrong, some are useful: Model Selection with Limited Labels [49.62984196182567]
We introduce MODEL SELECTOR, a framework for label-efficient selection of pretrained classifiers.
We show that MODEL SELECTOR drastically reduces the need for labeled data while consistently picking the best or near-best performing model.
Our results further highlight the robustness of MODEL SELECTOR in model selection, as it reduces the labeling cost by up to 72.41% when selecting a near-best model.
arXiv Detail & Related papers (2024-10-17T14:45:56Z) - Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models [36.22392593103493]
Data selection for fine-tuning large language models (LLMs) aims to choose a high-quality subset from existing datasets.
Existing surveys overlook an in-depth exploration of the fine-tuning phase.
We introduce a novel three-stage scheme - comprising feature extraction, criteria design, and selector evaluation - to systematically categorize and evaluate these methods.
arXiv Detail & Related papers (2024-06-20T08:58:58Z) - Towards Fundamentally Scalable Model Selection: Asymptotically Fast Update and Selection [40.85209520973634]
An ideal model selection scheme should support two operations efficiently over a large pool of candidate models.
Previous solutions to model selection require high computational complexity for at least one of these two operations.
We present Standardized Embedder, an empirical realization of isolated model embedding.
arXiv Detail & Related papers (2024-06-11T17:57:49Z) - A Two-Phase Recall-and-Select Framework for Fast Model Selection [13.385915962994806]
We propose a two-phase (coarse-recall and fine-selection) model selection framework.
It aims to enhance the efficiency of selecting a robust model by leveraging the models' training performances on benchmark datasets.
The proposed methodology has been demonstrated to select a high-performing model about 3x faster than conventional baseline methods.
arXiv Detail & Related papers (2024-03-28T14:44:44Z) - DsDm: Model-Aware Dataset Selection with Datamodels [81.01744199870043]
Standard practice is to filter for examples that match human notions of data quality.
We find that selecting according to similarity with "high quality" data sources may not increase (and can even hurt) performance compared to randomly selecting data.
Our framework avoids handpicked notions of data quality, and instead models explicitly how the learning process uses train datapoints to predict on the target tasks.
arXiv Detail & Related papers (2024-01-23T17:22:00Z) - Budgeted Online Model Selection and Fine-Tuning via Federated Learning [26.823435733330705]
Online model selection involves selecting a model from a set of candidate models 'on the fly' to perform prediction on a stream of data.
The choice of candidate models therefore has a crucial impact on performance.
The present paper proposes an online federated model selection framework where a group of learners (clients) interacts with a server with sufficient memory.
Using the proposed algorithm, clients and the server collaborate to fine-tune models to adapt them to a non-stationary environment.
arXiv Detail & Related papers (2024-01-19T04:02:49Z) - Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly.
FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency: it is 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z) - Contextual Active Model Selection [10.925932167673764]
We present an approach to actively select pre-trained models while minimizing labeling costs.
The objective is to adaptively select the best model to make a prediction while limiting label requests.
We propose CAMS, a contextual active model selection algorithm that relies on two novel components.
arXiv Detail & Related papers (2022-07-13T08:22:22Z) - Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z) - A linearized framework and a new benchmark for model selection for fine-tuning [112.20527122513668]
Fine-tuning from a collection of models pre-trained on different domains is emerging as a technique to improve test accuracy in the low-data regime.
We introduce two new baselines for model selection -- Label-Gradient and Label-Feature Correlation.
Our benchmark highlights the accuracy gains of using a model zoo over fine-tuning ImageNet models.
arXiv Detail & Related papers (2021-01-29T21:57:15Z) - Online Active Model Selection for Pre-trained Classifiers [72.84853880948894]
We design an online selective sampling approach that actively selects informative examples to label and outputs the best model with high probability at any round.
Our algorithm can be used for online prediction tasks for both adversarial and stochastic streams.
arXiv Detail & Related papers (2020-10-19T19:53:15Z)
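The selective-sampling idea in that last entry can be pictured with a small sketch: query a streamed point's label only with a probability driven by how much the currently plausible models disagree, then down-weight models that err. The query rule, the exponential-weights update, and all names here are illustrative assumptions, not the paper's exact algorithm or guarantees.

```python
# Hedged sketch of online selective sampling for model selection, assuming
# a disagreement-based query rule and exponential-weights updates.
import numpy as np

def online_select(stream, models, oracle, eta=0.5, seed=0):
    """stream: iterable of inputs; models: list of callables x -> label;
    oracle: callable x -> true label (requested only when querying)."""
    rng = np.random.default_rng(seed)
    w = np.ones(len(models))              # one weight per candidate model
    for x in stream:
        preds = np.array([m(x) for m in models])
        p = w / w.sum()
        labels, counts = np.unique(preds, return_counts=True)
        plurality = labels[counts.argmax()]
        # Query probability grows with the weight mass that dissents from
        # the plurality vote; a small floor keeps some exploration.
        q = max(p[preds != plurality].sum(), 1e-3)
        if rng.random() < q:
            y = oracle(x)                     # label request
            w = w * np.exp(-eta * (preds != y))  # penalize wrong models
    return int(w.argmax())
```

Skipping confident points keeps label requests rare while the exponential weights concentrate on the best model over time, which matches the blurb's goal of outputting the best model with high probability at any round.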