Active Testing: Sample-Efficient Model Evaluation
- URL: http://arxiv.org/abs/2103.05331v1
- Date: Tue, 9 Mar 2021 10:20:49 GMT
- Title: Active Testing: Sample-Efficient Model Evaluation
- Authors: Jannik Kossen, Sebastian Farquhar, Yarin Gal, Tom Rainforth
- Abstract summary: We introduce active testing: a new framework for sample-efficient model evaluation.
Active testing addresses the cost of labeling test data by carefully selecting the test points to label.
Actively selecting labels introduces a bias; we show how to remove that bias while reducing the variance of the estimator.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce active testing: a new framework for sample-efficient model
evaluation. While approaches like active learning reduce the number of labels
needed for model training, existing literature largely ignores the cost of
labeling test data, typically unrealistically assuming large test sets for
model evaluation. This creates a disconnect from real applications, where test
labels are important and just as expensive, e.g. for optimizing
hyperparameters. Active testing addresses this by carefully selecting the test
points to label, ensuring model evaluation is sample-efficient. To this end, we
derive theoretically-grounded and intuitive acquisition strategies that are
specifically tailored to the goals of active testing, noting these are distinct
from those of active learning. Actively selecting labels introduces a bias; we
show how to remove that bias while reducing the variance of the estimator at
the same time. Active testing is easy to implement, effective, and can be
applied to any supervised machine learning method. We demonstrate this on
models including WideResNet and Gaussian processes on datasets including
CIFAR-100.
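The bias-removal step benefits from a concrete illustration. Below is a minimal NumPy sketch, not the authors' code: it draws test points without replacement from a non-uniform acquisition proposal q and reweights the observed losses with LURE-style importance weights (the unbiased risk estimator this line of work builds on), so that the weighted average stays an unbiased estimate of the risk over the full pool. The Gamma toy losses, the noisy surrogate standing in for an acquisition strategy, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pool of N test points. In practice the per-point losses are unknown
# until a label is purchased; we generate them here to check unbiasedness.
N = 1000
true_losses = rng.gamma(shape=1.0, scale=1.0, size=N)

# Hypothetical acquisition proposal q: a noisy surrogate of the loss,
# standing in for the paper's tailored acquisition strategies.
surrogate = np.maximum(true_losses + rng.normal(0.0, 0.5, size=N), 1e-3)
q = surrogate / surrogate.sum()

M = 50  # labelling budget, M << N
remaining = np.ones(N, dtype=bool)
weighted_losses = []
for m in range(1, M + 1):
    # Renormalise the proposal over the not-yet-labelled points.
    q_m = np.where(remaining, q, 0.0)
    q_m = q_m / q_m.sum()
    i = rng.choice(N, p=q_m)  # "acquire" a label for point i
    remaining[i] = False
    # LURE-style importance weight: corrects the bias introduced by
    # non-uniform sampling without replacement.
    v = 1.0 + (N - M) / (N - m) * (1.0 / ((N - m + 1) * q_m[i]) - 1.0)
    weighted_losses.append(v * true_losses[i])

print(f"active-testing estimate: {np.mean(weighted_losses):.3f}")
print(f"full-pool risk:          {true_losses.mean():.3f}")
```

A plain average of the actively sampled losses would be biased upwards here, since high-loss points are sampled preferentially; the importance weights undo that bias while the loss-proportional proposal keeps the estimator's variance low.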
Related papers
- Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models (arXiv, 2024-10-31)
  We introduce context-aware testing (CAT), which uses context as an inductive bias to guide the search for meaningful model failures.
  We instantiate the first CAT system, SMART Testing, which employs large language models to hypothesize relevant and likely failures.
- Realistic Evaluation of Test-Time Adaptation Algorithms: Unsupervised Hyperparameter Selection (arXiv, 2024-07-19)
  Test-Time Adaptation (TTA) has emerged as a promising strategy for tackling machine learning model robustness under distribution shifts.
  We evaluate existing TTA methods using surrogate-based hyperparameter-selection strategies to obtain a more realistic evaluation of their performance.
- Model Uncertainty based Active Learning on Tabular Data using Boosted Trees (arXiv, 2023-10-30)
  Supervised machine learning relies on the availability of good labelled data for model training.
  Active learning is a sub-field of machine learning that helps obtain labelled data efficiently.
- Active Learning with Combinatorial Coverage (arXiv, 2023-02-28)
  Active learning is a practical field of machine learning that automates the process of selecting which data to label.
  Current methods effectively reduce the burden of data labeling but are heavily model-reliant, which prevents sampled data from transferring to new models and introduces sampling bias.
  We propose active learning methods utilizing combinatorial coverage to overcome these issues.
- Rethinking Precision of Pseudo Label: Test-Time Adaptation via Complementary Learning (arXiv, 2023-01-15)
  We propose a novel complementary learning approach to enhance test-time adaptation, a setting in which information from the source domain is typically unavailable.
  We highlight that the risk function of complementary labels agrees with their vanilla loss formula.
- Temporal Output Discrepancy for Loss Estimation-based Active Learning (arXiv, 2022-12-20)
  We present a novel deep active learning approach that queries the oracle for annotation when an unlabeled sample is believed to incur a high loss.
  Our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks.
- ALBench: A Framework for Evaluating Active Learning in Object Detection (arXiv, 2022-07-27)
  This paper contributes ALBench, an active learning benchmark framework for evaluating active learning in object detection.
  Built on an automatic deep-model training system, ALBench is easy to use, compatible with different active learning algorithms, and ensures the same training and testing protocols.
- Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation (arXiv, 2022-02-14)
  We propose Active Surrogate Estimators (ASEs) for model evaluation.
  We find that ASEs offer greater label-efficiency than the current state of the art.
- TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks (arXiv, 2021-05-21)
  Deep learning systems are notoriously difficult to test and debug.
  To reduce test cost, it is essential to select and label only "high-quality" bug-revealing test inputs.
  We propose TestRank, a novel test prioritization technique that orders unlabeled test instances according to their bug-revealing capabilities.
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms (arXiv, 2021-04-11)
  We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
  The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
This list is automatically generated from the titles and abstracts of the papers on this site.