Active Surrogate Estimators: An Active Learning Approach to
Label-Efficient Model Evaluation
- URL: http://arxiv.org/abs/2202.06881v1
- Date: Mon, 14 Feb 2022 17:15:18 GMT
- Title: Active Surrogate Estimators: An Active Learning Approach to
Label-Efficient Model Evaluation
- Authors: Jannik Kossen, Sebastian Farquhar, Yarin Gal, Tom Rainforth
- Abstract summary: We propose Active Surrogate Estimators (ASEs) for model evaluation.
We find that ASEs offer greater label-efficiency than the current state-of-the-art.
- Score: 59.7305309038676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose Active Surrogate Estimators (ASEs), a new method for
label-efficient model evaluation. Evaluating model performance is a challenging
and important problem when labels are expensive. ASEs address this active
testing problem using a surrogate-based estimation approach, whereas previous
methods have focused on Monte Carlo estimates. ASEs actively learn the
underlying surrogate, and we propose a novel acquisition strategy, XWING, that
tailors this learning to the final estimation task. We find that ASEs offer
greater label-efficiency than the current state-of-the-art when applied to
challenging model evaluation problems for deep neural networks. We further
theoretically analyze ASEs' errors.
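The contrast the abstract draws between Monte Carlo and surrogate-based estimation can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy probabilities, and the choice of cross-entropy loss are all assumptions made for illustration, and the active learning of the surrogate (including XWING) is left out entirely.

```python
import math

def cross_entropy(pred_probs, label):
    # Loss of the evaluated model's class probabilities for one label.
    return -math.log(pred_probs[label])

def monte_carlo_risk(model_probs, labels):
    # Monte Carlo style estimate: average loss over the points
    # whose true labels were actually acquired.
    losses = [cross_entropy(p, y) for p, y in zip(model_probs, labels)]
    return sum(losses) / len(losses)

def surrogate_risk(model_probs, surrogate_probs):
    # Surrogate style estimate: for every pool point, take the loss
    # expected under the surrogate's predictive distribution over
    # labels, then average over the pool. No true labels are needed
    # at estimation time; they are only used to train the surrogate.
    total = 0.0
    for p, q in zip(model_probs, surrogate_probs):
        total += sum(q_y * cross_entropy(p, y) for y, q_y in enumerate(q))
    return total / len(model_probs)

# Hypothetical toy pool: three inputs, two classes.
model_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]      # model under evaluation
surrogate_probs = [[0.8, 0.2], [0.1, 0.9], [0.3, 0.7]]  # surrogate's beliefs
acquired_labels = [0, 1]                                # labels for the first two points

print(monte_carlo_risk(model_probs[:2], acquired_labels))
print(surrogate_risk(model_probs, surrogate_probs))
```

In this sketch the surrogate estimate uses every point in the pool, so its quality depends on how well the surrogate has been learned rather than on how many test labels enter the average directly.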
Related papers
- Deep Bayesian Active Learning for Preference Modeling in Large Language Models [84.817400962262]
We propose the Bayesian Active Learner for Preference Modeling (BAL-PM).
Our experiments demonstrate that BAL-PM requires 33% to 68% fewer preference labels in two popular human preference datasets and exceeds previous Bayesian acquisition policies.
arXiv Detail & Related papers (2024-06-14T13:32:43Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights improves, for example, the absolute performance of the Llama 2 model by up to 15% points relative.
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - Learning Objective-Specific Active Learning Strategies with Attentive
Neural Processes [72.75421975804132]
Learning Active Learning (LAL) proposes learning the active learning strategy itself, allowing it to adapt to the given setting.
We propose a novel LAL method for classification that exploits symmetry and independence properties of the active learning problem.
Our approach is based on learning from a myopic oracle, which gives our model the ability to adapt to non-standard objectives.
arXiv Detail & Related papers (2023-09-11T14:16:37Z) - From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z) - NTKCPL: Active Learning on Top of Self-Supervised Model by Estimating
True Coverage [3.4806267677524896]
We propose a novel active learning strategy, neural tangent kernel clustering-pseudo-labels (NTKCPL).
It estimates empirical risk based on pseudo-labels and the model prediction with NTK approximation.
We validate our method on five datasets, empirically demonstrating that it outperforms the baseline methods in most cases.
arXiv Detail & Related papers (2023-06-07T01:43:47Z) - Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for data annotation when the unlabeled sample is believed to incorporate high loss.
Our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks.
arXiv Detail & Related papers (2022-12-20T19:29:37Z) - Self-supervised Assisted Active Learning for Skin Lesion Segmentation [18.78959113954792]
Label scarcity has been a long-standing issue for biomedical image segmentation, due to high annotation costs and professional requirements.
We propose a novel self-supervised assisted active learning framework in the cold-start setting, in which the segmentation model is first warmed up with self-supervised learning.
Our approach is capable of achieving promising performance with substantial improvements over existing baselines.
arXiv Detail & Related papers (2022-05-14T09:40:18Z) - WSLRec: Weakly Supervised Learning for Neural Sequential Recommendation
Models [24.455665093145818]
We propose a novel model-agnostic training approach called WSLRec, which adopts a three-stage framework: pre-training, top-$k$ mining, and fine-tuning.
WSLRec resolves the incompleteness problem by pre-training models on extra weak supervisions from model-free methods like BR and ItemCF, while resolving the inaccuracy problem by leveraging the top-$k$ mining to screen out reliable user-item relevance from weak supervisions for fine-tuning.
arXiv Detail & Related papers (2022-02-28T08:55:12Z) - Active Testing: Sample-Efficient Model Evaluation [39.200332879659456]
We introduce active testing: a new framework for sample-efficient model evaluation.
Active testing reduces the cost of labeling test data by carefully selecting the test points to label.
Because actively selecting test points biases the estimate, we show how to remove that bias while reducing the variance of the estimator.
arXiv Detail & Related papers (2021-03-09T10:20:49Z) - Active Feature Acquisition with Generative Surrogate Models [11.655069211977464]
In this work, we consider models that perform active feature acquisition (AFA) and query the environment for unobserved features.
Our work reformulates the Markov decision process (MDP) that underlies the AFA problem as a generative modeling task.
We propose learning a generative surrogate model (GSM) that captures the dependencies among input features to assess potential information gain from acquisitions.
arXiv Detail & Related papers (2020-10-06T02:10:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.