Active Bayesian Assessment for Black-Box Classifiers
- URL: http://arxiv.org/abs/2002.06532v3
- Date: Mon, 15 Mar 2021 16:21:55 GMT
- Title: Active Bayesian Assessment for Black-Box Classifiers
- Authors: Disi Ji, Robert L. Logan IV, Padhraic Smyth, Mark Steyvers
- Abstract summary: We introduce an active Bayesian approach for assessment of classifier performance to satisfy the desiderata of both reliability and label-efficiency.
We first develop inference strategies to quantify uncertainty for common assessment metrics such as accuracy, misclassification cost, and calibration error.
We then propose a general framework for active Bayesian assessment using inferred uncertainty to guide efficient selection of instances for labeling.
- Score: 20.668691047355072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in machine learning have led to increased deployment of
black-box classifiers across a wide variety of applications. In many such
situations there is a critical need to both reliably assess the performance of
these pre-trained models and to perform this assessment in a label-efficient
manner (given that labels may be scarce and costly to collect). In this paper,
we introduce an active Bayesian approach for assessment of classifier
performance to satisfy the desiderata of both reliability and label-efficiency.
We begin by developing inference strategies to quantify uncertainty for common
assessment metrics such as accuracy, misclassification cost, and calibration
error. We then propose a general framework for active Bayesian assessment using
inferred uncertainty to guide efficient selection of instances for labeling,
enabling better performance assessment with fewer labels. We demonstrate
significant gains from our proposed active Bayesian approach via a series of
systematic empirical experiments assessing the performance of modern neural
classifiers (e.g., ResNet and BERT) on several standard image and text
classification datasets.
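To make the approach concrete, here is a minimal, self-contained sketch of one instantiation consistent with the abstract: group unlabeled instances by the black-box classifier's predicted class, maintain a Beta-Bernoulli posterior over each group's accuracy, and use Thompson sampling to decide which group to spend the next label on. The group structure, uniform priors, and simulated "true" accuracies below are illustrative assumptions, not the paper's exact experimental setup.

```python
import numpy as np

# Minimal sketch: Beta-Bernoulli posteriors over per-group accuracy plus
# Thompson sampling for label-efficient assessment. All data below are
# simulated stand-ins for a real black-box classifier and labeling oracle.
rng = np.random.default_rng(0)

K = 10                                  # groups, e.g. the classifier's predicted classes
true_acc = rng.uniform(0.6, 0.95, K)    # unknown per-group accuracy (simulated oracle)

alpha = np.ones(K)                      # Beta(1, 1) prior on each group's accuracy
beta = np.ones(K)

budget = 300                            # total number of labels we can afford
for _ in range(budget):
    # Thompson sampling: draw one plausible accuracy per group from its
    # posterior, then spend the next label on the group that looks weakest
    # (appropriate when the goal is to find the least accurate class).
    theta = rng.beta(alpha, beta)
    g = int(np.argmin(theta))

    # Labeling one instance from group g reveals whether the classifier was
    # correct on it; here that outcome is simulated from true_acc.
    correct = rng.random() < true_acc[g]
    alpha[g] += correct
    beta[g] += 1 - correct

posterior_mean = alpha / (alpha + beta)
print("posterior mean accuracy per group:", np.round(posterior_mean, 3))
print("labels spent per group:", (alpha + beta - 2).astype(int))
```

Under this scheme the label budget concentrates on the groups whose posteriors look weakest or most uncertain, which is where the label-efficiency gains over uniform random labeling come from.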
Related papers
- Fighting Sampling Bias: A Framework for Training and Evaluating Credit Scoring Models [2.918530881730374]
This paper addresses the adverse effect of sampling bias on model training and evaluation.
We propose bias-aware self-learning and a reject inference framework for scorecard evaluation.
Our results suggest a profit improvement of about eight percent when using Bayesian evaluation to decide on acceptance rates.
arXiv Detail & Related papers (2024-07-17T20:59:54Z)
- Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration [60.95748658638956]
This paper introduces the Multi-Label Confidence task, aiming to provide well-calibrated confidence scores in multi-label scenarios.
Existing single-label calibration methods fail to account for category correlations, which are crucial for addressing semantic confusion.
We propose the Dynamic Correlation Learning and Regularization algorithm, which leverages multi-grained semantic correlations to better model semantic confusion.
arXiv Detail & Related papers (2024-07-09T13:26:21Z)
- Data-Driven Estimation of the False Positive Rate of the Bayes Binary Classifier via Soft Labels [25.40796153743837]
We propose an estimator for the false positive rate (FPR) of the Bayes classifier, i.e., the classifier that is optimal with respect to accuracy, from a given dataset.
We develop effective FPR estimators by leveraging a denoising technique and the Nadaraya-Watson estimator (see the sketch after this list).
arXiv Detail & Related papers (2024-01-27T20:41:55Z)
- Bayesian Estimate of Mean Proper Scores for Diversity-Enhanced Active Learning [6.704927458661697]
Expected Loss Reduction (ELR) focuses on a Bayesian estimate of the reduction in classification error, and more general costs fit in the same framework.
We propose Bayesian Estimate of Mean Proper Scores (BEMPS) to estimate the increase in strictly proper scores.
We show that BEMPS yields robust acquisition functions and well-calibrated classifiers, and consistently outperforms the others tested.
arXiv Detail & Related papers (2023-12-15T11:02:17Z)
- Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly [79.07074710460012]
The adversarial vulnerability of deep neural networks (DNNs) has drawn great attention.
An increasing number of transfer-based methods have been developed to fool black-box DNN models.
We establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods.
arXiv Detail & Related papers (2023-11-02T15:35:58Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- Fair Infinitesimal Jackknife: Mitigating the Influence of Biased Training Data Points Without Refitting [41.96570350954332]
We propose an algorithm that improves the fairness of a pre-trained classifier by simply dropping carefully selected training data points.
We find that such an intervention does not substantially reduce the predictive performance of the model but drastically improves the fairness metric.
arXiv Detail & Related papers (2022-12-13T18:36:19Z)
- Evaluating the Predictive Performance of Positive-Unlabelled Classifiers: a brief critical review and practical recommendations for improvement [77.34726150561087]
Positive-Unlabelled (PU) learning is a growing area of machine learning.
This paper critically reviews the main PU learning evaluation approaches and the choice of predictive accuracy measures in 51 articles proposing PU classifiers.
arXiv Detail & Related papers (2022-06-06T08:31:49Z)
- Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation [59.7305309038676]
We propose Active Surrogate Estimators (ASEs) for model evaluation.
We find that ASEs offer greater label-efficiency than the current state-of-the-art.
arXiv Detail & Related papers (2022-02-14T17:15:18Z)
- Ask-n-Learn: Active Learning via Reliable Gradient Representations for Image Classification [29.43017692274488]
Deep predictive models rely on human supervision in the form of labeled training data.
We propose Ask-n-Learn, an active learning approach based on gradient embeddings obtained using the pseudo-labels estimated in each iteration of the algorithm.
arXiv Detail & Related papers (2020-09-30T05:19:56Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
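As a rough illustration of the Nadaraya-Watson idea mentioned in the "Data-Driven Estimation of the False Positive Rate" entry above (the sketch promised there): smooth noisy soft labels with a kernel-weighted average, then plug the denoised class-probability estimates into the identity FPR = E[1{eta(X) > 1/2}(1 - eta(X))] / E[1 - eta(X)]. The one-dimensional features, noise model, and bandwidth below are illustrative assumptions, not that paper's exact construction.

```python
import numpy as np

# Hedged sketch: estimate the Bayes classifier's false positive rate (FPR)
# from noisy soft labels, using Nadaraya-Watson smoothing as the denoiser.
rng = np.random.default_rng(1)

n = 2000
x = rng.uniform(-3, 3, n)                             # 1-D features (illustrative)
eta = 1 / (1 + np.exp(-2 * x))                        # true P(Y = 1 | x)
soft = np.clip(eta + rng.normal(0, 0.15, n), 0, 1)    # noisy soft labels

def nadaraya_watson(x_query, x_train, y_train, h=0.3):
    """Kernel-weighted average of y_train; Gaussian kernel with bandwidth h."""
    w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)

eta_hat = nadaraya_watson(x, x, soft)                 # denoised soft labels

# The Bayes classifier predicts 1 iff eta(x) > 1/2, so its FPR is
#   P(predict 1 | Y = 0) = E[1{eta > 1/2} (1 - eta)] / E[1 - eta],
# estimated here by plugging in the denoised eta_hat.
predict_pos = eta_hat > 0.5
fpr_hat = np.sum(predict_pos * (1 - eta_hat)) / np.sum(1 - eta_hat)

fpr_true = np.sum((eta > 0.5) * (1 - eta)) / np.sum(1 - eta)
print(f"estimated FPR: {fpr_hat:.3f}  (plug-in truth on this sample: {fpr_true:.3f})")
```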
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.