Active Bayesian Assessment for Black-Box Classifiers
- URL: http://arxiv.org/abs/2002.06532v3
- Date: Mon, 15 Mar 2021 16:21:55 GMT
- Title: Active Bayesian Assessment for Black-Box Classifiers
- Authors: Disi Ji, Robert L. Logan IV, Padhraic Smyth, Mark Steyvers
- Abstract summary: We introduce an active Bayesian approach for assessment of classifier performance to satisfy the desiderata of both reliability and label-efficiency.
We first develop inference strategies to quantify uncertainty for common assessment metrics such as accuracy, misclassification cost, and calibration error.
We then propose a general framework for active Bayesian assessment using inferred uncertainty to guide efficient selection of instances for labeling.
- Score: 20.668691047355072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in machine learning have led to increased deployment of
black-box classifiers across a wide variety of applications. In many such
situations there is a critical need to both reliably assess the performance of
these pre-trained models and to perform this assessment in a label-efficient
manner (given that labels may be scarce and costly to collect). In this paper,
we introduce an active Bayesian approach for assessment of classifier
performance to satisfy the desiderata of both reliability and label-efficiency.
We begin by developing inference strategies to quantify uncertainty for common
assessment metrics such as accuracy, misclassification cost, and calibration
error. We then propose a general framework for active Bayesian assessment using
inferred uncertainty to guide efficient selection of instances for labeling,
enabling better performance assessment with fewer labels. We demonstrate
significant gains from our proposed active Bayesian approach via a series of
systematic empirical experiments assessing the performance of modern neural
classifiers (e.g., ResNet and BERT) on several standard image and text
classification datasets.
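To make the approach concrete, here is a minimal, self-contained sketch of one instantiation consistent with the abstract: group unlabeled instances by the black-box classifier's predicted class, maintain a Beta-Bernoulli posterior over each group's accuracy, and use Thompson sampling to decide which group to spend the next label on. The group structure, uniform priors, and simulated "true" accuracies below are illustrative assumptions, not the paper's exact experimental setup.

```python
import numpy as np

# Minimal sketch: Beta-Bernoulli posteriors over per-group accuracy plus
# Thompson sampling for label-efficient assessment. All data below are
# simulated stand-ins for a real black-box classifier and labeling oracle.
rng = np.random.default_rng(0)

K = 10                                  # groups, e.g. the classifier's predicted classes
true_acc = rng.uniform(0.6, 0.95, K)    # unknown per-group accuracy (simulated oracle)

alpha = np.ones(K)                      # Beta(1, 1) prior on each group's accuracy
beta = np.ones(K)

budget = 300                            # total number of labels we can afford
for _ in range(budget):
    # Thompson sampling: draw one plausible accuracy per group from its
    # posterior, then spend the next label on the group that looks weakest
    # (appropriate when the goal is to find the least accurate class).
    theta = rng.beta(alpha, beta)
    g = int(np.argmin(theta))

    # Labeling one instance from group g reveals whether the classifier was
    # correct on it; here that outcome is simulated from true_acc.
    correct = rng.random() < true_acc[g]
    alpha[g] += correct
    beta[g] += 1 - correct

posterior_mean = alpha / (alpha + beta)
print("posterior mean accuracy per group:", np.round(posterior_mean, 3))
print("labels spent per group:", (alpha + beta - 2).astype(int))
```

Under this scheme the label budget concentrates on the groups whose posteriors look weakest or most uncertain, which is where the label-efficiency gains over uniform random labeling come from.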
Related papers
- Fighting Sampling Bias: A Framework for Training and Evaluating Credit Scoring Models [2.918530881730374]
This paper addresses the adverse effect of sampling bias on model training and evaluation.
We propose bias-aware self-learning and a reject inference framework for scorecard evaluation.
Our results suggest a profit improvement of about eight percent when using Bayesian evaluation to decide on acceptance rates.
arXiv Detail & Related papers (2024-07-17T20:59:54Z)
- Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration [60.95748658638956]
This paper introduces the Multi-Label Confidence task, aiming to provide well-calibrated confidence scores in multi-label scenarios.
Existing single-label calibration methods fail to account for category correlations, which are crucial for addressing semantic confusion.
We propose the Dynamic Correlation Learning and Regularization algorithm, which leverages multi-grained semantic correlations to better model semantic confusion.
arXiv Detail & Related papers (2024-07-09T13:26:21Z)
- Data-Driven Estimation of the False Positive Rate of the Bayes Binary Classifier via Soft Labels [25.40796153743837]
We propose an estimator for the false positive rate (FPR) of the Bayes classifier, i.e., the classifier that is optimal with respect to accuracy, from a given dataset.
We develop effective FPR estimators by leveraging a denoising technique and the Nadaraya-Watson estimator (see the sketch after this list).
arXiv Detail & Related papers (2024-01-27T20:41:55Z)
- Bayesian Estimate of Mean Proper Scores for Diversity-Enhanced Active Learning [6.704927458661697]
Expected Loss Reduction (ELR) focuses on a Bayesian estimate of the reduction in classification error, and more general costs fit in the same framework.
We propose Bayesian Estimate of Mean Proper Scores (BEMPS) to estimate the increase in strictly proper scores.
We show that BEMPS yields robust acquisition functions and well-calibrated classifiers, and consistently outperforms the others tested.
arXiv Detail & Related papers (2023-12-15T11:02:17Z)
- Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly [79.07074710460012]
The adversarial vulnerability of deep neural networks (DNNs) has drawn great attention.
An increasing number of transfer-based methods have been developed to fool black-box DNN models.
We establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods.
arXiv Detail & Related papers (2023-11-02T15:35:58Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- Fair Infinitesimal Jackknife: Mitigating the Influence of Biased Training Data Points Without Refitting [41.96570350954332]
We propose an algorithm that improves the fairness of a pre-trained classifier by simply dropping carefully selected training data points.
We find that such an intervention does not substantially reduce the predictive performance of the model but drastically improves the fairness metric.
arXiv Detail & Related papers (2022-12-13T18:36:19Z)
- Evaluating the Predictive Performance of Positive-Unlabelled Classifiers: a brief critical review and practical recommendations for improvement [77.34726150561087]
Positive-Unlabelled (PU) learning is a growing area of machine learning.
This paper critically reviews the main PU learning evaluation approaches and the choice of predictive accuracy measures in 51 articles proposing PU classifiers.
arXiv Detail & Related papers (2022-06-06T08:31:49Z)
- Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation [59.7305309038676]
We propose Active Surrogate Estimators (ASEs) for model evaluation.
We find that ASEs offer greater label-efficiency than the current state-of-the-art.
arXiv Detail & Related papers (2022-02-14T17:15:18Z)
- Ask-n-Learn: Active Learning via Reliable Gradient Representations for Image Classification [29.43017692274488]
Deep predictive models rely on human supervision in the form of labeled training data.
We propose Ask-n-Learn, an active learning approach based on gradient embeddings obtained using the pseudo-labels estimated in each iteration of the algorithm.
arXiv Detail & Related papers (2020-09-30T05:19:56Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
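As a rough illustration of the Nadaraya-Watson idea mentioned in the "Data-Driven Estimation of the False Positive Rate" entry above (the sketch promised there): smooth noisy soft labels with a kernel-weighted average, then plug the denoised class-probability estimates into the identity FPR = E[1{eta(X) > 1/2}(1 - eta(X))] / E[1 - eta(X)]. The one-dimensional features, noise model, and bandwidth below are illustrative assumptions, not that paper's exact construction.

```python
import numpy as np

# Hedged sketch: estimate the Bayes classifier's false positive rate (FPR)
# from noisy soft labels, using Nadaraya-Watson smoothing as the denoiser.
rng = np.random.default_rng(1)

n = 2000
x = rng.uniform(-3, 3, n)                             # 1-D features (illustrative)
eta = 1 / (1 + np.exp(-2 * x))                        # true P(Y = 1 | x)
soft = np.clip(eta + rng.normal(0, 0.15, n), 0, 1)    # noisy soft labels

def nadaraya_watson(x_query, x_train, y_train, h=0.3):
    """Kernel-weighted average of y_train; Gaussian kernel with bandwidth h."""
    w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)

eta_hat = nadaraya_watson(x, x, soft)                 # denoised soft labels

# The Bayes classifier predicts 1 iff eta(x) > 1/2, so its FPR is
#   P(predict 1 | Y = 0) = E[1{eta > 1/2} (1 - eta)] / E[1 - eta],
# estimated here by plugging in the denoised eta_hat.
predict_pos = eta_hat > 0.5
fpr_hat = np.sum(predict_pos * (1 - eta_hat)) / np.sum(1 - eta_hat)

fpr_true = np.sum((eta > 0.5) * (1 - eta)) / np.sum(1 - eta)
print(f"estimated FPR: {fpr_hat:.3f}  (plug-in truth on this sample: {fpr_true:.3f})")
```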
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.