Streaming algorithms for evaluating noisy judges on unlabeled data --
binary classification
- URL: http://arxiv.org/abs/2306.01726v3
- Date: Fri, 8 Sep 2023 14:56:36 GMT
- Title: Streaming algorithms for evaluating noisy judges on unlabeled data --
binary classification
- Authors: Andrés Corrada-Emmanuel
- Abstract summary: We search for nearly error-independent trios by using the algebraic failure modes to reject evaluation ensembles as too correlated.
The estimates produced by the surviving ensembles can sometimes be within 1% of the true values.
A Taylor expansion of the estimates produced when independence is assumed but the classifiers are, in fact, slightly correlated helps clarify how the independent evaluator has algebraic `blind spots'.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The evaluation of noisy binary classifiers on unlabeled data is treated as a
streaming task: given a data sketch of the decisions by an ensemble, estimate
the true prevalence of the labels as well as each classifier's accuracy on
them. Two fully algebraic evaluators are constructed to do this. Both are based
on the assumption that the classifiers make independent errors. The first is
based on majority voting. The second, the main contribution of the paper, is
guaranteed to be correct. But how do we know the classifiers are independent on
any given test? This principal/agent monitoring paradox is ameliorated by
exploiting the failures of the independent evaluator to return sensible
estimates. A search for nearly error independent trios is empirically carried
out on the \texttt{adult}, \texttt{mushroom}, and \texttt{two-norm} datasets by
using the algebraic failure modes to reject evaluation ensembles as too
correlated. The searches are refined by constructing a surface in evaluation
space that contains the true value point. The algebra of arbitrarily correlated
classifiers permits the selection of a polynomial subset free of any
correlation variables. Candidate evaluation ensembles are rejected if their
data sketches produce independent estimates too far from the constructed
surface. The estimates produced by the surviving ensembles can sometimes be
within 1\% of the true values. But handling even small amounts of correlation
remains a challenge. A Taylor expansion of the estimates produced when
independence is assumed but the classifiers are, in fact, slightly correlated
helps clarify how the independent evaluator has algebraic `blind spots'.
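A minimal, illustrative Python sketch of the streaming setup is given below. It builds the data sketch for a hypothetical trio of binary classifiers (the counts of the 2^3 = 8 voting patterns on an unlabeled test set, with made-up numbers) and applies the simpler of the two evaluators named in the abstract, the majority-voting one, to estimate the label prevalence and each classifier's accuracy. Treating the majority vote as ground truth is only a heuristic; the paper's main algebraic evaluator, which solves the system implied by error independence, is not reproduced here.

# Hypothetical data sketch for a trio of binary classifiers: counts of the
# 2^3 = 8 voting patterns observed on an unlabeled test set (made-up numbers).
sketch = {
    (1, 1, 1): 340, (1, 1, 0): 40, (1, 0, 1): 35, (0, 1, 1): 30,
    (1, 0, 0): 25, (0, 1, 0): 20, (0, 0, 1): 15, (0, 0, 0): 495,
}

def majority_vote_evaluator(sketch):
    """Estimate label prevalence and per-classifier accuracy by treating the
    majority vote of the trio as if it were the true label."""
    total = sum(sketch.values())
    # Prevalence of label 1 = fraction of items whose majority vote is 1.
    prevalence = sum(n for votes, n in sketch.items() if sum(votes) >= 2) / total
    # A classifier counts as correct on an item when it agrees with the majority.
    accuracies = []
    for i in range(3):
        agree = sum(n for votes, n in sketch.items()
                    if votes[i] == (1 if sum(votes) >= 2 else 0))
        accuracies.append(agree / total)
    return prevalence, accuracies

prevalence, accuracies = majority_vote_evaluator(sketch)
print(f"estimated prevalence of label 1: {prevalence:.3f}")
print("estimated accuracies:", [f"{a:.3f}" for a in accuracies])

Because only the eight pattern counts are needed, the sketch can be updated one item at a time, which is what makes the evaluation a streaming task.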
Related papers
- Correcting Underrepresentation and Intersectional Bias for Classification [49.1574468325115]
We consider the problem of learning from data corrupted by underrepresentation bias.
We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates.
We show that our algorithm permits efficient learning for model classes of finite VC dimension.
arXiv Detail & Related papers (2023-06-19T18:25:44Z) - Counterfactually Comparing Abstaining Classifiers [37.43975777164451]
Abstaining classifiers have the option to abstain from making predictions on inputs that they are unsure about.
We introduce a novel approach to evaluating and comparing abstaining classifiers by treating abstentions as missing data.
arXiv Detail & Related papers (2023-05-17T20:46:57Z) - Label-Noise Learning with Intrinsically Long-Tailed Data [65.41318436799993]
We propose a learning framework for label-noise learning with intrinsically long-tailed data.
Specifically, we propose two-stage bi-dimensional sample selection (TABASCO) to better separate clean samples from noisy samples.
arXiv Detail & Related papers (2022-08-21T07:47:05Z) - Learning from Multiple Unlabeled Datasets with Partial Risk
Regularization [80.54710259664698]
In this paper, we aim to learn an accurate classifier without any class labels.
We first derive an unbiased estimator of the classification risk that can be estimated from the given unlabeled sets.
We then find that the classifier obtained as such tends to cause overfitting as its empirical risks go negative during training.
Experiments demonstrate that our method effectively mitigates overfitting and outperforms state-of-the-art methods for learning from multiple unlabeled sets.
arXiv Detail & Related papers (2022-07-04T16:22:44Z) - CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator [60.799183326613395]
We propose an unbiased estimator for categorical random variables based on multiple mutually negatively correlated (jointly antithetic) samples.
CARMS combines REINFORCE with copula based sampling to avoid duplicate samples and reduce its variance, while keeping the estimator unbiased using importance sampling.
We evaluate CARMS on several benchmark datasets on a generative modeling task, as well as a structured output prediction task, and find it to outperform competing methods including a strong self-control baseline.
arXiv Detail & Related papers (2021-10-26T20:14:30Z) - Specialists Outperform Generalists in Ensemble Classification [15.315432841707736]
In this paper, we address the question of whether we can determine the accuracy of the ensemble.
We explicitly construct the individual classifiers that attain the upper and lower bounds: specialists and generalists.
arXiv Detail & Related papers (2021-07-09T12:16:10Z) - Visualizing Classifier Adjacency Relations: A Case Study in Speaker
Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
arXiv Detail & Related papers (2021-06-11T13:03:33Z) - Double Perturbation: On the Robustness of Robustness and Counterfactual
Bias Evaluation [109.06060143938052]
We propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset.
We apply this framework to study two perturbation-based approaches that are used to analyze models' robustness and counterfactual bias in English.
arXiv Detail & Related papers (2021-04-12T06:57:36Z) - Evaluating Fairness of Machine Learning Models Under Uncertain and
Incomplete Information [25.739240011015923]
We show that the test accuracy of the attribute classifier is not always correlated with its effectiveness in bias estimation for a downstream model.
Our analysis has surprising and counter-intuitive implications where in certain regimes one might want to distribute the error of the attribute classifier as unevenly as possible.
arXiv Detail & Related papers (2021-02-16T19:02:55Z) - Verifying Individual Fairness in Machine Learning Models [4.29921861868687]
We consider the problem of whether a given decision model, working with structured data, has individual fairness.
Our objective is to construct verifiers for proving individual fairness of a given model, and we do so by considering appropriate relaxations of the problem.
arXiv Detail & Related papers (2020-06-21T08:37:54Z) - Classifier-independent Lower-Bounds for Adversarial Robustness [13.247278149124757]
We theoretically analyse the limits of robustness to test-time adversarial and noisy examples in classification.
We use optimal transport theory to derive variational formulae for the Bayes-optimal error a classifier can make on a given classification problem.
We derive explicit lower-bounds on the Bayes-optimal error in the case of the popular distance-based attacks.
arXiv Detail & Related papers (2020-06-17T16:46:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.