Quantifying the Uncertainty of Precision Estimates for Rule based Text
Classifiers
- URL: http://arxiv.org/abs/2005.09198v1
- Date: Tue, 19 May 2020 03:51:47 GMT
- Title: Quantifying the Uncertainty of Precision Estimates for Rule based Text
Classifiers
- Authors: James Nutaro and Ozgur Ozmen
- Abstract summary: Rule based classifiers that use the presence and absence of key sub-strings to make classification decisions have a natural mechanism for quantifying the uncertainty of their precision.
For a binary classifier, the key insight is to treat partitions of the sub-string set induced by the documents as Bernoulli random variables.
The utility of this approach is demonstrated with a benchmark problem.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rule based classifiers that use the presence and absence of key sub-strings
to make classification decisions have a natural mechanism for quantifying the
uncertainty of their precision. For a binary classifier, the key insight is to
treat partitions of the sub-string set induced by the documents as Bernoulli
random variables. The mean value of each random variable is an estimate of the
classifier's precision when presented with a document inducing that partition.
These means can be compared, using standard statistical tests, to a desired or
expected classifier precision. A set of binary classifiers can be combined into
a single, multi-label classifier by an application of the Dempster-Shafer
theory of evidence. The utility of this approach is demonstrated with a
benchmark problem.
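Worked example (illustrative)
The following is a minimal sketch, not the authors' code, of the idea described in the abstract: documents that induce the same presence/absence pattern over the key sub-strings are grouped together, correctness within each group is treated as a Bernoulli random variable, and the group's sample mean (its precision estimate) is compared against a desired precision p0 with a standard binomial test. The names documents, labels, substrings, classify, and p0 are illustrative assumptions, not identifiers from the paper; scipy is assumed to be available.

```python
from collections import defaultdict
from scipy.stats import binomtest

def partition_key(document, substrings):
    """Presence/absence pattern a document induces over the key sub-strings."""
    return tuple(s in document for s in substrings)

def precision_by_partition(documents, labels, substrings, classify, p0=0.9):
    """Group positively classified documents by induced partition, estimate the
    precision of each group, and test it against the target precision p0."""
    groups = defaultdict(lambda: [0, 0])   # partition -> [correct positives, total positives]
    for doc, label in zip(documents, labels):
        if not classify(doc):              # precision only counts predicted positives
            continue
        key = partition_key(doc, substrings)
        groups[key][0] += int(label)       # 1 if the positive prediction was correct
        groups[key][1] += 1
    results = {}
    for key, (correct, total) in groups.items():
        test = binomtest(correct, total, p=p0, alternative="less")
        results[key] = (correct / total, test.pvalue)   # (precision estimate, p-value)
    return results
```

A small p-value for a partition indicates that its estimated precision is credibly below the target p0, so documents inducing that partition deserve scrutiny.

The abstract also mentions combining a set of binary classifiers into a single multi-label classifier via Dempster-Shafer theory. The sketch below implements only the generic Dempster rule of combination over frozenset focal elements; how the paper actually builds mass functions from the per-partition precision estimates may differ, and the example masses are invented for illustration.

```python
from collections import defaultdict

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions whose focal
    elements are frozensets of labels."""
    combined = defaultdict(float)
    conflict = 0.0
    for a, w1 in m1.items():
        for b, w2 in m2.items():
            if a & b:
                combined[a & b] += w1 * w2
            else:
                conflict += w1 * w2    # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("complete conflict: evidence cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Illustrative only: each binary classifier assigns its estimated precision as
# mass to its own label and the remainder to the full frame of discernment.
frame = frozenset({"sports", "politics"})
m_sports = {frozenset({"sports"}): 0.92, frame: 0.08}
m_politics = {frozenset({"politics"}): 0.15, frame: 0.85}
print(dempster_combine(m_sports, m_politics))
```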
Related papers
- The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing [85.85160896547698]
Real-life applications of deep neural networks are hindered by their unsteady predictions when faced with noisy inputs and adversarial attacks.
We show how to design an efficient classifier with a certified radius by relying on noise injection into the inputs.
Our novel certification procedure allows us to use pre-trained models with randomized smoothing, effectively improving the current certification radius in a zero-shot manner.
arXiv Detail & Related papers (2023-09-28T22:41:47Z)
- Mitigating Word Bias in Zero-shot Prompt-based Classifiers [55.60306377044225]
We show that matching class priors correlates strongly with the oracle upper bound performance.
We also demonstrate large consistent performance gains for prompt settings over a range of NLP tasks.
arXiv Detail & Related papers (2023-09-10T10:57:41Z)
- An Upper Bound for the Distribution Overlap Index and Its Applications [18.481370450591317]
This paper proposes an easy-to-compute upper bound for the overlap index between two probability distributions.
The proposed bound shows its value in one-class classification and domain shift analysis.
Our work shows significant promise toward broadening the applications of overlap-based metrics.
arXiv Detail & Related papers (2022-12-16T20:02:03Z)
- Determination of class-specific variables in nonparametric multiple-class classification [0.0]
We propose a probability-based nonparametric multiple-class classification method and integrate it with the ability to identify high-impact variables for individual classes.
We report the properties of the proposed method, and use both synthesized and real data sets to illustrate its properties under different classification situations.
arXiv Detail & Related papers (2022-05-07T10:08:58Z)
- Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within the PAC-Bayes risk certification paradigm.
Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z)
- On the rate of convergence of a classifier based on a Transformer encoder [55.41148606254641]
The rate of convergence of the misclassification probability of the classifier towards the optimal misclassification probability is analyzed.
It is shown that this classifier is able to circumvent the curse of dimensionality provided the a posteriori probability satisfies a suitable hierarchical composition model.
arXiv Detail & Related papers (2021-11-29T14:58:29Z)
- When in Doubt: Improving Classification Performance with Alternating Normalization [57.39356691967766]
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification.
CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution.
We empirically demonstrate its effectiveness across a diverse set of classification tasks.
arXiv Detail & Related papers (2021-09-28T02:55:42Z)
- Specialists Outperform Generalists in Ensemble Classification [15.315432841707736]
In this paper, we address the question of whether we can determine the accuracy of the ensemble.
We explicitly construct the individual classifiers that attain the upper and lower bounds: specialists and generalists.
arXiv Detail & Related papers (2021-07-09T12:16:10Z)
- Classifier uncertainty: evidence, potential impact, and probabilistic treatment [0.0]
We present an approach to quantify the uncertainty of classification performance metrics based on a probability model of the confusion matrix.
We show that uncertainties can be surprisingly large and limit performance evaluation.
arXiv Detail & Related papers (2020-06-19T12:49:19Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.