Naive Bayes Classifiers and One-hot Encoding of Categorical Variables
- URL: http://arxiv.org/abs/2404.18190v1
- Date: Sun, 28 Apr 2024 14:04:58 GMT
- Title: Naive Bayes Classifiers and One-hot Encoding of Categorical Variables
- Authors: Christopher K. I. Williams
- Abstract summary: We investigate the consequences of encoding a $K$-valued categorical variable incorrectly as $K$ bits via one-hot encoding.
This gives rise to a product-of-Bernoullis (PoB) assumption, rather than the correct categorical Naïve Bayes classifier.
- Score: 4.5053219193867395
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the consequences of encoding a $K$-valued categorical variable incorrectly as $K$ bits via one-hot encoding, when using a Naïve Bayes classifier. This gives rise to a product-of-Bernoullis (PoB) assumption, rather than the correct categorical Naïve Bayes classifier. The differences between the two classifiers are analysed mathematically and experimentally. In our experiments using probability vectors drawn from a Dirichlet distribution, the two classifiers are found to agree on the maximum a posteriori class label for most cases, although the posterior probabilities are usually greater for the PoB case.
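The abstract's setup can be sketched numerically. The sketch below assumes a single $K$-valued feature, two equiprobable classes, and class-conditional probability vectors drawn from a symmetric Dirichlet (a hypothetical reconstruction of the experiment, not the paper's exact protocol): for a one-hot observation $e_k$, the categorical likelihood is $\theta_{c,k}$, while the PoB likelihood is $\theta_{c,k}\prod_{j\neq k}(1-\theta_{c,j})$.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_trials = 5, 2000
prior = np.array([0.5, 0.5])  # two equiprobable classes

agree = 0
pob_more_confident = 0
for _ in range(n_trials):
    # theta[c, k] = p(x = k | class c), drawn from a flat Dirichlet
    theta = rng.dirichlet(np.ones(K), size=2)
    k = rng.integers(K)  # observed category

    # Categorical NB likelihood: p(x = k | c) = theta[c, k]
    lik_cat = theta[:, k]

    # PoB likelihood for the one-hot vector e_k:
    # theta[c, k] * prod_{j != k} (1 - theta[c, j])
    lik_pob = theta[:, k] * np.prod(np.delete(1.0 - theta, k, axis=1), axis=1)

    post_cat = prior * lik_cat / np.sum(prior * lik_cat)
    post_pob = prior * lik_pob / np.sum(prior * lik_pob)

    if post_cat.argmax() == post_pob.argmax():
        agree += 1
    if post_pob.max() >= post_cat.max():
        pob_more_confident += 1

print(f"MAP agreement rate: {agree / n_trials:.3f}")
print(f"fraction where PoB winning posterior >= categorical: "
      f"{pob_more_confident / n_trials:.3f}")
```

Running this shows the two classifiers usually pick the same MAP label even though their posteriors differ, consistent with the abstract's finding.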
Related papers
- Fractional Naive Bayes (FNB): non-convex optimization for a parsimonious weighted selective naive Bayes classifier [0.0]
We consider supervised classification for datasets with a very large number of input variables.
We propose a regularization of the model log-likelihood.
The various proposed algorithms result in an optimization-based weighted naïve Bayes scheme.
arXiv Detail & Related papers (2024-09-17T11:54:14Z) - Generating Unbiased Pseudo-labels via a Theoretically Guaranteed
Chebyshev Constraint to Unify Semi-supervised Classification and Regression [57.17120203327993]
The threshold-to-pseudo-label process (T2L) in classification uses confidence to determine the quality of labels.
In nature, regression also requires unbiased methods to generate high-quality labels.
We propose a theoretically guaranteed constraint for generating unbiased labels based on Chebyshev's inequality.
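Chebyshev's inequality, the tool the proposed constraint is built on, bounds the probability mass far from the mean without any distributional assumption: $P(|X-\mu| \ge k\sigma) \le 1/k^2$. The snippet below is a generic numerical illustration of the inequality itself, not the paper's constraint:

```python
import numpy as np

rng = np.random.default_rng(2)
# A skewed, non-Gaussian distribution: Exponential(1), so mu = sigma = 1
x = rng.exponential(scale=1.0, size=200_000)
mu, sigma = x.mean(), x.std()

for k in (2.0, 3.0):
    # Empirical tail mass beyond k standard deviations from the mean
    empirical = np.mean(np.abs(x - mu) >= k * sigma)
    bound = 1.0 / k**2  # Chebyshev's distribution-free bound
    print(f"k={k}: empirical tail {empirical:.4f} <= bound {bound:.4f}")
```

The empirical tail mass stays below the distribution-free bound, which is what makes Chebyshev-style guarantees attractive for label quality thresholds.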
arXiv Detail & Related papers (2023-11-03T08:39:35Z) - The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing [85.85160896547698]
Real-life applications of deep neural networks are hindered by their unsteady predictions when faced with noisy inputs and adversarial attacks.
We show how to design an efficient classifier with a certified radius by relying on noise injection into the inputs.
Our novel certification procedure allows us to use pre-trained models with randomized smoothing, effectively improving the current certification radius in a zero-shot manner.
arXiv Detail & Related papers (2023-09-28T22:41:47Z) - Revisiting Discriminative vs. Generative Classifiers: Theory and
Implications [37.98169487351508]
This paper is inspired by the statistical efficiency of naive Bayes.
We present a multiclass $\mathcal{H}$-consistency bound framework and an explicit bound for logistic loss.
Experiments on various pre-trained deep vision models show that naive Bayes consistently converges faster as the amount of data increases.
arXiv Detail & Related papers (2023-02-05T08:30:42Z) - Deriving discriminative classifiers from generative models [6.939768185086753]
We show how a generative classifier induced from a generative model can also be computed in a discriminative way from the same model.
We illustrate the interest of the new discriminative way of computing classifiers in the Natural Language Processing (NLP) framework.
arXiv Detail & Related papers (2022-01-03T19:18:25Z) - Classification Under Ambiguity: When Is Average-K Better Than Top-K? [1.7156052308952854]
A common alternative, referred to as top-$K$ classification, is to choose some number $K$ and to return the $K$ labels with the highest scores.
This paper formally characterizes the ambiguity profile when average-$K$ classification can achieve a lower error rate than a fixed top-$K$ classification.
arXiv Detail & Related papers (2021-12-16T12:58:07Z) - On the rate of convergence of a classifier based on a Transformer
encoder [55.41148606254641]
The rate of convergence of the misclassification probability of the classifier towards the optimal misclassification probability is analyzed.
It is shown that this classifier is able to circumvent the curse of dimensionality provided the a posteriori probability satisfies a suitable hierarchical composition model.
arXiv Detail & Related papers (2021-11-29T14:58:29Z) - When in Doubt: Improving Classification Performance with Alternating
Normalization [57.39356691967766]
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification.
CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution.
We empirically demonstrate its effectiveness across a diverse set of classification tasks.
arXiv Detail & Related papers (2021-09-28T02:55:42Z) - Binary classification with ambiguous training data [69.50862982117127]
In supervised learning, we often face ambiguous (A) samples that are difficult to label even by domain experts.
This problem is substantially different from semi-supervised learning since unlabeled samples are not necessarily difficult samples.
arXiv Detail & Related papers (2020-11-05T00:53:58Z) - Quantifying the Uncertainty of Precision Estimates for Rule based Text
Classifiers [0.0]
Rule based classifiers that use the presence and absence of key sub-strings to make classification decisions have a natural mechanism for quantifying the uncertainty of their precision.
For a binary classifier, the key insight is to treat partitions of the sub-string set induced by the documents as Bernoulli random variables.
The utility of this approach is demonstrated with a benchmark problem.
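The Bernoulli treatment of rule matches can be sketched with a hypothetical example (the counts and the uniform prior are assumptions for illustration, not the paper's benchmark): if each document matched by a key sub-string is a Bernoulli trial with unknown success rate $p$ (the rule's precision), then $s$ confirmed true positives out of $n$ audited matches give the posterior $\mathrm{Beta}(s+1,\, n-s+1)$ under a $\mathrm{Beta}(1,1)$ prior.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical counts: of n documents matched by a sub-string rule,
# s were confirmed true positives in a manual audit.
n, s = 40, 33

# Posterior over the rule's precision: Beta(s + 1, n - s + 1).
# Monte Carlo sampling gives a credible interval without special functions.
samples = rng.beta(s + 1, n - s + 1, size=100_000)
lo, hi = np.quantile(samples, [0.025, 0.975])

print(f"precision point estimate: {s / n:.3f}")
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```

The interval quantifies the uncertainty of the precision estimate directly from the match counts, which is the kind of mechanism the summary above describes.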
arXiv Detail & Related papers (2020-05-19T03:51:47Z) - Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.