Limits to classification performance by relating Kullback-Leibler
divergence to Cohen's Kappa
- URL: http://arxiv.org/abs/2403.01571v1
- Date: Sun, 3 Mar 2024 17:36:42 GMT
- Title: Limits to classification performance by relating Kullback-Leibler
divergence to Cohen's Kappa
- Authors: L. Crow and S. J. Watts
- Abstract summary: Theory and methods are discussed in detail and then applied to Monte Carlo data and real datasets.
In all cases this analysis shows that the algorithms could not have performed any better due to the underlying probability density functions for the two classes.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The performance of machine learning classification algorithms is evaluated
by estimating metrics, often from the confusion matrix, using training data and
cross-validation. However, these do not prove that the best possible
performance has been achieved. Fundamental limits to error rates can be
estimated using information distance measures. To this end, the confusion
matrix has been formulated to comply with the Chernoff-Stein Lemma. This links
the error rates to the Kullback-Leibler divergences between the probability
density functions describing the two classes. This leads to a key result that
relates Cohen's Kappa to the Resistor Average Distance, which is the parallel
resistor combination of the two Kullback-Leibler divergences. The Resistor
Average Distance has units of bits and is estimated from the same training data
used by the classification algorithm, using kNN estimates of the
Kullback-Leibler divergences. The classification algorithm gives the confusion
matrix and Kappa. Theory and methods are discussed in detail and then applied
to Monte Carlo data and real datasets. Four very different real datasets -
Breast Cancer, Coronary Heart Disease, Bankruptcy, and Particle Identification
- are analysed, with both continuous and discrete values, and their
classification performance compared to the expected theoretical limit. In all
cases this analysis shows that the algorithms could not have performed any
better due to the underlying probability density functions for the two classes.
Important lessons are learnt on how to predict the performance of algorithms
for imbalanced data using training datasets that are approximately balanced.
Machine learning is very powerful but classification performance ultimately
depends on the quality of the data and the relevance of the variables to the
problem.
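
As a reading aid, here is a minimal Python sketch (not the authors' code) of the two quantities the abstract connects: Cohen's Kappa computed from a confusion matrix, and the Resistor Average Distance formed as the parallel-resistor combination of two Kullback-Leibler divergences, with the divergences estimated from samples by a standard kNN estimator in the style of Wang, Kulkarni and Verdú. The paper's exact mapping from the Resistor Average Distance to a Kappa limit is not reproduced here; the function names and the choice of k are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): Cohen's Kappa from a confusion
# matrix, and the Resistor Average Distance as the parallel-resistor
# combination of two kNN-estimated Kullback-Leibler divergences, in bits.
import numpy as np
from scipy.spatial import cKDTree

def cohens_kappa(cm):
    """Cohen's Kappa from a confusion matrix (rows: true, cols: predicted)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_obs = np.trace(cm) / n                                # observed agreement
    p_exp = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2  # chance agreement
    return (p_obs - p_exp) / (1.0 - p_exp)

def knn_kl_bits(x, y, k=5):
    """kNN estimate of D(p||q) in bits, from samples x ~ p and y ~ q."""
    x, y = np.atleast_2d(x), np.atleast_2d(y)
    n, d = x.shape
    m = y.shape[0]
    rho = cKDTree(x).query(x, k=k + 1)[0][:, -1]  # k-th neighbour within x
    nu = cKDTree(y).query(x, k=k)[0]              # k-th neighbour in y
    if k > 1:
        nu = nu[:, -1]
    return (d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))) / np.log(2)

def resistor_average_distance(x, y, k=5):
    """Parallel-resistor combination of the two KL divergences, in bits."""
    d_pq, d_qp = knn_kl_bits(x, y, k), knn_kl_bits(y, x, k)
    return (d_pq * d_qp) / (d_pq + d_qp)

# Toy check: two unit-variance Gaussians one standard deviation apart, where
# both KL divergences equal 0.5 nats (about 0.72 bits), so the parallel
# combination should come out near 0.36 bits.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(2000, 1))
y = rng.normal(1.0, 1.0, size=(2000, 1))
print(resistor_average_distance(x, y))
```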
Related papers
- Is K-fold cross validation the best model selection method for Machine
Learning? [0.0]
K-fold cross-validation is the most common approach to ascertaining the likelihood that a machine learning outcome is generated by chance.
A novel test based on K-fold CV and the Upper Bound of the actual error (K-fold CUBV) is proposed.
arXiv Detail & Related papers (2024-01-29T18:46:53Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Utilizing Class Separation Distance for the Evaluation of Corruption
Robustness of Machine Learning Classifiers [0.6882042556551611]
We propose a test data augmentation method that uses a robustness distance $\epsilon$ derived from the dataset's minimal class separation distance.
The resulting MSCR metric allows a dataset-specific comparison of different classifiers with respect to their corruption robustness.
Our results indicate that robustness training through simple data augmentation can already slightly improve accuracy.
arXiv Detail & Related papers (2022-06-27T15:56:16Z) - Posterior and Computational Uncertainty in Gaussian Processes [52.26904059556759]
Gaussian processes scale prohibitively with the size of the dataset.
Many approximation methods have been developed, which inevitably introduce approximation error.
This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior.
We develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended.
arXiv Detail & Related papers (2022-05-30T22:16:25Z) - Error Scaling Laws for Kernel Classification under Source and Capacity
Conditions [26.558090928198187]
We consider the important class of data sets satisfying the standard source and capacity conditions.
We derive the decay rates for the misclassification (prediction) error as a function of the source and capacity coefficients.
Our results can be seen as an explicit prediction of the exponents of a scaling law for kernel classification.
arXiv Detail & Related papers (2022-01-29T20:39:58Z) - Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
As the results show, the proposed strategies perform better than classification based on observed data alone and maintain high accuracy even as the missing-data ratio increases (a toy sketch of the first strategy appears after this list).
arXiv Detail & Related papers (2021-10-19T14:24:50Z) - Regularized Classification-Aware Quantization [39.04839665081476]
We present a class of algorithms that learn distributed quantization schemes for binary classification tasks.
Our method is called Regularized Classification-Aware Quantization.
arXiv Detail & Related papers (2021-07-12T21:27:48Z) - Theoretical Insights Into Multiclass Classification: A High-dimensional
Asymptotic View [82.80085730891126]
We provide the first modern, precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z) - Pairwise Supervised Hashing with Bernoulli Variational Auto-Encoder and
Self-Control Gradient Estimator [62.26981903551382]
Variational auto-encoders (VAEs) with binary latent variables provide state-of-the-art performance in terms of precision for document retrieval.
We propose a pairwise loss function with discrete latent VAE to reward within-class similarity and between-class dissimilarity for supervised hashing.
This new semantic hashing framework achieves superior performance compared to the state of the art.
arXiv Detail & Related papers (2020-05-21T06:11:33Z) - Machine learning for causal inference: on the use of cross-fit
estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators of the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)