Limits to classification performance by relating Kullback-Leibler
divergence to Cohen's Kappa
- URL: http://arxiv.org/abs/2403.01571v1
- Date: Sun, 3 Mar 2024 17:36:42 GMT
- Title: Limits to classification performance by relating Kullback-Leibler
divergence to Cohen's Kappa
- Authors: L. Crow and S. J. Watts
- Abstract summary: Theory and methods are discussed in detail and then applied to Monte Carlo data and real datasets.
In all cases this analysis shows that the algorithms could not have performed any better due to the underlying probability density functions for the two classes.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The performance of machine learning classification algorithms is evaluated
by estimating metrics, often from the confusion matrix, using training data and
cross-validation. However, these do not prove that the best possible
performance has been achieved. Fundamental limits to error rates can be
estimated using information distance measures. To this end, the confusion
matrix has been formulated to comply with the Chernoff-Stein Lemma. This links
the error rates to the Kullback-Leibler divergences between the probability
density functions describing the two classes. This leads to a key result that
relates Cohen's Kappa to the Resistor Average Distance, which is the parallel
resistor combination of the two Kullback-Leibler divergences. The Resistor
Average Distance has units of bits and is estimated from the same training data
used by the classification algorithm, using kNN estimates of the
Kullback-Leibler divergences. The classification algorithm gives the confusion
matrix and Kappa. Theory and methods are discussed in detail and then applied
to Monte Carlo data and real datasets. Four very different real datasets -
Breast Cancer, Coronary Heart Disease, Bankruptcy, and Particle Identification
- are analysed, with both continuous and discrete values, and their
classification performance compared to the expected theoretical limit. In all
cases this analysis shows that the algorithms could not have performed any
better due to the underlying probability density functions for the two classes.
Important lessons are learnt on how to predict the performance of algorithms
for imbalanced data using training datasets that are approximately balanced.
Machine learning is very powerful but classification performance ultimately
depends on the quality of the data and the relevance of the variables to the
problem.
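
As a reading aid, here is a minimal Python sketch (not the authors' code) of the two quantities the abstract connects: Cohen's Kappa computed from a confusion matrix, and the Resistor Average Distance formed as the parallel-resistor combination of two Kullback-Leibler divergences, with the divergences estimated from samples by a standard kNN estimator in the style of Wang, Kulkarni and Verdú. The paper's exact mapping from the Resistor Average Distance to a Kappa limit is not reproduced here; the function names and the choice of k are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): Cohen's Kappa from a confusion
# matrix, and the Resistor Average Distance as the parallel-resistor
# combination of two kNN-estimated Kullback-Leibler divergences, in bits.
import numpy as np
from scipy.spatial import cKDTree

def cohens_kappa(cm):
    """Cohen's Kappa from a confusion matrix (rows: true, cols: predicted)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_obs = np.trace(cm) / n                                # observed agreement
    p_exp = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2  # chance agreement
    return (p_obs - p_exp) / (1.0 - p_exp)

def knn_kl_bits(x, y, k=5):
    """kNN estimate of D(p||q) in bits, from samples x ~ p and y ~ q."""
    x, y = np.atleast_2d(x), np.atleast_2d(y)
    n, d = x.shape
    m = y.shape[0]
    rho = cKDTree(x).query(x, k=k + 1)[0][:, -1]  # k-th neighbour within x
    nu = cKDTree(y).query(x, k=k)[0]              # k-th neighbour in y
    if k > 1:
        nu = nu[:, -1]
    return (d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))) / np.log(2)

def resistor_average_distance(x, y, k=5):
    """Parallel-resistor combination of the two KL divergences, in bits."""
    d_pq, d_qp = knn_kl_bits(x, y, k), knn_kl_bits(y, x, k)
    return (d_pq * d_qp) / (d_pq + d_qp)

# Toy check: two unit-variance Gaussians one standard deviation apart, where
# both KL divergences equal 0.5 nats (about 0.72 bits), so the parallel
# combination should come out near 0.36 bits.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(2000, 1))
y = rng.normal(1.0, 1.0, size=(2000, 1))
print(resistor_average_distance(x, y))
```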
Related papers
- Is K-fold cross validation the best model selection method for Machine
Learning? [0.0]
K-fold cross-validation is the most common approach to ascertaining the likelihood that a machine learning outcome is generated by chance.
A novel test based on K-fold CV and the Upper Bound of the actual error (K-fold CUBV) is proposed.
arXiv Detail & Related papers (2024-01-29T18:46:53Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Utilizing Class Separation Distance for the Evaluation of Corruption
Robustness of Machine Learning Classifiers [0.6882042556551611]
We propose a test data augmentation method that uses a robustness distance $\epsilon$ derived from the dataset's minimal class separation distance.
The resulting MSCR metric allows a dataset-specific comparison of different classifiers with respect to their corruption robustness.
Our results indicate that robustness training through simple data augmentation can already slightly improve accuracy.
arXiv Detail & Related papers (2022-06-27T15:56:16Z) - Posterior and Computational Uncertainty in Gaussian Processes [52.26904059556759]
Gaussian processes scale prohibitively with the size of the dataset.
Many approximation methods have been developed, which inevitably introduce approximation error.
This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior.
We develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended.
arXiv Detail & Related papers (2022-05-30T22:16:25Z) - Error Scaling Laws for Kernel Classification under Source and Capacity
Conditions [26.558090928198187]
We consider the important class of data sets satisfying the standard source and capacity conditions.
We derive the decay rates for the misclassification (prediction) error as a function of the source and capacity coefficients.
Our results can be seen as an explicit prediction of the exponents of a scaling law for kernel classification.
arXiv Detail & Related papers (2022-01-29T20:39:58Z) - Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
As the results show, the proposed strategies perform better than classification based on observed data alone and maintain high accuracy even as the missing-data ratio increases (a toy sketch of the first strategy appears after this list).
arXiv Detail & Related papers (2021-10-19T14:24:50Z) - Regularized Classification-Aware Quantization [39.04839665081476]
We present a class of algorithms that learn distributed quantization schemes for binary classification tasks.
Our method is called Regularized Classification-Aware Quantization.
arXiv Detail & Related papers (2021-07-12T21:27:48Z) - Theoretical Insights Into Multiclass Classification: A High-dimensional
Asymptotic View [82.80085730891126]
We provide the first modern, precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z) - Pairwise Supervised Hashing with Bernoulli Variational Auto-Encoder and
Self-Control Gradient Estimator [62.26981903551382]
Variational auto-encoders (VAEs) with binary latent variables provide state-of-the-art performance in terms of precision for document retrieval.
We propose a pairwise loss function with discrete latent VAE to reward within-class similarity and between-class dissimilarity for supervised hashing.
This new semantic hashing framework achieves superior performance compared to the state of the art.
arXiv Detail & Related papers (2020-05-21T06:11:33Z) - Machine learning for causal inference: on the use of cross-fit
estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators of the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)