Block-regularized 5$\times$2 Cross-validated McNemar's Test for
Comparing Two Classification Algorithms
- URL: http://arxiv.org/abs/2304.03990v1
- Date: Sat, 8 Apr 2023 11:35:19 GMT
- Title: Block-regularized 5$\times$2 Cross-validated McNemar's Test for
Comparing Two Classification Algorithms
- Authors: Ruibo Wang and Jihong Li
- Abstract summary: A cross-validation method repeats the HO method multiple times and produces a stable estimation.
A block-regularized 5$\times$2 CV (BCV) has been shown in many previous studies to be superior to other CV methods.
We demonstrate the reasonable type I error and the promising power of the proposed 5$\times$2 BCV McNemar's test on simulated and real-world data sets.
- Score: 5.7490445900906835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the task of comparing two classification algorithms, the widely-used
McNemar's test aims to infer the presence of a significant difference between
the error rates of the two classification algorithms. However, the power of the
conventional McNemar's test is usually unpromising because the hold-out (HO)
method in the test merely uses a single train-validation split that usually
produces a highly varied estimation of the error rates. In contrast, a
cross-validation (CV) method repeats the HO method multiple times and
produces a stable estimation. Therefore, a CV method has a great advantage in
improving the power of McNemar's test. Among all types of CV methods, a
block-regularized 5$\times$2 CV (BCV) has been shown in many previous studies
to be superior to other CV methods in the task of comparing algorithms
because the 5$\times$2 BCV can produce a high-quality estimator of the error
rate by regularizing the numbers of overlapping records between all training
sets. In this study, we compress the 10 correlated contingency tables in the
5$\times$2 BCV to form an effective contingency table. Then, we define a
5$\times$2 BCV McNemar's test on the basis of the effective contingency table.
We demonstrate the reasonable type I error and the promising power of the
proposed 5$\times$2 BCV McNemar's test on multiple simulated and real-world
data sets.
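For intuition, the following is a minimal Python sketch (not the authors' reference implementation) of how such a test can be assembled: the disagreement counts $n_{01}$ (algorithm A correct, B wrong) and $n_{10}$ (A wrong, B correct) from the 10 validation folds of a 5$\times$2 CV are pooled into one effective contingency table, to which the continuity-corrected McNemar statistic $\chi^2 = (|n_{01}-n_{10}|-1)^2/(n_{01}+n_{10})$ is applied. The block regularization of the numbers of overlapping records between training sets and the variance calibration developed in the paper are omitted; plain repeated stratified 2-fold splitting and the function name below are illustrative assumptions.

```python
# Minimal sketch of a pooled ("effective") contingency table for a
# 5x2 CV McNemar-style comparison of two classifiers.
# Assumes X, y are NumPy arrays and clf_a, clf_b are scikit-learn estimators.
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold

def five_by_two_mcnemar(clf_a, clf_b, X, y, seed=0):
    rng = np.random.RandomState(seed)
    n01 = n10 = 0  # disagreement counts pooled over the 10 validation folds
    for rep in range(5):                      # 5 repetitions ...
        skf = StratifiedKFold(n_splits=2, shuffle=True,
                              random_state=rng.randint(1 << 30))
        for train_idx, val_idx in skf.split(X, y):   # ... of 2-fold CV
            pa = clone(clf_a).fit(X[train_idx], y[train_idx]).predict(X[val_idx])
            pb = clone(clf_b).fit(X[train_idx], y[train_idx]).predict(X[val_idx])
            correct_a = pa == y[val_idx]
            correct_b = pb == y[val_idx]
            n01 += int(np.sum(correct_a & ~correct_b))  # A right, B wrong
            n10 += int(np.sum(~correct_a & correct_b))  # A wrong, B right
    # Continuity-corrected McNemar statistic on the pooled table
    # (guard against an empty off-diagonal); the paper additionally
    # accounts for the correlation among the 10 folds, omitted here.
    chi2 = (abs(n01 - n10) - 1) ** 2 / max(n01 + n10, 1)
    return n01, n10, chi2
```

The returned statistic would then be compared against an upper quantile of the $\chi^2_1$ distribution (e.g., 3.84 at the 5% level); the paper's contribution is precisely the construction that keeps this comparison valid despite the correlation among the 10 folds, which this naive pooling ignores.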
Related papers
- The Relative Instability of Model Comparison with Cross-validation [65.90853456199493]
Cross-validation can be used to provide a confidence interval for the test error of a stable machine learning algorithm.
Relative stability cannot easily be derived from existing stability results, even for simple algorithms.
We empirically confirm the invalidity of CV confidence intervals for the test error difference when either soft-thresholding or the Lasso is used.
arXiv Detail & Related papers (2025-08-06T12:54:56Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Efficient Approximate Kernel Based Spike Sequence Classification [56.2938724367661]
Machine learning models, such as SVM, require a definition of distance/similarity between pairs of sequences.
Exact methods yield better classification performance, but they pose high computational costs.
We propose a series of ways to improve the performance of the approximate kernel in order to enhance its predictive performance.
arXiv Detail & Related papers (2022-09-11T22:44:19Z) - Partial and Asymmetric Contrastive Learning for Out-of-Distribution
Detection in Long-Tailed Recognition [80.07843757970923]
We show that existing OOD detection methods suffer from significant performance degradation when the training set is long-tail distributed.
We propose Partial and Asymmetric Supervised Contrastive Learning (PASCL), which explicitly encourages the model to distinguish between tail-class in-distribution samples and OOD samples.
Our method outperforms the previous state-of-the-art method by 1.29%, 1.45%, and 0.69% in anomaly detection false positive rate (FPR) and by 3.24%, 4.06%, and 7.89% in in-distribution
arXiv Detail & Related papers (2022-07-04T01:53:07Z) - Certified Error Control of Candidate Set Pruning for Two-Stage Relevance
Ranking [57.42241521034744]
We propose the concept of certified error control of candidate set pruning for relevance ranking.
Our method successfully prunes the first-stage retrieved candidate sets to improve the second-stage reranking speed.
arXiv Detail & Related papers (2022-05-19T16:00:13Z) - Confidence intervals for the Cox model test error from cross-validation [91.3755431537592]
Cross-validation (CV) is one of the most widely used techniques in statistical learning for estimating the test error of a model.
Standard confidence intervals for test error using estimates from CV may have coverage below nominal levels.
One way to address this issue is to estimate the mean squared error of the prediction error using nested CV instead.
arXiv Detail & Related papers (2022-01-26T06:40:43Z) - Fast and Informative Model Selection using Learning Curve
Cross-Validation [2.28438857884398]
Cross-validation methods can be unnecessarily slow on large datasets.
We present a new approach for validation based on learning curves (LCCV)
LCCV iteratively increases the number of instances used for training.
arXiv Detail & Related papers (2021-11-27T14:48:52Z) - Deep Learning in current Neuroimaging: a multivariate approach with
power and type I error control but arguable generalization ability [0.158310730488265]
A non-parametric framework is proposed that estimates the statistical significance of classifications using deep learning architectures.
A label permutation test is proposed in both studies using cross-validation (CV) and resubstitution with upper bound correction (RUB) as validation methods.
We found in the permutation test that CV and RUB methods offer a false positive rate close to the significance level and an acceptable statistical power.
arXiv Detail & Related papers (2021-03-30T21:15:39Z) - Approximate Cross-Validation with Low-Rank Data in High Dimensions [35.74302895575951]
Cross-validation is an important tool for model assessment.
ACV methods can lose both speed and accuracy in high dimensions unless sparsity structure is present in the data.
We develop a new algorithm for ACV that is fast and accurate in the presence of ALR data.
arXiv Detail & Related papers (2020-08-24T16:34:05Z) - Rademacher upper bounds for cross-validation errors with an application
to the lasso [6.837167110907022]
We establish a general upper bound for $K$-fold cross-validation ($K$-CV) errors.
The CV error upper bound applies to both light-tail and heavy-tail error distributions.
We provide a Python package for computing the CV error upper bound in $K$-CV-based algorithms.
arXiv Detail & Related papers (2020-07-30T17:13:03Z) - Cross-validation Confidence Intervals for Test Error [83.67415139421448]
This work develops central limit theorems for cross-validation and consistent estimators of its variance under weak stability conditions on the learning algorithm.
Results are the first of their kind for the popular choice of leave-one-out cross-validation.
arXiv Detail & Related papers (2020-07-24T17:40:06Z) - Multi-label Contrastive Predictive Coding [125.03510235962095]
Variational mutual information (MI) estimators are widely used in unsupervised representation learning methods such as contrastive predictive coding (CPC).
We introduce a novel estimator based on a multi-label classification problem, where the critic needs to jointly identify multiple positive samples at the same time.
We show that using the same amount of negative samples, multi-label CPC is able to exceed the $\log m$ bound, while still being a valid lower bound of mutual information.
arXiv Detail & Related papers (2020-07-20T02:46:21Z) - Solving for multi-class using orthogonal coding matrices [0.0]
Error correcting code (ECC) is a common method of generalizing binary to multi-class classification.
Here we test two types of orthogonal ECCs on seven different datasets.
We compare them with three other multi-class methods: 1 vs. 1, one-versus-the-rest and random ECCs.
arXiv Detail & Related papers (2018-01-27T08:45:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.