Pearson-Matthews correlation coefficients for binary and multinary
classification and hypothesis testing
- URL: http://arxiv.org/abs/2305.05974v1
- Date: Wed, 10 May 2023 08:32:36 GMT
- Title: Pearson-Matthews correlation coefficients for binary and multinary
classification and hypothesis testing
- Authors: Petre Stoica and Prabhu Babu
- Abstract summary: Multinary classification is the main focus of this paper.
We show that both $\text{R}_{\text{K}}$ and the MPC metrics suffer from the problem of not decisively indicating poor classification results when they should.
We also present an additional new metric for multinary classification which can be viewed as a direct extension of MCC.
- Score: 6.974999794070285
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The Pearson-Matthews correlation coefficient (usually abbreviated MCC) is
considered to be one of the most useful metrics for the performance of a binary
classification or hypothesis testing method (for the sake of conciseness we
will use the classification terminology throughout, but the concepts and
methods discussed in the paper apply verbatim to hypothesis testing as well).
For multinary classification tasks (with more than two classes) the existing
extension of MCC, commonly called the $\text{R}_{\text{K}}$ metric, has also
been successfully used in many applications. The present paper begins with an
introductory discussion on certain aspects of MCC. Then we go on to discuss the
topic of multinary classification that is the main focus of this paper and
which, despite its practical and theoretical importance, appears to be less
developed than the topic of binary classification. Our discussion of the
$\text{R}_{\text{K}}$ is followed by the introduction of two other metrics for
multinary classification derived from the multivariate Pearson correlation
(MPC) coefficients. We show that both $\text{R}_{\text{K}}$ and the MPC metrics
suffer from the problem of not decisively indicating poor classification
results when they should, and introduce three new enhanced metrics that do not
suffer from this problem. We also present an additional new metric for
multinary classification which can be viewed as a direct extension of MCC.
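
As a concrete reference point for the two established metrics named in the abstract, the following is a minimal sketch of the standard MCC formula and the commonly used $\text{R}_{\text{K}}$ formula computed from a confusion matrix. It is not taken from the paper: the function names `mcc_binary` and `r_k`, the rows-are-true-classes layout, and the choice to return 0.0 when the denominator vanishes are all assumptions made for illustration. The paper's new enhanced metrics are not reproduced here.

```python
import math


def mcc_binary(tp, fp, fn, tn):
    """Pearson-Matthews correlation coefficient for a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: report 0.0 when a row or column sum is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


def r_k(confusion):
    """R_K for a K x K confusion matrix (assumed layout: rows = true class)."""
    k = len(confusion)
    s = sum(sum(row) for row in confusion)            # total number of samples
    c = sum(confusion[i][i] for i in range(k))        # correctly classified
    t = [sum(confusion[i]) for i in range(k)]         # true count per class
    p = [sum(confusion[i][j] for i in range(k)) for j in range(k)]  # predicted
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denom = math.sqrt((s * s - sum(pk * pk for pk in p))
                      * (s * s - sum(tk * tk for tk in t)))
    if denom == 0:
        # Same assumed convention as in mcc_binary.
        return 0.0
    return num / denom


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    tp, fp, fn, tn = 40, 10, 5, 45
    print(mcc_binary(tp, fp, fn, tn))
    print(r_k([[tn, fp], [fn, tp]]))
```

For two classes the two functions agree, which is the sense in which $\text{R}_{\text{K}}$ extends MCC to more than two classes.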
Related papers
- Robust performance metrics for imbalanced classification problems [2.07180164747172]
We show that established performance metrics in binary classification, such as the F-score, are not robust to class imbalance.
We introduce robust modifications of the F-score and the MCC for which, even in strongly imbalanced settings, the TPR is bounded away from $0$.
arXiv Detail & Related papers (2024-04-11T11:50:05Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Multi-class Classification with Fuzzy-feature Observations: Theory and Algorithms [36.810603503167755]
We propose a novel framework to address a new realistic problem called multi-class classification with imprecise observations (MCIMO).
First, we give the theoretical analysis of the MCIMO problem based on fuzzy Rademacher complexity.
Then, two practical algorithms based on support vector machine and neural networks are constructed to solve the proposed new problem.
arXiv Detail & Related papers (2022-06-09T07:14:00Z)
- Many-Class Text Classification with Matching [65.74328417321738]
We formulate Text Classification as a Matching problem between the text and the labels, and propose a simple yet effective framework named TCM.
Compared with previous text classification approaches, TCM takes advantage of the fine-grained semantic information of the classification labels.
arXiv Detail & Related papers (2022-05-23T15:51:19Z)
- Decision Making for Hierarchical Multi-label Classification with Multidimensional Local Precision Rate [4.812468844362369]
We introduce a new statistic called the multidimensional local precision rate (mLPR) for each object in each class.
We show that classification decisions made by simply sorting objects across classes in descending order of their mLPRs can, in theory, ensure the class hierarchy.
We also introduce HierRank, a new algorithm that maximizes an empirical version of CATCH using estimated mLPRs while respecting the hierarchy.
arXiv Detail & Related papers (2022-05-16T17:43:35Z)
- Rank4Class: A Ranking Formulation for Multiclass Classification [26.47229268790206]
Multiclass classification (MCC) is a fundamental machine learning problem.
We show that it is easy to boost MCC performance with a novel formulation through the lens of ranking.
arXiv Detail & Related papers (2021-12-17T19:22:37Z)
- Margin-Based Transfer Bounds for Meta Learning with Deep Feature Embedding [67.09827634481712]
We leverage margin theory and statistical learning theory to establish three margin-based transfer bounds for meta-learning based multiclass classification (MLMC).
These bounds reveal that the expected error of a given classification algorithm for a future task can be estimated with the average empirical error on a finite number of previous tasks.
Experiments on three benchmarks show that these margin-based models still achieve competitive performance.
arXiv Detail & Related papers (2020-12-02T23:50:51Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first asymptotically precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
- Coherent Hierarchical Multi-Label Classification Networks [56.41950277906307]
C-HMCNN(h) is a novel approach for HMC problems, which exploits hierarchy information in order to produce predictions coherent with the constraint and improve performance.
We conduct an extensive experimental analysis showing the superior performance of C-HMCNN(h) when compared to state-of-the-art models.
arXiv Detail & Related papers (2020-10-20T09:37:02Z)
- High-Dimensional Quadratic Discriminant Analysis under Spiked Covariance Model [101.74172837046382]
We propose a novel quadratic classification technique whose parameters are chosen to maximize the Fisher discriminant ratio.
Numerical simulations show that the proposed classifier not only outperforms the classical R-QDA for both synthetic and real data but also requires lower computational complexity.
arXiv Detail & Related papers (2020-06-25T12:00:26Z)