An Effectiveness Metric for Ordinal Classification: Formal Properties
and Experimental Results
- URL: http://arxiv.org/abs/2006.01245v1
- Date: Mon, 1 Jun 2020 20:35:46 GMT
- Title: An Effectiveness Metric for Ordinal Classification: Formal Properties
and Experimental Results
- Authors: Enrique Amigó, Julio Gonzalo, Stefano Mizzaro, Jorge
Carrillo-de-Albornoz
- Abstract summary: We propose a new metric for Ordinal Classification, the Closeness Evaluation Measure, rooted in Measurement Theory and Information Theory.
Our theoretical analysis and experimental results over both synthetic data and data from NLP shared tasks indicate that the proposed metric captures quality aspects from different traditional tasks simultaneously.
- Score: 9.602361044877426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Ordinal Classification tasks, items have to be assigned to classes that
have a relative ordering, such as positive, neutral, negative in sentiment
analysis. Remarkably, the most popular evaluation metrics for ordinal
classification tasks either ignore relevant information (for instance,
precision/recall on each of the classes ignores their relative ordering) or
assume additional information (for instance, Mean Absolute Error assumes
absolute distances between classes). In this paper we propose a new metric for
Ordinal Classification, the Closeness Evaluation Measure, which is rooted in
Measurement Theory and Information Theory. Our theoretical analysis and
experimental results over both synthetic data and data from NLP shared tasks
indicate that the proposed metric captures quality aspects from different
traditional tasks simultaneously. In addition, it generalizes some popular
classification (nominal scale) and error minimization (interval scale) metrics,
depending on the measurement scale in which it is instantiated.
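To make the proposal concrete, below is a minimal Python sketch of a closeness-based score in the spirit of CEM. The proximity definition used here (half the gold mass of the first class plus the full gold mass of every class between it and the second, under a base-2 log) is one reading of the paper's informational proximity, and the names `proximity` and `cem` are illustrative; consult the paper for the exact definition. Classes are assumed to be consecutive integers on the ordinal scale.

```python
import math
from collections import Counter

def proximity(a, b, gold_counts, n_total):
    """Informational proximity between ordinal classes a and b.
    One reading of the paper: the less probable it is that a gold item
    falls between a and b, the closer the two classes are. Half of
    class a's mass is counted; everything up to b counts in full."""
    lo, hi = min(a, b), max(a, b)
    mass = gold_counts.get(a, 0) / 2.0
    for c in range(lo, hi + 1):
        if c != a:
            mass += gold_counts.get(c, 0)
    return -math.log2(mass / n_total) if mass > 0 else 0.0

def cem(system, gold):
    """Closeness Evaluation Measure (sketch): proximity of system labels
    to gold labels, normalized by the proximity of gold to itself."""
    counts = Counter(gold)
    n = len(gold)
    num = sum(proximity(s, g, counts, n) for s, g in zip(system, gold))
    den = sum(proximity(g, g, counts, n) for g in gold)
    return num / den

# Example: 3-point ordinal scale (0=negative, 1=neutral, 2=positive).
gold = [0, 0, 1, 1, 1, 2, 2]
system = [0, 1, 1, 1, 2, 2, 2]
print(round(cem(system, gold), 3))
```

Because the proximity is computed from the observed gold distribution, instantiating it on a nominal or interval scale is what lets the measure generalize classification and error-minimization metrics, as the abstract notes.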
Related papers
- Improving the classification of extreme classes by means of loss regularisation and generalised beta distributions [8.640930010669042]
We propose a unimodal regularisation approach to improve the classification performance of the first and last classes.
Performance in the extreme classes is compared using a new metric that takes into account their sensitivities.
The results for the proposed metric show that the generalised beta distribution generally improves classification performance in the extreme classes.
arXiv Detail & Related papers (2024-07-17T08:57:42Z)
- $F_β$-plot -- a visual tool for evaluating imbalanced data classifiers [0.0]
The paper proposes a simple approach to analyzing the popular parametric metric $F_\beta$; a minimal sketch of the metric itself follows this entry.
For a given pool of analyzed classifiers, the plot indicates when a given model should be preferred depending on user requirements.
arXiv Detail & Related papers (2024-04-11T18:07:57Z)
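For reference, the parametric metric the plot is built around is the standard $F_\beta$; a minimal sketch (the visual tool itself is not reproduced here):

```python
def f_beta(precision, recall, beta=1.0):
    """F_beta: weighted harmonic mean of precision and recall.
    beta > 1 weights recall more heavily; beta < 1 favors precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Sweeps over beta like this underpin F_beta-plots for imbalanced data.
print([round(f_beta(0.8, 0.4, b), 3) for b in (0.5, 1.0, 2.0)])
```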
- Discordance Minimization-based Imputation Algorithms for Missing Values in Rating Data [4.100928307172084]
When multiple rating lists are combined or considered together, subjects often have missing ratings.
We propose analyses of missing-value patterns using six real-world data sets in various applications.
We propose optimization models and algorithms that minimize the total rating discordance across rating providers.
arXiv Detail & Related papers (2023-11-07T14:42:06Z)
- Revisiting Evaluation Metrics for Semantic Segmentation: Optimization and Evaluation of Fine-grained Intersection over Union [113.20223082664681]
We propose the use of fine-grained mIoUs along with corresponding worst-case metrics; the underlying per-class IoU computation is sketched after this entry.
These fine-grained metrics offer less bias towards large objects, richer statistical information, and valuable insights into model and dataset auditing.
Our benchmark study highlights the necessity of not basing evaluations on a single metric and confirms that fine-grained mIoUs reduce the bias towards large objects.
arXiv Detail & Related papers (2023-10-30T03:45:15Z)
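For context on the entry above, here is the standard per-class IoU that mIoU averages. The fine-grained variant hinted at in the code comment (averaging per image rather than pooling pixels dataset-wide) is shown as an illustrative assumption, not as the paper's exact definition.

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """Per-class intersection-over-union for one pair of label masks.
    Classes absent from both masks are left as NaN and ignored later."""
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious[c] = inter / union
    return ious

# Dataset-wide mIoU pools pixels over all images; a finer-grained variant
# (illustrative) averages per-image IoUs instead, reducing large-object bias.
pred = np.array([[0, 0, 1], [1, 1, 2]])
gt = np.array([[0, 1, 1], [1, 2, 2]])
print(np.nanmean(per_class_iou(pred, gt, num_classes=3)))
```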
- Enriching Disentanglement: From Logical Definitions to Quantitative Metrics [59.12308034729482]
Disentangling the explanatory factors in complex data is a promising approach for data-efficient representation learning.
We establish relationships between logical definitions and quantitative metrics to derive theoretically grounded disentanglement metrics.
We empirically demonstrate the effectiveness of the proposed metrics by isolating different aspects of disentangled representations.
arXiv Detail & Related papers (2023-05-19T08:22:23Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Association Graph Learning for Multi-Task Classification with Category Shifts [68.58829338426712]
We focus on multi-task classification, where related classification tasks share the same label space and are learned simultaneously.
We learn an association graph to transfer knowledge among tasks for missing classes.
Our method consistently performs better than representative baselines.
arXiv Detail & Related papers (2022-10-10T12:37:41Z)
- Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics [64.81682222169113]
System-level correlations quantify how reliably an automatic summarization evaluation metric replicates human judgments of summary quality.
We identify two ways in which the definition of the system-level correlation is inconsistent with how metrics are used to evaluate systems in practice.
arXiv Detail & Related papers (2022-04-21T15:52:14Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first modern, precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
- Classification Performance Metric for Imbalance Data Based on Recall and Selectivity Normalized in Class Labels [0.0]
We introduce a new performance measure based on the harmonic mean of Recall and Selectivity, normalized in class labels; the plain harmonic mean is sketched after this entry.
This paper shows that the proposed performance measure has the right properties for imbalanced datasets.
arXiv Detail & Related papers (2020-06-23T20:38:48Z)
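For the entry above, a minimal sketch of the harmonic mean of Recall (true-positive rate) and Selectivity (true-negative rate); the paper's additional normalization over class labels is not reproduced here.

```python
def recall_selectivity_hmean(tp, fn, tn, fp):
    """Harmonic mean of Recall (TP rate) and Selectivity (TN rate).
    Unlike accuracy, it stays low when either class is poorly served."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    selectivity = tn / (tn + fp) if tn + fp else 0.0
    if recall + selectivity == 0:
        return 0.0
    return 2 * recall * selectivity / (recall + selectivity)

# Imbalanced example: 95 negatives, 5 positives; 3 positives are caught.
print(round(recall_selectivity_hmean(tp=3, fn=2, tn=90, fp=5), 3))
```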
- Classifier uncertainty: evidence, potential impact, and probabilistic treatment [0.0]
We present an approach to quantify the uncertainty of classification performance metrics based on a probability model of the confusion matrix; an illustrative sketch follows this entry.
We show that uncertainties can be surprisingly large and limit performance evaluation.
arXiv Detail & Related papers (2020-06-19T12:49:19Z)
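One simple way to realize a probability model of the confusion matrix, shown purely as an illustrative assumption rather than the paper's exact model, is a Dirichlet posterior over the four cells of a binary confusion matrix, sampled to propagate counting uncertainty into any derived metric:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed binary confusion-matrix counts: [TP, FN, FP, TN].
counts = np.array([30, 10, 5, 55], dtype=float)

# Dirichlet posterior over cell probabilities (flat prior), sampled to
# propagate counting uncertainty into a derived metric such as accuracy.
samples = rng.dirichlet(counts + 1.0, size=10000)
accuracy = samples[:, 0] + samples[:, 3]  # P(TP) + P(TN)
lo, hi = np.percentile(accuracy, [2.5, 97.5])
print(f"accuracy 95% credible interval: [{lo:.3f}, {hi:.3f}]")
```

Even at these sample sizes the interval is noticeably wide, which is the entry's point: metric uncertainty can be large enough to limit performance evaluation.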
This list is automatically generated from the titles and abstracts of the papers on this site.