Statistical Comparisons of Classifiers by Generalized Stochastic
Dominance
- URL: http://arxiv.org/abs/2209.01857v2
- Date: Wed, 5 Jul 2023 13:56:24 GMT
- Title: Statistical Comparisons of Classifiers by Generalized Stochastic
Dominance
- Authors: Christoph Jansen (1), Malte Nalenz (1), Georg Schollmeyer (1), Thomas
Augustin (1) ((1) Ludwig-Maximilians-Universität Munich)
- Abstract summary: There is still no consensus on how to compare classifiers over multiple data sets with respect to several criteria.
In this paper, we add a fresh view to the vivid debate by adopting recent developments in decision theory.
We show that our framework ranks classifiers by a generalized concept of stochastic dominance, which powerfully circumvents the cumbersome, and often even self-contradictory, reliance on aggregates.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although it is a crucial question for the development of machine learning
algorithms, there is still no consensus on how to compare classifiers over
multiple data sets with respect to several criteria. Every comparison framework
is confronted with (at least) three fundamental challenges: the multiplicity of
quality criteria, the multiplicity of data sets and the randomness of the
selection of data sets. In this paper, we add a fresh view to the vivid debate
by adopting recent developments in decision theory. Based on so-called
preference systems, our framework ranks classifiers by a generalized concept of
stochastic dominance, which powerfully circumvents the cumbersome, and often
even self-contradictory, reliance on aggregates. Moreover, we show that
generalized stochastic dominance can be operationalized by solving
easy-to-handle linear programs and statistically tested by employing an
adapted two-sample observation-randomization test. This indeed yields a
powerful framework for the statistical comparison of classifiers over multiple
data sets with respect to multiple quality criteria simultaneously. We
illustrate and investigate our framework in a simulation study and with a set
of standard benchmark data sets.
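To make the abstract's pipeline concrete, here is a minimal sketch of a two-sample observation-randomization (permutation) test built around a toy first-order dominance statistic. It is illustrative only: the paper's statistic comes from linear programs over preference systems, which are not reproduced here, and all names below are assumptions.

```python
import numpy as np

def dominance_margin(a, b):
    """Toy statistic: smallest gap between the two empirical CDFs.
    A value >= 0 means `a` first-order stochastically dominates `b`
    (stand-in for the paper's LP-based generalized dominance check)."""
    grid = np.union1d(a, b)
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.min(cdf_b - cdf_a)

def randomization_test(a, b, stat=dominance_margin, n_perm=2000, seed=0):
    """Two-sample observation-randomization test: reassign the pooled
    observations to the two groups at random and compare the observed
    statistic against the randomization distribution. (The paper's paired
    setting would permute within data-set pairs instead of pooling.)"""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([a, b])
    observed = stat(a, b)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        count += stat(perm[:len(a)], perm[len(a):]) >= observed
    return (count + 1) / (n_perm + 1)  # p-value with add-one correction

# Example: accuracies of two classifiers across six data sets
acc_f = np.array([0.81, 0.78, 0.90, 0.84, 0.77, 0.88])
acc_g = np.array([0.74, 0.76, 0.85, 0.80, 0.75, 0.82])
print(randomization_test(acc_f, acc_g))
```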
Related papers
- Ensemble Methods for Sequence Classification with Hidden Markov Models [8.241486511994202]
We present a lightweight approach to sequence classification using Ensemble Methods for Hidden Markov Models (HMMs).
HMMs offer significant advantages in scenarios with imbalanced or smaller datasets due to their simplicity, interpretability, and efficiency.
Our ensemble-based scoring method enables the comparison of sequences of any length and improves performance on imbalanced datasets.
arXiv Detail & Related papers (2024-09-11T20:59:32Z)
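As a hedged illustration of the entry above, here is a sketch of likelihood-based sequence classification with an ensemble of Gaussian HMMs, assuming the third-party hmmlearn package; the bootstrap-and-average scheme is a plausible reading, not the paper's exact scoring method.

```python
import numpy as np
from hmmlearn import hmm  # assumed dependency: pip install hmmlearn

def fit_ensemble(sequences, n_models=5, n_states=3, seed=0):
    """Fit several Gaussian HMMs on bootstrap resamples of one class's
    training sequences (each sequence is a (T, n_features) array)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        boot = [sequences[i] for i in rng.integers(0, len(sequences), len(sequences))]
        X, lengths = np.concatenate(boot), [len(s) for s in boot]
        m = hmm.GaussianHMM(n_components=n_states, n_iter=50,
                            random_state=int(rng.integers(2**31)))
        models.append(m.fit(X, lengths))
    return models

def ensemble_score(models, seq):
    """Average per-step log-likelihood; normalizing by length lets
    sequences of any length be compared on one scale."""
    return np.mean([m.score(seq) / len(seq) for m in models])

def classify(per_class_models, seq):
    """per_class_models: dict mapping label -> list of fitted HMMs."""
    return max(per_class_models,
               key=lambda c: ensemble_score(per_class_models[c], seq))
```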
- Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking [21.23500484100963]
We introduce a statistic that assesses almost stochastic dominance under the framework of Optimal Transport with a smooth cost.
We also propose a hypothesis testing framework as well as an efficient implementation using the Sinkhorn algorithm.
We showcase our method in comparing and benchmarking Large Language Models that are evaluated on multiple metrics.
arXiv Detail & Related papers (2024-06-10T16:14:50Z)
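The entropic optimal-transport solver named in the entry above (the Sinkhorn algorithm) is standard and small enough to sketch in plain numpy; the dominance statistic and hypothesis test the paper builds on top of it are not reproduced here.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iter=500):
    """Entropic-regularized OT between histograms a and b with cost matrix C.
    Returns the transport plan P and the transport cost <P, C>."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                 # scale columns to match b
        u = a / (K @ v)                   # scale rows to match a
    P = u[:, None] * K * v[None, :]
    return P, float(np.sum(P * C))

# Example: compare two models evaluated on the same two metrics
scores_f = np.array([[0.8, 0.7], [0.9, 0.6], [0.7, 0.8]])  # 3 runs x 2 metrics
scores_g = np.array([[0.6, 0.7], [0.8, 0.5], [0.7, 0.6]])
C = np.linalg.norm(scores_f[:, None, :] - scores_g[None, :, :], axis=-1) ** 2
a = np.full(3, 1 / 3); b = np.full(3, 1 / 3)               # uniform weights
P, cost = sinkhorn(a, b, C, eps=0.05)
print(cost)
```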
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
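A caricature of the mixing idea from the entry above: synthesize minority samples as convex combinations of a minority and a majority point, keeping the minority label. The sampling scheme and interpolation weights below are assumptions for illustration, not the paper's iterative procedure.

```python
import numpy as np

def mix_oversample(X_min, X_maj, n_new, low=0.6, high=1.0, seed=0):
    """Create synthetic minority samples as convex combinations of a random
    minority point and a random majority point, weighted toward the minority
    side so the synthetic points stay near the minority class."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        x_min = X_min[rng.integers(len(X_min))]
        x_maj = X_maj[rng.integers(len(X_maj))]
        lam = rng.uniform(low, high)       # lam near 1 => mostly minority
        out.append(lam * x_min + (1 - lam) * x_maj)
    return np.array(out)

# Balance a toy imbalanced problem
rng = np.random.default_rng(0)
X_maj = rng.normal(size=(500, 2))
X_min = rng.normal(size=(40, 2)) + 3.0
X_syn = mix_oversample(X_min, X_maj, n_new=len(X_maj) - len(X_min))
```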
- Beyond Adult and COMPAS: Fairness in Multi-Class Prediction [8.405162568925405]
We formulate this problem in terms of "projecting" a pre-trained (and potentially unfair) classifier onto the set of models that satisfy target group-fairness requirements.
We provide a parallelizable iterative algorithm for computing the projected classifier and derive both sample complexity and convergence guarantees.
We also evaluate our method at scale on an open dataset with multiple classes, multiple intersectional protected groups, and over 1M samples.
arXiv Detail & Related papers (2022-06-15T20:29:33Z)
- Probability-driven scoring functions in combining linear classifiers [0.913755431537592]
This research aims to build a new fusion method dedicated to ensembles of linear classifiers.
The proposed fusion method is compared with the reference method using multiple benchmark datasets taken from the KEEL repository.
The experimental study shows that, under certain conditions, some improvement may be obtained.
arXiv Detail & Related papers (2021-09-16T08:58:32Z)
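For the entry above, a minimal stand-in using scikit-learn: fuse an ensemble of linear classifiers through their probability outputs via plain soft voting; the probability-driven scoring functions studied in the paper would replace this simple average.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train an ensemble of linear classifiers on bootstrap resamples.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
rng = np.random.default_rng(0)
ensemble = []
for _ in range(7):
    idx = rng.integers(0, len(X), len(X))
    ensemble.append(LogisticRegression(max_iter=1000).fit(X[idx], y[idx]))

# Probability-driven fusion: average the members' class probabilities
# (soft voting); a learned scoring function could reweight these instead.
proba = np.mean([clf.predict_proba(X) for clf in ensemble], axis=0)
y_pred = proba.argmax(axis=1)
print((y_pred == y).mean())
```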
- Preference learning along multiple criteria: A game-theoretic perspective [97.94912276610002]
We generalize the notion of a von Neumann winner to the multi-criteria setting by taking inspiration from Blackwell's approachability.
Our framework allows for non-linear aggregation of preferences across criteria, and generalizes the linearization-based approach from multi-objective optimization.
We show that the Blackwell winner of a multi-criteria problem instance can be computed as the solution to a convex optimization problem.
arXiv Detail & Related papers (2021-05-05T03:23:11Z)
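For orientation on the entry above: in the classical single-criterion case, a von Neumann winner is a maximin mixed strategy of the pairwise preference matrix and can be computed with a single linear program (scipy sketch below); the multi-criteria Blackwell winner requires the convex program described in the paper instead.

```python
import numpy as np
from scipy.optimize import linprog

def von_neumann_winner(P):
    """Maximin mixed strategy over items given pairwise preference
    probabilities P[i, j] = Pr(item i beats item j).
    Variables (x_1..x_n, t): maximize t s.t. (P^T x)_j >= t for all j."""
    n = P.shape[0]
    c = np.zeros(n + 1); c[-1] = -1.0              # linprog minimizes -t
    A_ub = np.hstack([-P.T, np.ones((n, 1))])      # t - (P^T x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])  # sum x = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * n + [(None, None)])
    return res.x[:n]

P = np.array([[0.5, 0.7, 0.2],
              [0.3, 0.5, 0.9],
              [0.8, 0.1, 0.5]])   # rock-paper-scissors-like preferences
print(von_neumann_winner(P))
```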
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
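A crude post-processing baseline in the spirit of the entry above: replace raw scores by within-group quantiles so that no protected group is systematically ranked lower. This generic normalization is an assumption for illustration; it is not the paper's utility-balancing framework.

```python
import numpy as np
from scipy.stats import rankdata

def within_group_quantiles(scores, groups):
    """Map each score to its quantile within its own protected group,
    making the adjusted score distributions comparable across groups."""
    adjusted = np.empty_like(scores, dtype=float)
    for g in np.unique(groups):
        mask = groups == g
        adjusted[mask] = rankdata(scores[mask]) / mask.sum()
    return adjusted

scores = np.array([2.1, 0.3, 1.7, 0.9, 2.5, 0.4])
groups = np.array([0, 0, 0, 1, 1, 1])
print(within_group_quantiles(scores, groups))
```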
- Random Hyperboxes [9.061408029414455]
We show a generalization error bound of the proposed classifier based on the strength of the individual hyperbox-based classifiers.
The effectiveness of the proposed classifier is analyzed using a carefully selected illustrative example.
We identify existing issues related to the generalization error bounds on real datasets and point out potential research directions.
arXiv Detail & Related papers (2020-06-01T03:42:20Z)
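To picture the base learner in the entry above: a hyperbox classifier assigns a class by membership in axis-aligned min/max boxes. The membership function below is a crude stand-in, and the random-subspace ensemble of the paper is omitted.

```python
import numpy as np

class Hyperbox:
    """Axis-aligned box with a class label; membership is 1 inside the box
    and decays with the distance to the box outside (a rough stand-in for
    the fuzzy membership functions used by hyperbox classifiers)."""
    def __init__(self, lower, upper, label):
        self.lower, self.upper, self.label = lower, upper, label

    def membership(self, x):
        gap = np.maximum(self.lower - x, 0) + np.maximum(x - self.upper, 0)
        return 1.0 / (1.0 + gap.sum())

def predict(boxes, x):
    return max(boxes, key=lambda b: b.membership(x)).label

boxes = [Hyperbox(np.array([0., 0.]), np.array([1., 1.]), label=0),
         Hyperbox(np.array([2., 2.]), np.array([3., 3.]), label=1)]
print(predict(boxes, np.array([0.4, 0.6])))   # -> 0
```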
- Group Heterogeneity Assessment for Multilevel Models [68.95633278540274]
Many data sets contain an inherent multilevel structure.
Taking this structure into account is critical for the accuracy and calibration of any statistical analysis performed on such data.
We propose a flexible framework for efficiently assessing differences between the levels of given grouping variables in the data.
arXiv Detail & Related papers (2020-05-06T12:42:04Z)
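Group heterogeneity of the kind described in the entry above is routinely probed with a random-intercept model; a minimal statsmodels sketch follows (the paper proposes its own assessment framework, which this does not reproduce).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy two-level data: observations nested in groups with varying intercepts.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(20), 30)
group_effect = rng.normal(0, 1.0, 20)[groups]      # level-2 heterogeneity
x = rng.normal(size=groups.size)
y = 2.0 * x + group_effect + rng.normal(size=groups.size)
df = pd.DataFrame({"y": y, "x": x, "g": groups})

# Random-intercept model: the estimated group variance quantifies how much
# the levels of the grouping variable differ.
model = smf.mixedlm("y ~ x", df, groups=df["g"]).fit()
print(model.summary())
```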
This list is automatically generated from the titles and abstracts of the papers on this site.