A Comparative Evaluation of Quantification Methods
- URL: http://arxiv.org/abs/2103.03223v3
- Date: Wed, 18 Oct 2023 14:10:17 GMT
- Title: A Comparative Evaluation of Quantification Methods
- Authors: Tobias Schumacher, Markus Strohmaier, Florian Lemmerich
- Abstract summary: Quantification represents the problem of predicting class distributions in a dataset.
A large variety of different algorithms has been proposed in recent years.
We compare 24 different methods on more than 40 data sets overall.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantification represents the problem of predicting class distributions in a
dataset. It also represents a growing research field in supervised machine
learning, for which a large variety of different algorithms has been proposed
in recent years. However, a comprehensive empirical comparison of
quantification methods that supports algorithm selection is not available yet.
In this work, we close this research gap by conducting a thorough empirical
performance comparison of 24 different quantification methods on more than 40
data sets overall, considering binary as well as multiclass quantification
settings. We observe that no single algorithm generally outperforms all
competitors, but identify a group of methods including the threshold
selection-based Median Sweep and TSMax methods, the DyS framework, and
Friedman's method that performs best in the binary setting. For the multiclass
setting, we observe that a different group of algorithms yields good
performance, including the Generalized Probabilistic Adjusted Count, the readme
method, the energy distance minimization method, the EM algorithm for
quantification, and Friedman's method. We also find that tuning the underlying
classifiers has in most cases only a limited impact on the quantification
performance. More generally, we find that performance on multiclass
quantification is inferior to that obtained in the binary setting. Our
results can guide practitioners who intend to apply quantification algorithms
and help researchers to identify opportunities for future research.
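The threshold-selection methods highlighted in the abstract, such as Median Sweep, build on the classic Adjusted Count correction. As a hedged illustration (not the paper's code), here is a minimal sketch of that correction, assuming the classifier's TPR and FPR have already been estimated, e.g. via cross-validation; the function name and values are illustrative.

```python
# Minimal sketch of the Adjusted Count (ACC) correction that
# threshold-selection quantifiers such as Median Sweep build on.
# TPR/FPR are assumed to be estimated beforehand, e.g. via
# cross-validation on the training data.
import numpy as np

def adjusted_count(tpr, fpr, predicted_positive_rate):
    """Correct the raw predicted-positive rate:
        p = (pp - fpr) / (tpr - fpr), clipped to [0, 1]."""
    p = (predicted_positive_rate - fpr) / (tpr - fpr)
    return float(np.clip(p, 0.0, 1.0))

# A classifier with TPR=0.9 and FPR=0.2 labels 55% of a test set
# positive; the corrected prevalence estimate is (0.55-0.2)/0.7 = 0.5.
print(adjusted_count(0.9, 0.2, 0.55))
```

Median Sweep applies this correction at many decision thresholds and takes the median of the resulting estimates, which makes it robust to a poorly estimated TPR/FPR at any single threshold.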
Related papers
- A General Online Algorithm for Optimizing Complex Performance Metrics [5.726378955570775]
We introduce and analyze a general online algorithm that can be used in a straightforward way with a variety of complex performance metrics in binary, multi-class, and multi-label classification problems.
The algorithm's update and prediction rules are appealingly simple and computationally efficient without the need to store any past data.
arXiv Detail & Related papers (2024-06-20T21:24:47Z) - Regularization-Based Methods for Ordinal Quantification [49.606912965922504]
We study the ordinal case, i.e., the case in which a total order is defined on the set of n>2 classes.
We propose a novel class of regularized OQ algorithms, which outperforms existing algorithms in our experiments.
arXiv Detail & Related papers (2023-10-13T16:04:06Z) - Multi-Dimensional Ability Diagnosis for Machine Learning Algorithms [88.93372675846123]
We propose a task-agnostic evaluation framework Camilla for evaluating machine learning algorithms.
We use cognitive diagnosis assumptions and neural networks to learn the complex interactions among algorithms, samples and the skills of each sample.
In our experiments, Camilla outperforms state-of-the-art baselines on the metric reliability, rank consistency and rank stability.
arXiv Detail & Related papers (2023-07-14T03:15:56Z) - Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z) - Hybrid Ensemble optimized algorithm based on Genetic Programming for imbalanced data classification [0.0]
We propose a hybrid ensemble algorithm based on Genetic Programming (GP) for two classes of imbalanced data classification.
Experimental results on the specified data sets show that, depending on the size of the training set, the proposed method achieves 40% to 50% better accuracy in minority-class prediction than competing methods.
arXiv Detail & Related papers (2021-06-02T14:14:38Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
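For context on what the randomized estimators above approximate, here is a hedged sketch of the exact leverage-score computation via a thin QR factorization. It assumes the matrix has full column rank, whereas the paper's rank-revealing approach handles arbitrary rank; all names are illustrative.

```python
# Sketch of exact statistical leverage scores via a thin QR
# factorization -- the quantity the randomized methods approximate.
# Assumes A has full column rank; the paper's rank-revealing
# approach handles arbitrary rank.
import numpy as np

def leverage_scores(A):
    # With A = QR (thin QR, Q has orthonormal columns), the i-th
    # leverage score is the squared Euclidean norm of row i of Q.
    Q, _ = np.linalg.qr(A)
    return np.sum(Q**2, axis=1)

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
scores = leverage_scores(A)
print(scores.sum())  # leverage scores sum to rank(A), here 5
```

Exact computation costs O(nd^2) for an n-by-d matrix; the randomized sketching transforms in the paper reduce this cost while approximately preserving the scores.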
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Benchmarking Simulation-Based Inference [5.3898004059026325]
Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms which do not require numerical evaluation of likelihoods.
We provide a benchmark with inference tasks and suitable performance metrics, with an initial selection of algorithms.
We found that the choice of performance metric is critical, that even state-of-the-art algorithms have substantial room for improvement, and that sequential estimation improves sample efficiency.
arXiv Detail & Related papers (2021-01-12T18:31:22Z) - Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z) - Probabilistic Diagnostic Tests for Degradation Problems in Supervised Learning [0.0]
Problems such as class imbalance, overlapping, small-disjuncts, noisy labels, and sparseness limit accuracy in classification algorithms.
A probabilistic diagnostic model based on identifying signs and symptoms of each problem is presented.
The behavior and performance of several supervised algorithms are studied when training sets have such problems.
arXiv Detail & Related papers (2020-04-06T20:32:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.