New Bounds on the Accuracy of Majority Voting for Multi-Class
Classification
- URL: http://arxiv.org/abs/2309.09564v1
- Date: Mon, 18 Sep 2023 08:16:41 GMT
- Authors: Sina Aeeneh, Nikola Zlatanov, Jiangshan Yu
- Abstract summary: The accuracy of the MVF for the general multi-class classification problem has remained unknown.
We show that under certain conditions, the error rate of the MVF exponentially decays toward zero as the number of independent voters increases.
We then provide a discussion on the accuracy of the truth discovery algorithms.
- Score: 5.95012663623095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Majority voting is a simple mathematical function that returns the value that
appears most often in a set. As a popular decision fusion technique, the
majority voting function (MVF) finds applications in resolving conflicts, where
a number of independent voters report their opinions on a classification
problem. Despite its importance and its various applications in ensemble
learning, data crowd-sourcing, remote sensing, and data oracles for
blockchains, the accuracy of the MVF for the general multi-class classification
problem has remained unknown. In this paper, we derive a new upper bound on the
accuracy of the MVF for the multi-class classification problem. More
specifically, we show that under certain conditions, the error rate of the MVF
exponentially decays toward zero as the number of independent voters increases.
Conversely, the error rate of the MVF exponentially grows with the number of
independent voters if these conditions are not met.
We first explore the problem for independent and identically distributed
voters where we assume that every voter follows the same conditional
probability distribution of voting for different classes, given the true
classification of the data point. Next, we extend our results for the case
where the voters are independent but non-identically distributed. Using the
derived results, we then provide a discussion on the accuracy of the truth
discovery algorithms. We show that, in the best-case scenario, truth discovery
algorithms operate as an amplified MVF: they achieve a small error rate only
when the MVF achieves a small error rate and, conversely, a large error rate
when the MVF does. In the worst-case
scenario, the truth discovery algorithms may achieve a higher error rate than
the MVF. Finally, we confirm our theoretical results using numerical
simulations.
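The decay/growth behavior described in the abstract can be checked empirically. Below is a minimal Monte Carlo sketch, assuming i.i.d. voters under a symmetric confusion model (each voter picks the true class with probability `p_correct` and spreads the remaining mass uniformly over the other classes); the function names `mvf` and `mvf_error_rate` are illustrative, not from the paper.

```python
import random
from collections import Counter

def mvf(votes):
    """Majority voting function: return the most frequent label in `votes`.
    Ties are broken toward the smallest label (one common convention)."""
    counts = Counter(votes)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)

def mvf_error_rate(n_voters, p_correct, n_classes, trials=5000, seed=0):
    """Estimate P(MVF output != true class) for i.i.d. voters that report
    the true class with probability `p_correct` and otherwise vote
    uniformly among the remaining classes."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        true_class = 0  # symmetric model: fix the true class w.l.o.g.
        votes = []
        for _ in range(n_voters):
            if rng.random() < p_correct:
                votes.append(true_class)
            else:
                votes.append(rng.randrange(1, n_classes))  # uniform wrong vote
        if mvf(votes) != true_class:
            errors += 1
    return errors / trials
```

Under this model the decay condition reduces to the true class being each voter's single most likely vote: with `p_correct=0.5` over 3 classes the error shrinks rapidly as voters are added, while with `p_correct=0.2` over 4 classes (where every wrong class is more likely than the right one) it grows instead.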
Related papers
- NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models [70.02816541347251]
This paper presents a lightweight method, Norm Voting (NoVo), which harnesses the untapped potential of attention head norms to enhance factual accuracy.
On TruthfulQA MC1, NoVo surpasses the current state-of-the-art and all previous methods by an astounding margin -- at least 19 accuracy points.
arXiv Detail & Related papers (2024-10-11T16:40:03Z)
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
- Anomaly Detection Under Uncertainty Using Distributionally Robust Optimization Approach [0.9217021281095907]
Anomaly detection is defined as the problem of finding data points that do not follow the patterns of the majority.
The one-class Support Vector Machines (SVM) method aims to find a decision boundary to distinguish between normal data points and anomalies.
A distributionally robust chance-constrained model is proposed in which the probability of misclassification is low.
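The paper's distributionally robust model is more involved, but the underlying chance-constraint idea (keep the probability of misclassifying normal points low) can be sketched as an empirical-quantile threshold on an anomaly score; `fit_threshold` and `is_anomaly` are hypothetical helper names, not from the paper:

```python
def fit_threshold(normal_scores, eps=0.05):
    """Pick an anomaly-score threshold so that at most roughly an `eps`
    fraction of the normal training points fall above it (an empirical
    version of the chance constraint P(misclassify normal) <= eps)."""
    s = sorted(normal_scores)
    k = min(len(s) - 1, int((1 - eps) * len(s)))
    return s[k]

def is_anomaly(score, threshold):
    """Flag a point as anomalous when its score exceeds the threshold."""
    return score > threshold
```

A distributionally robust version would instead enforce the constraint against the worst distribution in an ambiguity set around the empirical one, rather than against the empirical quantile alone.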
arXiv Detail & Related papers (2023-12-03T06:13:22Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
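The iterative mixing idea can be sketched as a mixup-style convex combination of a minority point with a majority point; this is a generic illustration, not the paper's exact procedure, and the names (`mix_samples`, `alpha`) are assumptions:

```python
import random

def mix_samples(minority, majority, n_new, alpha=0.7, seed=0):
    """Create synthetic minority-leaning samples as convex combinations of a
    random minority point and a random majority point. The mixing weight is
    drawn from [alpha, 1), so synthetic points stay near the minority region."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)   # minority-class feature vector (tuple)
        b = rng.choice(majority)   # majority-class feature vector (tuple)
        lam = alpha + (1 - alpha) * rng.random()
        synthetic.append(tuple(lam * x + (1 - lam) * y for x, y in zip(a, b)))
    return synthetic
```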
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Data as voters: instance selection using approval-based multi-winner voting [1.597617022056624]
We present a novel approach to the instance selection problem in machine learning (or data mining).
In our model, instances play a double role as voters and candidates.
For SVMs, we have obtained slight increases in the average accuracy by using several voting rules that satisfy EJR or PJR.
arXiv Detail & Related papers (2023-04-19T22:00:23Z)
- Fairly Allocating Utility in Constrained Multiwinner Elections [0.0]
A common denominator to ensure fairness across all such contexts is the use of constraints.
Across these contexts, the candidates selected to satisfy the given constraints may systematically lead to unfair outcomes for historically disadvantaged voter populations.
We develop a model to select candidates that satisfy the constraints fairly across voter populations.
arXiv Detail & Related papers (2022-11-23T10:04:26Z)
- Multi-Label Quantification [78.83284164605473]
Quantification, variously called "labelled prevalence estimation" or "learning to quantify", is the supervised learning task of generating predictors of the relative frequencies of the classes of interest in unsupervised data samples.
We propose methods for inferring estimators of class prevalence values that strive to leverage the dependencies among the classes of interest in order to predict their relative frequencies more accurately.
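As a point of reference, the simplest quantification baseline, classify-and-count, just reports the relative frequencies of a classifier's predicted labels; the paper's methods improve on this by modeling dependencies among the classes:

```python
from collections import Counter

def classify_and_count(predicted_labels, classes):
    """Classify-and-count baseline: estimate class prevalences as the
    relative frequencies of the classifier's predictions on the sample."""
    n = len(predicted_labels)
    counts = Counter(predicted_labels)
    return {c: counts.get(c, 0) / n for c in classes}
```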
arXiv Detail & Related papers (2022-11-15T11:29:59Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound [15.557653926558638]
We investigate a counterpart of majority votes over finite ensembles of classifiers, and study its generalization properties.
We instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk.
The resulting majority vote learning algorithm achieves state-of-the-art accuracy and benefits from (non-vacuous) tight bounds.
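The Dirichlet parameterization can be sketched as follows: draw classifier weights from a Dirichlet distribution and take the weighted-plurality class per draw. This is a simplified Monte Carlo illustration, not the paper's closed-form PAC-Bayes objective, and all names are assumptions:

```python
import random

def dirichlet_sample(alphas, rng):
    """Draw one weight vector from Dirichlet(alphas) via normalized Gammas."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

def stochastic_majority_vote(votes, alphas, n_draws=200, seed=0):
    """For each Dirichlet draw of classifier weights, take the weighted-
    plurality class; return the class that wins the most draws.
    `votes[i]` is classifier i's predicted label; ties break to the
    smallest label."""
    rng = random.Random(seed)
    classes = sorted(set(votes))
    wins = {c: 0 for c in classes}
    for _ in range(n_draws):
        w = dirichlet_sample(alphas, rng)
        score = {c: 0.0 for c in classes}
        for wi, v in zip(w, votes):
            score[v] += wi
        best = max(score.values())
        wins[min(c for c in classes if score[c] == best)] += 1
    return min(c for c in classes if wins[c] == max(wins.values()))
```

The paper's point is that, with Dirichlet-distributed weights, the expected risk of this randomized vote admits a closed-form, differentiable expression, so the distribution can be learned by minimizing a PAC-Bayes bound directly.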
arXiv Detail & Related papers (2021-06-23T16:57:23Z)
- A Unified Joint Maximum Mean Discrepancy for Domain Adaptation [73.44809425486767]
This paper theoretically derives a unified form of JMMD that is easy to optimize.
From the revealed unified JMMD, we illustrate that JMMD degrades the feature-label dependence that benefits to classification.
We propose a novel MMD matrix to promote the dependence, and devise a novel label kernel that is robust to label distribution shift.
arXiv Detail & Related papers (2021-01-25T09:46:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.