A Bayesian Bradley-Terry model to compare multiple ML algorithms on
multiple data sets
- URL: http://arxiv.org/abs/2208.04935v2
- Date: Sat, 15 Jul 2023 10:28:33 GMT
- Title: A Bayesian Bradley-Terry model to compare multiple ML algorithms on
multiple data sets
- Authors: Jacques Wainer
- Abstract summary: This paper proposes a Bayesian model to compare multiple algorithms on multiple data sets, on any metric.
The model is based on the Bradley-Terry model, which counts the number of times one algorithm performs better than another on different data sets.
- Score: 4.394728504061753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a Bayesian model to compare multiple algorithms on
multiple data sets, on any metric. The model is based on the Bradley-Terry
model, which counts the number of times one algorithm performs better than
another on different data sets. Because of its Bayesian foundations, the
Bayesian Bradley-Terry model (BBT) has different characteristics than
frequentist approaches to comparing multiple algorithms on multiple data sets,
such as Demsar (2006) tests on mean rank, and Benavoli et al. (2016) multiple
pairwise Wilcoxon tests with p-adjustment procedures. In particular, a Bayesian
approach allows for more nuanced statements regarding the algorithms beyond
claiming that the difference is or is not statistically significant.
Bayesian approaches also allow one to define when two algorithms are equivalent for
practical purposes, or the region of practical equivalence (ROPE). Different
than a Bayesian signed rank comparison procedure proposed by Benavoli et al.
(2017), our approach can define a ROPE for any metric, since it is based on
probability statements, and not on differences of that metric. This paper also
proposes a local ROPE concept, which evaluates whether a positive difference
between one algorithm's mean measure across cross-validation folds and another
algorithm's mean should really be seen as the first algorithm being better than
the second, based on effect sizes. This local ROPE proposal is independent of
the Bayesian framework, and can also be used in frequentist approaches based on
ranks. An R package and a Python program that implement the BBT are available.
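As a rough illustration of the win-counting idea behind the Bradley-Terry comparison, the sketch below counts the data sets on which one algorithm beats another and places a posterior over the probability of superiority. This is a minimal simplification, not the paper's full BBT model: the accuracy numbers, the Beta-Bernoulli posterior, and the 0.45-0.55 ROPE bounds are all illustrative assumptions.

```python
import random

# Hypothetical accuracies of algorithms A and B on 10 data sets (made-up numbers).
acc_a = [0.81, 0.76, 0.90, 0.66, 0.88, 0.72, 0.95, 0.60, 0.84, 0.79]
acc_b = [0.78, 0.74, 0.91, 0.60, 0.85, 0.70, 0.93, 0.62, 0.80, 0.75]

# Bradley-Terry-style statistic: count the data sets where A beats B and vice versa.
wins_a = sum(a > b for a, b in zip(acc_a, acc_b))
wins_b = sum(b > a for a, b in zip(acc_a, acc_b))

# Beta-Bernoulli posterior over p = P(A beats B on a new data set),
# with a uniform Beta(1, 1) prior (a stand-in for the BBT's Bayesian machinery).
random.seed(0)
samples = [random.betavariate(1 + wins_a, 1 + wins_b) for _ in range(20000)]

# ROPE on the probability scale: treat 0.45 <= p <= 0.55 as "practically equivalent".
p_a_better = sum(s > 0.55 for s in samples) / len(samples)
p_rope = sum(0.45 <= s <= 0.55 for s in samples) / len(samples)
print(f"wins: A={wins_a}, B={wins_b}")
print(f"P(A better): {p_a_better:.2f}, P(practically equivalent): {p_rope:.2f}")
```

Note that the ROPE here lives on the probability scale, not on differences of the underlying metric, which is the property the abstract highlights as letting a ROPE be defined for any metric.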
Related papers
- HARRIS: Hybrid Ranking and Regression Forests for Algorithm Selection [75.84584400866254]
We propose a new algorithm selector leveraging special forests, combining the strengths of both approaches while alleviating their weaknesses.
HARRIS' decisions are based on a forest model whose trees are built by optimizing a hybrid ranking and regression loss function.
arXiv Detail & Related papers (2022-10-31T14:06:11Z)
- Efficient Approximate Kernel Based Spike Sequence Classification [56.2938724367661]
Machine learning models, such as SVM, require a definition of distance/similarity between pairs of sequences.
Exact methods yield better classification performance, but they pose high computational costs.
We propose a series of ways to improve the performance of the approximate kernel in order to enhance its predictive performance.
arXiv Detail & Related papers (2022-09-11T22:44:19Z)
- Efficient computation of rankings from pairwise comparisons [0.0]
We describe an alternative and similarly simple iteration that provably returns identical results but does so much faster.
We demonstrate this algorithm with applications to a range of example data sets and derive a number of results regarding its convergence.
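For context, rankings from pairwise comparisons are typically maximum-likelihood Bradley-Terry fits. Below is a minimal sketch of the classic Zermelo/Ford iteration, the standard baseline that faster iterations of this kind improve upon (this is not the paper's algorithm, and the win matrix is made up for illustration).

```python
# wins[i][j] = number of times item i beat item j (illustrative data).
wins = [
    [0, 4, 3],
    [1, 0, 5],
    [2, 1, 0],
]
n = len(wins)
pi = [1.0] * n  # initial Bradley-Terry strengths

for _ in range(200):
    new_pi = []
    for i in range(n):
        w_i = sum(wins[i])  # total wins of item i
        # Sum over opponents of (games played against j) / (pi_i + pi_j).
        denom = sum((wins[i][j] + wins[j][i]) / (pi[i] + pi[j])
                    for j in range(n) if j != i)
        new_pi.append(w_i / denom)
    s = sum(new_pi)
    pi = [p / s for p in new_pi]  # normalize: strengths are scale-invariant

ranking = sorted(range(n), key=lambda i: -pi[i])
print("strengths:", [round(p, 3) for p in pi])
print("ranking (best first):", ranking)
```

The iteration converges whenever the comparison graph is strongly connected, as it is for this toy win matrix.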
arXiv Detail & Related papers (2022-06-30T19:39:09Z)
- Ranking with Confidence for Large Scale Comparison Data [1.2183405753834562]
In this work, we leverage a generative data model considering comparison noise to develop a fast, precise, and informative ranking from pairwise comparisons.
On real data, PD-Rank requires less computational time than active learning methods to achieve the same Kendall tau.
arXiv Detail & Related papers (2022-02-03T16:36:37Z)
- Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information [78.78486761923855]
In many real world problems, we want to infer some property of an expensive black-box function f, given a budget of T function evaluations.
We present a procedure, InfoBAX, that sequentially chooses queries that maximize mutual information with respect to the algorithm's output.
On these problems, InfoBAX uses up to 500 times fewer queries to f than required by the original algorithm.
arXiv Detail & Related papers (2021-04-19T17:22:11Z)
- The FMRIB Variational Bayesian Inference Tutorial II: Stochastic Variational Bayes [1.827510863075184]
This tutorial revisits the original FMRIB Variational Bayes tutorial.
This new approach bears a lot of similarity to, and has benefited from, computational methods applied to machine learning algorithms.
arXiv Detail & Related papers (2020-07-03T11:31:52Z)
- Approximating a Target Distribution using Weight Queries [25.392248158616862]
We propose an interactive algorithm that iteratively selects data set examples and performs corresponding weight queries.
We derive an approximation bound on the total variation distance between the reweighting found by the algorithm and the best achievable reweighting.
arXiv Detail & Related papers (2020-06-24T11:17:43Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
- Active Sampling for Pairwise Comparisons via Approximate Message Passing and Information Gain Maximization [5.771869590520189]
We propose ASAP, an active sampling algorithm based on approximate message passing and expected information gain.
We show that ASAP offers the highest accuracy of inferred scores compared to the existing methods.
arXiv Detail & Related papers (2020-04-12T20:48:10Z)
- LSF-Join: Locality Sensitive Filtering for Distributed All-Pairs Set Similarity Under Skew [58.21885402826496]
All-pairs set similarity is a widely used data mining task, even for large and high-dimensional datasets.
We present a new distributed algorithm, LSF-Join, for approximate all-pairs set similarity.
We show that LSF-Join efficiently finds most close pairs, even for small similarity thresholds and for skewed input sets.
arXiv Detail & Related papers (2020-03-06T00:06:20Z)
- Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
arXiv Detail & Related papers (2020-02-26T16:19:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.