Beyond Kemeny Medians: Consensus Ranking Distributions Definition, Properties and Statistical Learning
- URL: http://arxiv.org/abs/2602.10640v1
- Date: Wed, 11 Feb 2026 08:41:14 GMT
- Title: Beyond Kemeny Medians: Consensus Ranking Distributions Definition, Properties and Statistical Learning
- Authors: Stephan Clémençon, Ekhine Irurozki
- Abstract summary: We introduce the concept of \textit{consensus ranking distribution} (CRD), a sparse mixture model of Dirac masses on $\mathfrak{S}_n$. We prove that by choosing the popular Kendall $\tau$ distance as the cost function, the optimal distortion can be expressed as a function of pairwise probabilities. We propose a top-down tree-structured statistical algorithm that allows for the progressive refinement of a CRD based on ranking data.
- Score: 2.745827783449186
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this article we develop a new method for summarizing a ranking distribution, \textit{i.e.} a probability distribution on the symmetric group $\mathfrak{S}_n$, beyond the classical theory of consensus and Kemeny medians. Based on the notion of \textit{local ranking median}, we introduce the concept of \textit{consensus ranking distribution} (CRD), a sparse mixture model of Dirac masses on $\mathfrak{S}_n$, in order to approximate a ranking distribution with small distortion from a mass transportation perspective. We prove that by choosing the popular Kendall $\tau$ distance as the cost function, the optimal distortion can be expressed as a function of pairwise probabilities, paving the way for the development of efficient learning methods that do not suffer from the lack of vector space structure on $\mathfrak{S}_n$. In particular, we propose a top-down tree-structured statistical algorithm that allows for the progressive refinement of a CRD based on ranking data, from the Dirac mass at a Kemeny median at the root of the tree to the empirical ranking data distribution itself at the end of the tree's exhaustive growth. In addition to the theoretical arguments developed, the relevance of the algorithm is empirically supported by various numerical experiments.
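Since the optimal distortion above hinges on the Kendall $\tau$ distance, it helps to recall how it is computed: it counts the pairs of items on which two rankings disagree. A minimal Python sketch (the function name and rank-vector convention are illustrative, not taken from the paper):

```python
from itertools import combinations

def kendall_tau_distance(sigma, pi):
    """Number of discordant pairs between two rankings.

    sigma, pi: sequences where sigma[i] is the rank assigned to item i.
    """
    n = len(sigma)
    # A pair (i, j) is discordant when the two rankings order it oppositely.
    return sum(
        1
        for i, j in combinations(range(n), 2)
        if (sigma[i] - sigma[j]) * (pi[i] - pi[j]) < 0
    )

# Identical rankings are at distance 0; a full reversal of n items
# attains the maximum n*(n-1)/2.
print(kendall_tau_distance([0, 1, 2, 3], [0, 1, 2, 3]))  # 0
print(kendall_tau_distance([0, 1, 2, 3], [3, 2, 1, 0]))  # 6
```

The expression via pairwise comparisons is what lets the paper rewrite the distortion in terms of pairwise probabilities $\mathbb{P}(i \text{ ranked before } j)$, avoiding the lack of vector space structure on $\mathfrak{S}_n$.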
Related papers
- Robust Barycenter Estimation using Semi-Unbalanced Neural Optimal Transport [84.51977664336056]
We propose a novel scalable approach for estimating the robust continuous barycenter. Our method is framed as a min-max optimization problem and is adaptable to general cost functions.
arXiv Detail & Related papers (2024-10-04T23:27:33Z) - Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - On Ranking-based Tests of Independence [0.0]
We develop a novel nonparametric framework to test the independence of two random variables $\mathbf{X}$ and $\mathbf{Y}$.
We consider a wide class of rank statistics encompassing many ways of deviating from the diagonal in the ROC space to build tests of independence.
arXiv Detail & Related papers (2024-03-12T10:00:00Z) - Sparse PCA with Oracle Property [115.72363972222622]
We propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations.
We prove that another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA.
arXiv Detail & Related papers (2023-12-28T02:52:54Z) - Classification Tree Pruning Under Covariate Shift [7.982668978293684]
We consider the problem of \emph{pruning} a classification tree, that is, selecting a suitable subtree that balances bias and variance.
We present the first efficient procedure for optimal pruning in such situations, when cross-validation and other penalized variants are grossly inadequate.
arXiv Detail & Related papers (2023-05-07T17:08:21Z) - Estimating the Density Ratio between Distributions with High Discrepancy
using Multinomial Logistic Regression [21.758330613138778]
We show that the state-of-the-art density ratio estimators perform poorly on well-separated cases.
We present an alternative method that leverages multi-class classification for density ratio estimation.
arXiv Detail & Related papers (2023-05-01T15:10:56Z) - Replicable Clustering [57.19013971737493]
We propose algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their counterparts in a black-box manner.
We also provide experiments on synthetic distributions in 2D using the $k$-means++ implementation from sklearn as a black-box that validate our theoretical results.
arXiv Detail & Related papers (2023-02-20T23:29:43Z) - Statistical Depth Functions for Ranking Distributions: Definitions,
Statistical Learning and Applications [3.7564482287844205]
The concept of median/consensus has been widely investigated in order to provide a statistical summary of ranking data.
It is the purpose of this paper to define analogs of quantiles, ranks and statistical procedures based on such quantities.
arXiv Detail & Related papers (2022-01-20T10:30:56Z) - Probability Distribution on Full Rooted Trees [2.1506382989223782]
In data compression, image processing, and machine learning, the full rooted tree is not a random variable.
One way to address this is to assume a prior distribution on the full rooted trees.
In this paper, we propose a probability distribution on a set of full rooted trees.
arXiv Detail & Related papers (2021-09-27T06:51:35Z) - Robust Learning of Optimal Auctions [84.13356290199603]
We study the problem of learning revenue-optimal multi-bidder auctions from samples when the samples of bidders' valuations can be adversarially corrupted or drawn from distributions that are adversarially perturbed.
We propose new algorithms that can learn a mechanism whose revenue is nearly optimal simultaneously for all "true distributions" that are $\alpha$-close to the original distribution in Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2021-07-13T17:37:21Z) - A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.
We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense.
We provide algorithms to optimize such distributions in the finite-sample setting.
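The kernel mean embedding entry above admits a compact illustration: the squared RKHS distance between the empirical mean embeddings of two samples is the (biased) maximum mean discrepancy estimate, computable from kernel matrices alone. A sketch with a Gaussian RBF kernel (a standard choice for a characteristic kernel, not specific to that paper):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    """Squared RKHS distance between the empirical kernel mean
    embeddings of two samples (biased V-statistic estimate)."""
    return (rbf_kernel(X, X, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(2.0, 1.0, size=(200, 2))
print(mmd2(X, X))  # 0 (up to floating point): identical embeddings
print(mmd2(X, Y))  # strictly positive for well-separated samples
```

Because the RBF kernel is characteristic, the embedding is injective on probability measures, which is what makes such finite-sample optimization of distributions meaningful.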
arXiv Detail & Related papers (2021-06-18T08:33:45Z) - Distributional Reinforcement Learning with Unconstrained Monotonic
Neural Networks [7.907645828535088]
The paper introduces a methodology for learning different representations of the random return distribution.
A novel distributional RL algorithm named unconstrained monotonic deep Q-network (UMDQN) is presented.
arXiv Detail & Related papers (2021-06-06T20:03:50Z) - Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z) - Distributionally Robust Bayesian Quadrature Optimization [60.383252534861136]
We study BQO under distributional uncertainty in which the underlying probability distribution is unknown except for a limited set of its i.i.d. samples.
A standard BQO approach maximizes the Monte Carlo estimate of the true expected objective given the fixed sample set.
We propose a novel posterior sampling based algorithm, namely distributionally robust BQO (DRBQO) for this purpose.
arXiv Detail & Related papers (2020-01-19T12:00:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.