Improved theoretical guarantee for rank aggregation via spectral method
- URL: http://arxiv.org/abs/2309.03808v2
- Date: Sun, 10 Sep 2023 05:25:26 GMT
- Title: Improved theoretical guarantee for rank aggregation via spectral method
- Authors: Ziliang Samuel Zhong, Shuyang Ling
- Abstract summary: Given pairwise comparisons between multiple items, how to rank them so that the ranking matches the observations?
This problem, known as rank aggregation, has found many applications in sports, recommendation systems, and other web applications.
Under the Erdős-Rényi outliers (ERO) model studied here, each pairwise comparison is a corrupted copy of the true score difference.
- Score: 1.0152838128195467
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given pairwise comparisons between multiple items, how to rank them so that
the ranking matches the observations? This problem, known as rank aggregation,
has found many applications in sports, recommendation systems, and other web
applications. As it is generally NP-hard to find a global ranking that
minimizes the mismatch (known as the Kemeny optimization), we focus on the
Erd\"os-R\'enyi outliers (ERO) model for this ranking problem. Here, each
pairwise comparison is a corrupted copy of the true score difference. We
investigate spectral ranking algorithms that are based on unnormalized and
normalized data matrices. The key is to understand their performance in
recovering the underlying scores of each item from the observed data. This
reduces to deriving an entry-wise perturbation error bound between the top
eigenvectors of the unnormalized/normalized data matrix and its population
counterpart. By using the leave-one-out technique, we provide a sharper
$\ell_{\infty}$-norm perturbation bound of the eigenvectors and also derive an
error bound on the maximum displacement for each item, with only $\Omega(n\log
n)$ samples. Our theoretical analysis improves upon the state-of-the-art
results in terms of sample complexity, and our numerical experiments confirm
these theoretical findings.
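As a toy illustration of score recovery from corrupted score differences, the sketch below uses simple row averaging, which coincides with least squares on a fully observed comparison matrix. This is not the paper's eigenvector-based spectral estimator; the model parameters (`n`, `sigma`) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 50, 0.5  # made-up problem size and noise level
s = rng.normal(size=n)  # hypothetical true scores

# Each observed comparison is a corrupted copy of the true score difference
C = s[:, None] - s[None, :] + sigma * rng.normal(size=(n, n))
C = (C - C.T) / 2  # enforce skew-symmetry, as pairwise data should satisfy

# Row averaging = least squares on the complete comparison graph;
# it recovers the scores up to a global shift
s_hat = C.mean(axis=1)
ranking = np.argsort(-s_hat)  # highest estimated score first
```

With dense observations and moderate noise, the estimated scores correlate strongly with the truth; the paper's contribution is the much finer entry-wise analysis of the eigenvector estimators under sparse ERO observations.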
Related papers
- Entrywise error bounds for low-rank approximations of kernel matrices [55.524284152242096]
We derive entrywise error bounds for low-rank approximations of kernel matrices obtained using the truncated eigen-decomposition.
A key technical innovation is a delocalisation result for the eigenvectors of the kernel matrix corresponding to small eigenvalues.
We validate our theory with an empirical study of a collection of synthetic and real-world datasets.
arXiv Detail & Related papers (2024-05-23T12:26:25Z)
- HITSnDIFFs: From Truth Discovery to Ability Discovery by Recovering Matrices with the Consecutive Ones Property [11.332742187228524]
We analyze a general problem in a crowd-sourced setting where one user asks a question (also called item) and other users return answers (also called labels) for this question.
We call this problem "ability discovery" to emphasize the connection to and duality with the more well-studied problem of "truth discovery".
Our experiments show that our novel variant of HITS produces user rankings with robustly high accuracy compared to state-of-the-art truth discovery methods.
arXiv Detail & Related papers (2023-12-21T18:47:17Z)
- Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models [63.714662435555674]
Large language models (LLMs) exhibit positional bias in how they use context.
We propose permutation self-consistency, a form of self-consistency over ranking list outputs of black-box LLMs.
Our approach improves scores from conventional inference by up to 7-18% for GPT-3.5 and 8-16% for LLaMA v2 (70B).
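The aggregation step can be sketched as a Borda-style mean-position aggregate over rankings produced from shuffled inputs; this is a simplification of the paper's Kemeny-style aggregation, and the rankings below are made-up stand-ins for LLM outputs:

```python
from collections import defaultdict

# Made-up rankings an LLM might return for the same items under shuffled input orders
rankings = [
    ["b", "a", "c"],
    ["a", "b", "c"],
    ["b", "c", "a"],
]

# Borda-style aggregation: order items by their mean position across runs
positions = defaultdict(list)
for ranking in rankings:
    for pos, item in enumerate(ranking):
        positions[item].append(pos)

aggregated = sorted(positions, key=lambda item: sum(positions[item]) / len(positions[item]))
print(aggregated)  # ['b', 'a', 'c']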
arXiv Detail & Related papers (2023-10-11T17:59:02Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- Lagrangian Inference for Ranking Problems [18.70913621061314]
We consider the widely adopted Bradley-Terry-Luce (BTL) model, where each item is assigned a positive preference score that determines the Bernoulli distributions of pairwise comparisons' outcomes.
Our proposed method aims to infer general ranking properties of the BTL model.
We generalize our framework to multiple testing problems where we control the false discovery rate (FDR) and apply the method to infer the top-$K$ ranked items.
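In the BTL model, item i beats item j with probability s_i / (s_i + s_j). A minimal simulation of this Bernoulli comparison (the scores and sample size below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
s = np.array([3.0, 2.0, 1.0])  # hypothetical positive preference scores

def btl_win_prob(i, j, s):
    """Pr(item i beats item j) under the Bradley-Terry-Luce model."""
    return s[i] / (s[i] + s[j])

# Empirical win frequency approaches the model probability
k = 10_000  # made-up number of comparisons
wins = rng.random(k) < btl_win_prob(0, 1, s)
empirical = wins.mean()  # close to 3 / (3 + 2) = 0.6
```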
arXiv Detail & Related papers (2021-10-01T01:16:25Z)
- Solving weakly supervised regression problem using low-rank manifold regularization [77.34726150561087]
We solve a weakly supervised regression problem.
Under "weakly" we understand that for some training points the labels are known, for some unknown, and for others uncertain due to the presence of random noise or other reasons such as lack of resources.
In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z)
- Optimal Full Ranking from Pairwise Comparisons [16.852801934478475]
The minimax rate of ranking exhibits a transition between an exponential rate and a rate depending on the magnitude of the signal-to-noise ratio of the problem.
To achieve the minimax rate, we propose a divide-and-conquer ranking algorithm that first divides the $n$ players into groups of similar skills and then computes local MLE within each group.
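The divide-and-conquer idea can be sketched as follows. This toy version groups players into skill tiers by overall empirical win rate and then orders each tier using only within-tier games; plain win counts stand in for the paper's local MLE, and all parameters are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 12, 500  # players and games per pair (made-up parameters)
skill = np.linspace(0.0, 3.0, n)  # hypothetical latent skills

# Simulate pairwise win counts under a logistic (BTL-style) comparison model
wins = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        p = 1.0 / (1.0 + np.exp(skill[j] - skill[i]))  # Pr(i beats j)
        w = rng.binomial(k, p)
        wins[i, j], wins[j, i] = w, k - w

# Divide: coarse skill tiers by overall win count (best tier first)
overall = wins.sum(axis=1)
tiers = np.array_split(np.argsort(-overall), 3)

# Conquer: order each tier using only games played within the tier
ranking = []
for tier in tiers:
    local = wins[np.ix_(tier, tier)].sum(axis=1)
    ranking.extend(int(tier[i]) for i in np.argsort(-local))
```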
arXiv Detail & Related papers (2021-01-21T03:34:44Z)
- PiRank: Learning To Rank via Differentiable Sorting [85.28916333414145]
We propose PiRank, a new class of differentiable surrogates for ranking.
We show that PiRank exactly recovers the desired metrics in the limit of zero temperature.
arXiv Detail & Related papers (2020-12-12T05:07:36Z)
- Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion [17.615174152419133]
It is not known which revealed entries are corrupted.
We show that our proposed algorithm is optimal when the set of revealed entries is given by an Erdős-Rényi random graph.
arXiv Detail & Related papers (2020-10-23T06:23:20Z)
- Partial Recovery for Top-$k$ Ranking: Optimality of MLE and Sub-Optimality of Spectral Method [20.81132428224778]
We characterize the partial recovery error in terms of the proportion of mistakes for top-$k$ ranking.
We also derive the optimal signal-to-noise ratio condition for the exact recovery of the top-$k$ set.
arXiv Detail & Related papers (2020-06-30T02:32:42Z)
- Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations [154.51144248210338]
Batch Nuclear-norm Maximization (BNM) is proposed to boost learning under label-insufficient scenarios.
BNM outperforms competitors and works well with existing well-known methods.
arXiv Detail & Related papers (2020-03-27T05:04:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.