Partial Recovery for Top-$k$ Ranking: Optimality of MLE and
Sub-Optimality of Spectral Method
- URL: http://arxiv.org/abs/2006.16485v2
- Date: Thu, 15 Jul 2021 06:47:53 GMT
- Title: Partial Recovery for Top-$k$ Ranking: Optimality of MLE and
Sub-Optimality of Spectral Method
- Authors: Pinhan Chen, Chao Gao, Anderson Y. Zhang
- Abstract summary: We characterize the partial recovery error in terms of the proportion of mistakes for top-$k$ ranking.
We also derive the optimal signal-to-noise ratio condition for the exact recovery of the top-$k$ set.
- Score: 20.81132428224778
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given partially observed pairwise comparison data generated by the
Bradley-Terry-Luce (BTL) model, we study the problem of top-$k$ ranking. That
is, to optimally identify the set of top-$k$ players. We derive the minimax
rate with respect to a normalized Hamming loss. This provides the first result
in the literature that characterizes the partial recovery error in terms of the
proportion of mistakes for top-$k$ ranking. We also derive the optimal
signal-to-noise ratio condition for the exact recovery of the top-$k$ set. The maximum
likelihood estimator (MLE) is shown to achieve both optimal partial recovery
and optimal exact recovery. On the other hand, we show another popular
algorithm, the spectral method, is in general sub-optimal. Our results
complement the recent work by Chen et al. (2019) that shows both the MLE and
the spectral method achieve the optimal sample complexity for exact recovery.
It turns out the leading constants of the sample complexity are different for
the two algorithms. Another contribution that may be of independent interest is
the analysis of the MLE without any penalty or regularization for the BTL
model. This closes an important gap between theory and practice in the
literature of ranking.
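As a concrete illustration of the two algorithms compared in the abstract, here is a minimal simulation sketch: it generates BTL comparison data on an Erdős–Rényi graph, fits the unpenalized MLE, computes spectral scores via a rank-centrality-style Markov chain (assumed here to be the spectral method in question), and reports the proportion of top-$k$ mistakes. All parameter choices (n, k, p, L) are illustrative.

```python
# Illustrative sketch (not the paper's exact setup): top-k recovery by the
# unpenalized BTL MLE vs. a rank-centrality-style spectral method.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, k, p, L = 50, 10, 0.5, 20                  # players, top-k, edge prob., comparisons per pair
theta = np.sort(rng.normal(0.0, 1.0, n))[::-1]  # true skills, descending: true top-k = {0,...,k-1}

# Erdos-Renyi comparison graph; wins[i, j] = empirical fraction of wins of i over j
A = np.triu(rng.random((n, n)) < p, 1)
wins = np.zeros((n, n))
for i, j in zip(*np.nonzero(A)):
    pij = 1.0 / (1.0 + np.exp(theta[j] - theta[i]))  # BTL: P(i beats j)
    wins[i, j] = rng.binomial(L, pij) / L
    wins[j, i] = 1.0 - wins[i, j]

def neg_loglik(t):
    # Unpenalized BTL log-likelihood summed over observed pairs
    i, j = np.nonzero(A)
    d = t[i] - t[j]
    return -np.sum(wins[i, j] * d - np.logaddexp(0.0, d))

t_mle = minimize(neg_loglik, np.zeros(n)).x

# Spectral scores: stationary distribution of a Markov chain that moves
# from i to j with probability proportional to how often j beats i
S = A | A.T
d_max = S.sum(1).max()
P = np.where(S, wins.T, 0.0) / d_max
np.fill_diagonal(P, 1.0 - P.sum(1))
pi = np.ones(n) / n
for _ in range(2000):                         # power iteration
    pi = pi @ P

truth = set(range(k))
for name, score in [("MLE", t_mle), ("spectral", pi)]:
    top = set(np.argsort(-score)[:k])
    print(f"{name}: proportion of top-{k} mistakes = {len(truth - top) / k:.2f}")
```

The reported quantity, the fraction of true top-$k$ players missed by each estimate, corresponds to the proportion-of-mistakes (normalized Hamming) loss the abstract refers to.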
Related papers
- Top-$K$ ranking with a monotone adversary [19.871049853222132]
We consider the scenario where a comparison graph is randomly generated and the adversary is allowed to add arbitrary edges.
The statistician's goal is then to accurately identify the top-$K$ preferred items based on pairwise comparisons.
The main contribution of this paper is a weighted maximum likelihood estimator (MLE) that achieves near-optimal sample complexity; a schematic form of the weighted objective is sketched below.
arXiv Detail & Related papers (2024-02-12T06:57:34Z)
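A schematic form of such a weighted likelihood objective follows; the weights $w_{ij} \ge 0$ (which would be chosen to downweight suspect, possibly adversarial, edges) are illustrative notation, not the paper's exact construction.

```latex
% Schematic weighted BTL log-likelihood over the comparison graph E;
% the weights w_{ij} are illustrative, not the paper's exact choice.
\hat{\theta} = \arg\max_{\theta}\; \sum_{(i,j) \in E} w_{ij}
  \Big[ \bar{y}_{ij} \log \sigma(\theta_i - \theta_j)
      + (1 - \bar{y}_{ij}) \log \sigma(\theta_j - \theta_i) \Big],
\qquad \sigma(t) = \frac{1}{1 + e^{-t}}.
```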
- Faster Convergence with Multiway Preferences [99.68922143784306]
We consider the sign-function-based comparison feedback model and analyze the convergence rates with batched and multiway comparisons.
Our work is the first to study the problem of convex optimization with multiway preferences and analyze the optimal convergence rates.
arXiv Detail & Related papers (2023-12-19T01:52:13Z)
- Maximum Likelihood Estimation is All You Need for Well-Specified Covariate Shift [34.414261291690856]
A key challenge for modern machine learning systems is to achieve Out-of-Distribution (OOD) generalization.
We show that classical Maximum Likelihood Estimation (MLE) purely using source data achieves minimax optimality.
We illustrate the wide applicability of our framework by instantiating it to three concrete examples.
arXiv Detail & Related papers (2023-11-27T16:06:48Z)
- Improved theoretical guarantee for rank aggregation via spectral method [1.0152838128195467]
Given pairwise comparisons between multiple items, how should we rank them so that the ranking matches the observations?
This problem, known as rank aggregation, has found many applications in sports, recommendation systems, and other web applications.
Here, each pairwise comparison is a corrupted copy of the true score difference; this model is written out below.
arXiv Detail & Related papers (2023-09-07T16:01:47Z)
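Written out, the observation model in the last sentence is additive noise on score differences (the notation here is generic; the precise noise assumptions are the paper's):

```latex
% Each observed comparison is a corrupted copy of the true score difference:
y_{ij} = \theta_i^\ast - \theta_j^\ast + \epsilon_{ij}, \qquad (i,j) \in E,
```

and the task is to recover the ordering of $\theta^\ast$ from the observations $\{y_{ij}\}$.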
- Optimal and Private Learning from Human Response Data [13.869502085838452]
Recently, Nguyen and Zhang proposed a new spectral estimation algorithm that is efficient and accurate.
We extend their results in two important ways.
We develop a private extension of the spectral algorithm, leveraging its unique Markov chain formulation and the discrete Gaussian mechanism.
arXiv Detail & Related papers (2023-03-10T22:52:12Z)
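For context, the discrete Gaussian mechanism mentioned above adds integer-valued noise to an integer statistic (e.g., a pairwise win count); the following is the standard mechanism, not the paper's full private algorithm:

```latex
% Discrete Gaussian mechanism on an integer-valued statistic x:
M(x) = x + Z, \qquad
\Pr[Z = z] = \frac{e^{-z^2/(2\sigma^2)}}{\sum_{z' \in \mathbb{Z}} e^{-z'^2/(2\sigma^2)}},
\quad z \in \mathbb{Z}.
```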
- Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons [79.98542868281473]
We provide a theoretical framework for Reinforcement Learning with Human Feedback (RLHF).
We show that when training a policy based on the learned reward model, MLE fails while a pessimistic MLE provides policies with improved performance under certain coverage assumptions.
arXiv Detail & Related papers (2023-01-26T18:07:21Z)
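Schematically, the pessimistic variant replaces the value under the point-estimate reward with a worst case over a confidence set (the notation here is generic, not the paper's):

```latex
% MLE policy vs. pessimistic policy; J(pi; r) is the value of policy pi
% under reward model r, and C(r_hat) is a confidence set around the MLE.
\hat{\pi}_{\mathrm{MLE}} = \arg\max_{\pi} J(\pi; \hat{r}),
\qquad
\hat{\pi}_{\mathrm{pess}} = \arg\max_{\pi} \min_{r \in \mathcal{C}(\hat{r})} J(\pi; r).
```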
- Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best-policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed-confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for successive elimination it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
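For reference, a minimal successive-elimination sketch follows (the vanilla sub-Gaussian version; the paper's contamination-robust algorithms would replace the empirical means with robust estimates):

```python
# Vanilla successive elimination for best-arm identification (not the
# paper's contamination-robust variant, which uses robust mean estimates).
import numpy as np

def successive_elimination(pull, n_arms, delta=0.05, max_rounds=20_000):
    """pull(arm) returns one stochastic reward; returns the surviving arm
    (the first survivor if the round budget runs out)."""
    active = list(range(n_arms))
    sums = np.zeros(n_arms)
    t = 0
    while len(active) > 1 and t < max_rounds:
        t += 1
        for a in active:                      # pull every active arm once per round
            sums[a] += pull(a)
        means = sums[active] / t
        # anytime confidence radius for 1-sub-Gaussian rewards
        rad = np.sqrt(2.0 * np.log(4.0 * n_arms * t**2 / delta) / t)
        best = means.max()
        active = [a for a, m in zip(active, means) if m >= best - 2.0 * rad]
    return active[0]

rng = np.random.default_rng(1)
mu = np.array([0.1, 0.4, 0.9, 0.6])           # true means; arm 2 is best
arm = successive_elimination(lambda a: rng.normal(mu[a], 1.0), len(mu))
print("identified arm:", arm)
```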
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)