Learning to Identify Top Elo Ratings: A Dueling Bandits Approach
- URL: http://arxiv.org/abs/2201.04480v1
- Date: Wed, 12 Jan 2022 13:57:29 GMT
- Title: Learning to Identify Top Elo Ratings: A Dueling Bandits Approach
- Authors: Xue Yan, Yali Du, Binxin Ru, Jun Wang, Haifeng Zhang, Xu Chen
- Abstract summary: We propose an efficient online match scheduling algorithm to improve the sample efficiency of the Elo evaluation (for top players).
Specifically, we identify and match the top players through a dueling bandits framework and tailor the bandit algorithm to the gradient-based update of Elo.
Our algorithm has a regret guarantee of $\tilde{O}(\sqrt{T})$, sublinear in the number of competition rounds, and has been extended to multidimensional Elo ratings.
- Score: 27.495132915328025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Elo rating system is widely adopted to evaluate the skills of (chess)
game and sports players. Recently, it has also been integrated into machine
learning algorithms for evaluating the performance of computerised AI agents.
However, an accurate estimation of the Elo rating (for the top players) often
requires many rounds of competitions, which can be expensive to carry out. In
this paper, to improve the sample efficiency of the Elo evaluation (for top
players), we propose an efficient online match scheduling algorithm.
Specifically, we identify and match the top players through a dueling bandits
framework and tailor the bandit algorithm to the gradient-based update of Elo.
We show that it reduces the per-step memory and time complexity to constant,
compared to the traditional likelihood maximization approaches requiring $O(t)$
time. Our algorithm has a regret guarantee of $\tilde{O}(\sqrt{T})$, sublinear
in the number of competition rounds, and has been extended to
multidimensional Elo ratings for handling intransitive games. We empirically
demonstrate that our method achieves superior convergence speed and time
efficiency on a variety of gaming tasks.
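The mechanics behind the abstract are straightforward to sketch: Elo performs a constant-time logistic-gradient update on two ratings after each match, and a dueling-bandit scheduler decides which pair plays next. Below is a minimal illustrative sketch, not the authors' exact algorithm: the base-10/400 logistic convention and the K-factor are standard Elo, while `schedule_match` and its UCB-style bonus constant `c` are assumptions made for illustration.

```python
import math
import random

def elo_expected(r_i: float, r_j: float) -> float:
    """Win probability of player i over player j under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((r_j - r_i) / 400.0))

def elo_update(r_i: float, r_j: float, outcome: float, k: float = 32.0):
    """One constant-time gradient-style Elo step; outcome is 1.0 if i won, else 0.0."""
    delta = k * (outcome - elo_expected(r_i, r_j))
    return r_i + delta, r_j - delta

def schedule_match(ratings, counts, t, c=100.0):
    """Choose the next pair with a UCB-style optimism bonus (an illustrative
    heuristic, not the paper's exact rule): match the optimistic top player
    against the next most promising challenger."""
    ucb = [r + c * math.sqrt(math.log(t + 2) / (n + 1))
           for r, n in zip(ratings, counts)]
    order = sorted(range(len(ratings)), key=ucb.__getitem__, reverse=True)
    return order[0], order[1]

# Toy run: four players with hidden true skills; ratings learned online.
true_skill = [1600.0, 1500.0, 1400.0, 1300.0]
ratings, counts = [1500.0] * 4, [0] * 4
for t in range(2000):
    i, j = schedule_match(ratings, counts, t)
    outcome = float(random.random() < elo_expected(true_skill[i], true_skill[j]))
    ratings[i], ratings[j] = elo_update(ratings[i], ratings[j], outcome)
    counts[i] += 1
    counts[j] += 1
print([round(r) for r in ratings])  # the top players' ratings should separate first
```

Each step touches only two ratings and one counter, which is the constant per-step memory and time the abstract contrasts with $O(t)$ likelihood maximization over the full game history. For intransitive games, the multidimensional extension mentioned in the abstract replaces the scalar rating difference with a richer pairwise score so that cyclic win patterns (rock-paper-scissors style) remain representable; the same scheduling idea applies on top.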
Related papers
- Policy Optimization for Markov Games: Unified Framework and Faster Convergence [81.3266426402464]
We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game.
We extend this algorithm to multi-player general Markov Games and show a $\widetilde{\mathcal{O}}(T^{-1/2})$ convergence rate to Coarse Correlated Equilibria (CCE).
arXiv Detail & Related papers (2022-06-06T14:23:13Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Evaluating Team Skill Aggregation in Online Competitive Games [4.168733556014873]
We present an analysis of the impact of two new aggregation methods on the predictive performance of rating systems.
Our evaluations show the superiority of the MAX method over the other two methods in the majority of the tested cases.
Results of this study highlight the necessity of devising more elaborated methods for calculating a team's performance.
arXiv Detail & Related papers (2021-06-21T20:17:36Z)
- Simplified Kalman filter for online rating: one-fits-all approach [4.010371060637208]
We deal with the problem of rating in sports, where the skills of the players/teams are inferred from the observed outcomes of the games.
Our focus is on online rating algorithms, which estimate the skills after each new game by exploiting probabilistic models of the relationship between the skills and the game outcome.
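As a rough, generic illustration of that game-by-game update idea (a sketch in the spirit of such filters, not the paper's algorithm), a scalar Kalman-style rater keeps a Gaussian belief per player, a mean and a variance, and corrects it after each game; the function name and the constants `beta` and `tau` below are assumptions for illustration.

```python
import math

def kalman_rating_update(mu_i, var_i, mu_j, var_j, outcome,
                         beta=200.0, tau=5.0):
    """One online rating step with Gaussian skill beliefs (illustrative only).

    outcome: 1.0 if player i won, 0.0 if i lost.
    beta (performance noise) and tau (between-game skill drift) are
    assumed constants, not values from the paper.
    """
    # Predict: inflate variance to model skill drift between games.
    var_i += tau ** 2
    var_j += tau ** 2
    # Predicted win probability for i under a logistic outcome model.
    p = 1.0 / (1.0 + math.exp(-(mu_i - mu_j) / beta))
    # Correct: step size scales with each player's current uncertainty.
    mu_i += (var_i / beta) * (outcome - p)
    mu_j -= (var_j / beta) * (outcome - p)
    # Each observed game shrinks uncertainty about both players.
    h = p * (1.0 - p) / beta ** 2
    var_i /= 1.0 + var_i * h
    var_j /= 1.0 + var_j * h
    return (mu_i, var_i), (mu_j, var_j)
```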
arXiv Detail & Related papers (2021-04-28T20:44:10Z)
- Elo Ratings for Large Tournaments of Software Agents in Asymmetric Games [0.0]
It is natural to evaluate artificial intelligence agents on the same Elo scale as humans, such as the rating of 5185 attributed to AlphaGo Zero.
There are several fundamental differences between humans and AI that suggest modifications to the system.
We present a revised rating system, and guidelines for tournaments, to reflect these differences.
arXiv Detail & Related papers (2021-04-23T21:49:20Z)
- ELO System for Skat and Other Games of Chance [1.3706331473063877]
The evaluation of player strength in trick-taking card games like Skat or Bridge is not obvious.
We propose a new ELO system for Skat to overcome the weaknesses of existing evaluation approaches.
arXiv Detail & Related papers (2021-04-07T08:30:01Z)
- An Elo-like System for Massive Multiplayer Competitions [1.8782750537161612]
We present a novel Bayesian rating system for contests with many participants.
It is widely applicable to competition formats with discrete ranked matches.
We show that the system aligns incentives: that is, a player who seeks to maximize their rating will never want to underperform.
arXiv Detail & Related papers (2021-01-02T08:14:31Z)
- Faster Algorithms for Optimal Ex-Ante Coordinated Collusive Strategies in Extensive-Form Zero-Sum Games [123.76716667704625]
We focus on the problem of finding an optimal strategy for a team of two players that faces an opponent in an imperfect-information zero-sum extensive-form game.
In that setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game.
We provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile.
arXiv Detail & Related papers (2020-09-21T17:51:57Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include algorithm's regret guarantees that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)
- Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret of $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.