On the Limitations of Elo: Real-World Games are Transitive, not Additive
- URL: http://arxiv.org/abs/2206.12301v1
- Date: Tue, 21 Jun 2022 22:07:06 GMT
- Title: On the Limitations of Elo: Real-World Games are Transitive, not Additive
- Authors: Quentin Bertrand, Wojciech Marian Czarnecki, Gauthier Gidel
- Abstract summary: We show that Elo models can fail to extract the strength of the transitive component in games.
We propose an extension of the Elo score that assigns each player two scores, which we refer to as skill and consistency.
- Score: 13.334776744099285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-world competitive games, such as chess, go, or StarCraft II, rely on Elo
models to measure the strength of their players. Since these games are not
fully transitive, using Elo implicitly assumes they have a strong transitive
component that can correctly be identified and extracted. In this study, we
investigate the challenge of identifying the strength of the transitive
component in games. First, we show that Elo models can fail to extract this
transitive component, even in elementary transitive games. Then, based on this
observation, we propose an extension of the Elo score: we end up with a disc
ranking system that assigns each player two scores, which we refer to as skill
and consistency. Finally, we propose an empirical validation on payoff matrices
coming from real-world games played by bots and humans.
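The additive assumption the abstract refers to can be made concrete with the classic Elo update, in which a player's win probability depends only on the rating difference. A minimal sketch (the K-factor of 32 and the example ratings are illustrative choices, not values from the paper):

```python
# Standard (additive) Elo model: win probability depends only on the
# rating difference, which is exactly the assumption the paper critiques.

def elo_expected(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))

def elo_update(r_a: float, r_b: float, outcome: float, k: float = 32.0):
    """Update both ratings after one game.
    outcome = 1.0 if A wins, 0.0 if B wins, 0.5 for a draw."""
    e_a = elo_expected(r_a, r_b)
    return r_a + k * (outcome - e_a), r_b + k * (e_a - outcome)

# One game: a 1600-rated player beats a 1500-rated player.
r_a, r_b = elo_update(1600.0, 1500.0, 1.0)
```

The paper's disc ranking system departs from this by assigning each player a pair of scores (skill, consistency) rather than the single scalar updated above.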
Related papers
- Complete Chess Games Enable LLM Become A Chess Master [10.108949088950927]
Large language models (LLMs) have shown remarkable abilities in text generation, question answering, language translation, reasoning, and many other tasks.
Despite LLMs' success in multiple areas, their ability to play abstract games, such as chess, is underexplored.
Here, we propose the Large language model ChessLLM to play full chess games.
arXiv Detail & Related papers (2025-01-26T09:43:39Z)
- Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games [54.49589494014147]
GAMEBoT is a gaming arena designed for rigorous assessment of Large Language Models.
We benchmark 17 prominent LLMs across eight games, encompassing various strategic abilities and game characteristics.
Our results suggest that GAMEBoT presents a significant challenge, even when LLMs are provided with detailed CoT prompts.
arXiv Detail & Related papers (2024-12-18T08:32:53Z)
- Imperfect-Recall Games: Equilibrium Concepts and Their Complexity [74.01381499760288]
We investigate optimal decision making under imperfect recall, that is, when an agent forgets information it once held before.
In the framework of extensive-form games with imperfect recall, we analyze the computational complexities of finding equilibria in multiplayer settings.
arXiv Detail & Related papers (2024-06-23T00:27:28Z)
- Ordinal Potential-based Player Rating [6.454304238638547]
We show that Elo ratings do preserve transitivity when computed in the right space.
We introduce a new game decomposition that prioritises capturing the sign pattern of the game.
We link our approach to the known concept of sign-rank, and evaluate our methodology using both toy examples and empirical data from real-world games.
arXiv Detail & Related papers (2023-06-08T17:08:52Z)
- Are AlphaZero-like Agents Robust to Adversarial Perturbations? [73.13944217915089]
AlphaZero (AZ) has demonstrated that neural-network-based Go AIs can surpass human performance by a large margin.
We ask whether adversarial states exist for Go AIs that may lead them to play surprisingly wrong actions.
We develop the first adversarial attack on Go AIs that can efficiently search for adversarial states by strategically reducing the search space.
arXiv Detail & Related papers (2022-11-07T18:43:25Z)
- Emergent Communication: Generalization and Overfitting in Lewis Games [53.35045559317384]
Lewis signaling games are a class of simple communication games for simulating the emergence of language.
In these games, two agents must agree on a communication protocol in order to solve a cooperative task.
Previous work has shown that agents trained to play this game with reinforcement learning tend to develop languages that display undesirable properties.
arXiv Detail & Related papers (2022-09-30T09:50:46Z)
- GCN-WP -- Semi-Supervised Graph Convolutional Networks for Win Prediction in Esports [84.55775845090542]
We propose a semi-supervised win prediction model for esports based on graph convolutional networks.
GCN-WP integrates over 30 features about the match and players and employs graph convolution to classify games based on their neighborhood.
Our model achieves state-of-the-art prediction accuracy when compared to machine learning or skill rating models for LoL.
arXiv Detail & Related papers (2022-07-26T21:38:07Z)
- Collusion Detection in Team-Based Multiplayer Games [57.153233321515984]
We propose a system that detects colluding behaviors in team-based multiplayer games.
The proposed method analyzes the players' social relationships paired with their in-game behavioral patterns.
We then automate the detection using Isolation Forest, an unsupervised learning technique specialized in highlighting outliers.
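The outlier-detection step can be sketched with scikit-learn's `IsolationForest`. The feature names below (shared-lobby rate, mutual-kill rate) are hypothetical stand-ins for the social and behavioral features the paper describes, and the data is synthetic:

```python
# Hedged sketch: flagging anomalous player pairs with Isolation Forest.
# Features and data are illustrative, not the paper's actual pipeline.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Rows = player pairs, columns = [shared_lobby_rate, mutual_kill_rate]
normal_pairs = rng.normal(loc=[0.1, 0.05], scale=0.03, size=(200, 2))
colluders = np.array([[0.90, 0.00], [0.85, 0.02]])  # extreme outliers
X = np.vstack([normal_pairs, colluders])

clf = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = clf.predict(X)              # -1 = flagged as outlier, 1 = normal
flagged = np.where(labels == -1)[0]  # indices of suspected colluding pairs
```

Isolation Forest fits the unsupervised setting well because it needs no labeled collusion examples; it scores points by how few random splits are needed to isolate them.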
arXiv Detail & Related papers (2022-03-10T02:37:39Z)
- Learning to Identify Top Elo Ratings: A Dueling Bandits Approach [27.495132915328025]
We propose an efficient online match scheduling algorithm to improve the sample efficiency of Elo evaluation for top players.
Specifically, we identify and match the top players through a dueling bandits framework and tailor the bandit algorithm to the gradient-based update of Elo.
Our algorithm has a regret guarantee of $\tilde{O}(\sqrt{T})$, sublinear in the number of competition rounds $T$, and extends to multidimensional Elo ratings.
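The "gradient-based update of Elo" can be made concrete: the classic Elo rule is a stochastic-gradient step on the logistic log-loss of the game outcome $o \in \{0, \tfrac12, 1\}$. This derivation is standard background, not taken from the paper:

```latex
% Elo update as SGD on the logistic log-loss, with c = \ln 10 / 400
% (the standard Elo scale factor).
p = \sigma\!\big(c\,(r_A - r_B)\big), \qquad
\ell = -\big(o \log p + (1 - o)\log(1 - p)\big), \qquad
\frac{\partial \ell}{\partial r_A} = c\,(p - o)
```

so the classic update $r_A \leftarrow r_A + K(o - p)$ is gradient descent on $\ell$ with learning rate $K/c$.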
arXiv Detail & Related papers (2022-01-12T13:57:29Z)
- Elo Ratings for Large Tournaments of Software Agents in Asymmetric Games [0.0]
It is natural to evaluate artificial intelligence agents on the same Elo scale as humans, such as the rating of 5185 attributed to AlphaGo Zero.
There are several fundamental differences between humans and AI that suggest modifications to the system.
We present a revised rating system, and guidelines for tournaments, to reflect these differences.
arXiv Detail & Related papers (2021-04-23T21:49:20Z)
- ELO System for Skat and Other Games of Chance [1.3706331473063877]
The evaluation of player strength in trick-taking card games like Skat or Bridge is not obvious.
We propose a new ELO system for Skat that overcomes these weaknesses.
arXiv Detail & Related papers (2021-04-07T08:30:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.