An Elo-like System for Massive Multiplayer Competitions
- URL: http://arxiv.org/abs/2101.00400v1
- Date: Sat, 2 Jan 2021 08:14:31 GMT
- Title: An Elo-like System for Massive Multiplayer Competitions
- Authors: Aram Ebtekar and Paul Liu
- Abstract summary: We present a novel Bayesian rating system for contests with many participants.
It is widely applicable to competition formats with discrete ranked matches.
We show that the system aligns incentives: that is, a player who seeks to maximize their rating will never want to underperform.
- Score: 1.8782750537161612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rating systems play an important role in competitive sports and games. They
provide a measure of player skill, which incentivizes competitive performances
and enables balanced match-ups. In this paper, we present a novel Bayesian
rating system for contests with many participants. It is widely applicable to
competition formats with discrete ranked matches, such as online programming
competitions, obstacle course races, and some video games. The simplicity of
our system allows us to prove theoretical bounds on robustness and runtime. In
addition, we show that the system aligns incentives: that is, a player who
seeks to maximize their rating will never want to underperform. Experimentally,
the rating system rivals or surpasses existing systems in prediction accuracy,
and computes faster than existing systems by up to an order of magnitude.
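The paper's Bayesian system is more involved, but the classic Elo update it generalizes can be sketched in a few lines (a minimal illustration, not the authors' algorithm; the K-factor of 32 and the 400-point scale are conventional chess defaults, not values from the paper):

```python
def expected_score(rating_a, rating_b):
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a, rating_b, score_a, k=32):
    """Return updated ratings; score_a is 1 (A wins), 0.5 (draw), or 0 (A loses)."""
    expected_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b
```

Pairwise updates like this scale poorly to contests with thousands of ranked participants, which is the regime the paper targets.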
Related papers
- Benchmarking Robustness and Generalization in Multi-Agent Systems: A
Case Study on Neural MMO [50.58083807719749]
We present the results of the second Neural MMO challenge, hosted at IJCAI 2022, which received 1600+ submissions.
This competition targets robustness and generalization in multi-agent systems.
We will open-source our benchmark including the environment wrapper, baselines, a visualization tool, and selected policies for further research.
arXiv Detail & Related papers (2023-08-30T07:16:11Z) - Behavioral Player Rating in Competitive Online Shooter Games [3.203973145772361]
In this paper, we engineer several features from in-game statistics to model players and create ratings that accurately represent their behavior and true performance level.
Our results show that the behavioral ratings present more accurate performance estimations while maintaining the interpretability of the created representations.
Considering different aspects of the playing behavior of players and using behavioral ratings for matchmaking can lead to match-ups that are more aligned with players' goals and interests.
arXiv Detail & Related papers (2022-07-01T16:23:01Z) - Collusion Detection in Team-Based Multiplayer Games [57.153233321515984]
We propose a system that detects colluding behaviors in team-based multiplayer games.
The proposed method analyzes the players' social relationships paired with their in-game behavioral patterns.
We then automate the detection using Isolation Forest, an unsupervised learning technique specialized in highlighting outliers.
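Isolation Forest flags outliers by isolating points with random splits: anomalous points take fewer splits to separate from the rest. A toy one-dimensional sketch of that idea (a generic illustration of the technique, not the paper's implementation, which operates on social-relationship and behavioral features):

```python
import random

def isolation_depth(points, x, depth=0, max_depth=10):
    """Depth at which x is isolated by random splits (shorter = more anomalous)."""
    if len(points) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(points), max(points)
    if lo == hi:
        return depth
    split = random.uniform(lo, hi)
    # keep only the points on the same side of the split as x
    same_side = [p for p in points if (p < split) == (x < split)]
    return isolation_depth(same_side, x, depth + 1, max_depth)

def anomaly_score(data, x, n_trees=100):
    """Average isolation depth over many random trees; lower means more isolated."""
    return sum(isolation_depth(data, x) for _ in range(n_trees)) / n_trees
```

A production system would use a library implementation (e.g. scikit-learn's `IsolationForest`) over multi-dimensional feature vectors rather than this scalar sketch.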
arXiv Detail & Related papers (2022-03-10T02:37:39Z) - Learning to Identify Top Elo Ratings: A Dueling Bandits Approach [27.495132915328025]
We propose an efficient online match scheduling algorithm to improve the sample efficiency of the Elo evaluation for top players.
Specifically, we identify and match the top players through a dueling bandits framework and tailor the bandit algorithm to the gradient-based update of Elo.
Our algorithm has a regret guarantee of $\tilde{O}(\sqrt{T})$, sublinear in the number of competition rounds $T$, and has been extended to multidimensional Elo ratings.
arXiv Detail & Related papers (2022-01-12T13:57:29Z) - Evaluating Team Skill Aggregation in Online Competitive Games [4.168733556014873]
We present an analysis of the impact of two new aggregation methods on the predictive performance of rating systems.
Our evaluations show the superiority of the MAX method over the other two methods in the majority of the tested cases.
Results of this study highlight the necessity of devising more elaborated methods for calculating a team's performance.
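As a minimal illustration of what such aggregation methods do when collapsing member ratings into one team rating (the method names follow common conventions; this generic sketch is not the paper's code):

```python
def team_rating(member_ratings, method):
    """Collapse a team's individual ratings into a single team rating."""
    if method == "MAX":    # team is as strong as its best player
        return max(member_ratings)
    if method == "MIN":    # team is as strong as its weakest player
        return min(member_ratings)
    if method == "MEAN":   # average skill across the roster
        return sum(member_ratings) / len(member_ratings)
    if method == "SUM":    # total skill, sensitive to team size
        return sum(member_ratings)
    raise ValueError(f"unknown aggregation method: {method}")
```

The abstract's finding that MAX predicts best suggests match outcomes in the studied games are driven disproportionately by the strongest member.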
arXiv Detail & Related papers (2021-06-21T20:17:36Z) - The Evaluation of Rating Systems in Team-based Battle Royale Games [4.168733556014873]
This paper explores the utility of several metrics for evaluating three popular rating systems on a real-world dataset of over 25,000 team battle royale matches.
Normalized discounted cumulative gain (NDCG) demonstrated more reliable performance and greater flexibility.
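For reference, NDCG compares the discounted gain of a predicted ranking against the ideal ordering; a minimal sketch of the standard formula (not tied to the paper's evaluation code):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: items at earlier positions count more."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the ideal (descending) ordering, yielding a value in [0, 1]."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

Here a rating system's predicted team ordering would be scored against the observed match placements.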
arXiv Detail & Related papers (2021-05-28T19:22:07Z) - Deep Latent Competition: Learning to Race Using Visual Control Policies
in Latent Space [63.57289340402389]
Deep Latent Competition (DLC) is a reinforcement learning algorithm that learns competitive visual control policies through self-play in imagination.
Imagined self-play reduces costly sample generation in the real world, while the latent representation enables planning to scale gracefully with observation dimensionality.
arXiv Detail & Related papers (2021-02-19T09:00:29Z) - NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons
Learned [122.429985063391]
We describe the motivation and organization of the EfficientQA competition from NeurIPS 2020.
The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers.
arXiv Detail & Related papers (2021-01-01T01:24:34Z) - Interpretable Real-Time Win Prediction for Honor of Kings, a Popular
Mobile MOBA Esport [51.20042288437171]
We propose a Two-Stage Spatial-Temporal Network (TSSTN) that can provide accurate real-time win predictions.
Experiment results and applications in real-world live streaming scenarios showed that the proposed TSSTN model is effective both in prediction accuracy and interpretability.
arXiv Detail & Related papers (2020-08-14T12:00:58Z) - Competing Bandits: The Perils of Exploration Under Competition [99.68537519404727]
We study the interplay between exploration and competition on online platforms.
We find that stark competition induces firms to commit to a "greedy" bandit algorithm that leads to low welfare.
We investigate two channels for weakening the competition: relaxing the rationality of users and giving one firm a first-mover advantage.
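The "greedy" strategy the paper refers to can be sketched as a bandit that exploits its current empirical means and never deliberately explores (a generic illustration under assumed Gaussian rewards; the arm means, noise level, and round counts here are arbitrary):

```python
import random

def greedy_bandit(true_means, rounds, seed=0):
    """Pull each arm once, then always pick the best empirical mean (no exploration)."""
    rng = random.Random(seed)
    counts = [1] * len(true_means)
    sums = [rng.gauss(m, 1.0) for m in true_means]  # one initial noisy sample per arm
    total_reward = sum(sums)
    for _ in range(rounds - len(true_means)):
        arm = max(range(len(true_means)), key=lambda i: sums[i] / counts[i])
        reward = rng.gauss(true_means[arm], 1.0)
        sums[arm] += reward
        counts[arm] += 1
        total_reward += reward
    return total_reward, counts
```

Because an unlucky first sample can lock the algorithm onto an inferior arm forever, greedy play can yield low welfare, which is the failure mode competition induces in the paper's model.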
arXiv Detail & Related papers (2020-07-20T14:19:08Z) - Competitive Balance in Team Sports Games [8.321949054700086]
We show that using the final score difference provides an even better prediction metric for competitive balance.
We also show that a linear model trained on a carefully selected set of team and individual features nearly matches the performance of the more powerful neural network model.
arXiv Detail & Related papers (2020-06-24T14:19:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.