The Evaluation of Rating Systems in Team-based Battle Royale Games
- URL: http://arxiv.org/abs/2105.14069v1
- Date: Fri, 28 May 2021 19:22:07 GMT
- Title: The Evaluation of Rating Systems in Team-based Battle Royale Games
- Authors: Arman Dehpanah, Muheeb Faizan Ghori, Jonathan Gemmell, Bamshad
Mobasher
- Abstract summary: This paper explores the utility of several metrics for evaluating three popular rating systems on a real-world dataset of over 25,000 team battle royale matches. Among the metrics studied, normalized discounted cumulative gain (NDCG) demonstrated more reliable performance and more flexibility.
- Score: 4.168733556014873
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online competitive games have become a mainstream entertainment platform. To
create a fair and exciting experience, these games use rating systems to match
players with similar skills. While there has been an increasing amount of
research on improving the performance of these systems, less attention has been
paid to how their performance is evaluated. In this paper, we explore the
utility of several metrics for evaluating three popular rating systems on a
real-world dataset of over 25,000 team battle royale matches. Our results
suggest considerable differences in their evaluation patterns. Some metrics
were highly impacted by the inclusion of new players. Many could not capture
the real differences between certain groups of players. Among all metrics
studied, normalized discounted cumulative gain (NDCG) demonstrated more
reliable performance and more flexibility. It alleviated most of the challenges
faced by the other metrics while adding the freedom to adjust the focus of the
evaluations on different groups of players.
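Since NDCG is central to the paper's conclusion, a minimal sketch of how it can score a rating system's predicted match ranking may help. This is a standard NDCG computation, not the paper's exact evaluation pipeline; the team names and the choice of relevance (inverted placement) are illustrative assumptions.

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: gains at later ranks are
    # discounted logarithmically (log2 of rank + 1).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(predicted_order, true_relevance):
    # predicted_order: team ids sorted by the rating system's prediction
    # true_relevance: team id -> relevance (here, inverted actual placement)
    gains = [true_relevance[team] for team in predicted_order]
    ideal = sorted(true_relevance.values(), reverse=True)
    return dcg(gains) / dcg(ideal)

# Hypothetical 4-team match; relevance = (num_teams - actual placement),
# so team "A" finished 1st and "D" finished last.
true_rel = {"A": 3, "B": 2, "C": 1, "D": 0}
print(ndcg(["A", "B", "C", "D"], true_rel))  # perfect prediction -> 1.0
print(ndcg(["B", "A", "D", "C"], true_rel))  # swapped pairs -> below 1.0
```

Because the discount decays with rank, NDCG weighs mistakes at the top of the ranking more heavily, which is what lets an evaluation focus on particular groups of players.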
Related papers
- CUPID: Improving Battle Fairness and Position Satisfaction in Online MOBA Games with a Re-matchmaking System [38.36310386543932]
CUPID aims to optimize team and position assignments to improve both fairness and player satisfaction.
It incorporates a pre-filtering step to ensure a minimum level of matchmaking quality, followed by a pre-match win-rate prediction model.
Experiments were conducted on two large-scale, real-world MOBA datasets to validate the effectiveness of CUPID.
arXiv Detail & Related papers (2024-06-28T08:09:55Z) - Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback.
The role of user feedback in annotators' assessment of turns in a conversation has been little studied.
We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of the turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z) - GameEval: Evaluating LLMs on Conversational Games [93.40433639746331]
We propose GameEval, a novel approach to evaluating large language models (LLMs).
GameEval treats LLMs as game players and assigns them distinct roles with specific goals achieved by launching conversations of various forms.
We show that GameEval can effectively differentiate the capabilities of various LLMs, providing a comprehensive assessment of their integrated abilities to solve complex problems.
arXiv Detail & Related papers (2023-08-19T14:33:40Z) - Behavioral Player Rating in Competitive Online Shooter Games [3.203973145772361]
In this paper, we engineer several features from in-game statistics to model players and create ratings that accurately represent their behavior and true performance level.
Our results show that the behavioral ratings present more accurate performance estimations while maintaining the interpretability of the created representations.
Considering different aspects of the playing behavior of players and using behavioral ratings for matchmaking can lead to match-ups that are more aligned with players' goals and interests.
arXiv Detail & Related papers (2022-07-01T16:23:01Z) - Collusion Detection in Team-Based Multiplayer Games [57.153233321515984]
We propose a system that detects colluding behaviors in team-based multiplayer games.
The proposed method analyzes the players' social relationships paired with their in-game behavioral patterns.
We then automate the detection using Isolation Forest, an unsupervised learning technique specialized in highlighting outliers.
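The outlier-flagging step described above can be sketched with scikit-learn's `IsolationForest`. The features here (a "social affinity" score and a "behavioral similarity" score per player pair) are hypothetical stand-ins for the paper's engineered features, and the synthetic data is illustrative only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-pair features: [social affinity, behavioral similarity].
# Most player pairs cluster at low values; a colluding pair stands out
# with unusually high scores on both dimensions.
rng = np.random.default_rng(0)
normal_pairs = rng.normal(loc=0.2, scale=0.05, size=(40, 2))
colluding_pair = np.array([[0.95, 0.9]])
X = np.vstack([normal_pairs, colluding_pair])

# Isolation Forest isolates anomalies via short average path lengths
# across random trees; predict() returns -1 for outliers, 1 for inliers.
model = IsolationForest(contamination=0.05, random_state=0).fit(X)
labels = model.predict(X)
print(labels[-1])  # the colluding pair is flagged as an outlier (-1)
```

Being unsupervised, this approach needs no labeled examples of collusion, which matters since confirmed collusion cases are rare and expensive to annotate.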
arXiv Detail & Related papers (2022-03-10T02:37:39Z) - Player Modeling using Behavioral Signals in Competitive Online Games [4.168733556014873]
This paper focuses on the importance of addressing different aspects of playing behavior when modeling players for creating match-ups.
We engineer several behavioral features from a dataset of over 75,000 battle royale matches and create player models.
We then use the created models to predict ranks for different groups of players in the data.
arXiv Detail & Related papers (2021-11-29T22:53:17Z) - Evaluating Team Skill Aggregation in Online Competitive Games [4.168733556014873]
We present an analysis of the impact of two new aggregation methods on the predictive performance of rating systems.
Our evaluations show the superiority of the MAX method over the other two methods in the majority of the tested cases.
Results of this study highlight the necessity of devising more elaborated methods for calculating a team's performance.
arXiv Detail & Related papers (2021-06-21T20:17:36Z) - Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes).
We show how it can be parameterized so a quality-diversity algorithm (MAP-Elites) is used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z) - An Empirical Study on the Generalization Power of Neural Representations
Learned via Visual Guessing Games [79.23847247132345]
This work investigates how well an artificial agent can benefit from playing guessing games when later asked to perform on novel NLP downstream tasks such as Visual Question Answering (VQA).
We propose two ways to exploit playing guessing games: 1) a supervised learning scenario in which the agent learns to mimic successful guessing games and 2) a novel way for an agent to play by itself, called Self-play via Iterated Experience Learning (SPIEL).
arXiv Detail & Related papers (2021-01-31T10:30:48Z) - An Elo-like System for Massive Multiplayer Competitions [1.8782750537161612]
We present a novel Bayesian rating system for contests with many participants.
It is widely applicable to competition formats with discrete ranked matches.
We show that the system aligns incentives: that is, a player who seeks to maximize their rating will never want to underperform.
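To make the incentive-alignment claim concrete, here is a toy many-player Elo update, an illustrative simplification rather than the paper's Bayesian system: a ranked match is treated as all pairwise duels, and the usual Elo adjustments are averaged. The `k` factor and ratings are arbitrary example values.

```python
def elo_update(ratings, placements, k=32):
    # ratings: current ratings; placements: finishing positions (1 = best).
    # Each player's adjustment is the mean of standard pairwise Elo updates
    # against every opponent in the match.
    n = len(ratings)
    updated = []
    for i in range(n):
        delta = 0.0
        for j in range(n):
            if i == j:
                continue
            expected = 1 / (1 + 10 ** ((ratings[j] - ratings[i]) / 400))
            actual = 1.0 if placements[i] < placements[j] else 0.0
            delta += k * (actual - expected) / (n - 1)
        updated.append(ratings[i] + delta)
    return updated

# Three equally rated players finishing 1st, 2nd, and 3rd:
print(elo_update([1500, 1500, 1500], [1, 2, 3]))  # -> [1516.0, 1500.0, 1484.0]
```

Under this scheme a better finish never lowers the resulting rating, which is the intuition behind the incentive-alignment property (the paper proves the analogous guarantee for its Bayesian system).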
arXiv Detail & Related papers (2021-01-02T08:14:31Z) - Incorporating Rivalry in Reinforcement Learning for a Competitive Game [65.2200847818153]
This study focuses on providing a novel learning mechanism based on a rivalry social impact.
Based on the concept of competitive rivalry, our analysis aims to investigate if we can change the assessment of these agents from a human perspective.
arXiv Detail & Related papers (2020-11-02T21:54:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.