Elo Ratings for Large Tournaments of Software Agents in Asymmetric Games
- URL: http://arxiv.org/abs/2105.00839v1
- Date: Fri, 23 Apr 2021 21:49:20 GMT
- Title: Elo Ratings for Large Tournaments of Software Agents in Asymmetric Games
- Authors: Ben Wise
- Abstract summary: It is natural to evaluate artificial intelligence agents on the same Elo scale as humans, such as the rating of 5185 attributed to AlphaGo Zero.
There are several fundamental differences between humans and AI that suggest modifications to the system.
We present a revised rating system, and guidelines for tournaments, to reflect these differences.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The Elo rating system has been used worldwide for individual sports and team
sports, as exemplified by the European Go Federation (EGF), International Chess
Federation (FIDE), International Federation of Association Football (FIFA), and
many others. To evaluate the performance of artificial intelligence agents, it
is natural to evaluate them on the same Elo scale as humans, such as the rating
of 5185 attributed to AlphaGo Zero.
There are several fundamental differences between humans and AI that suggest
modifications to the system, which in turn require revisiting Elo's fundamental
rationale. AI is typically trained on many more games than humans play, and we
have little a priori information on newly created AI agents. Further, AI is
being extended into games which are asymmetric between the players, and which
could even have large complex boards with different setup in every game, such
as commercial paper strategy games. We present a revised rating system, and
guidelines for tournaments, to reflect these differences.
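For reference, the classical Elo update that the paper revisits can be sketched as follows. This is a minimal illustration of the standard system only, not the revised rating scheme the paper proposes; the K-factor of 32 is a common default, not a value taken from the paper.

```python
def expected_score(r_a: float, r_b: float) -> float:
    # Standard Elo logistic expectation: probability that player A beats B,
    # with a 400-point gap corresponding to 10:1 odds.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    # score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b
```

Note that the update is zero-sum: the winner gains exactly what the loser forfeits, which is one of the properties that becomes problematic for asymmetric games with per-game setup variation.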
Related papers
- Human-aligned Chess with a Bit of Search [35.16633353273246]
Chess has long been a testbed for AI's quest to match human intelligence.
In this paper, we introduce Allie, a chess-playing AI designed to bridge the gap between artificial and human intelligence in this classic game.
arXiv Detail & Related papers (2024-10-04T19:51:03Z)
- DanZero+: Dominating the GuanDan Game through Reinforcement Learning [95.90682269990705]
We develop an AI program for an exceptionally complex and popular card game called GuanDan.
We first put forward an AI program named DanZero for this game.
In order to further enhance the AI's capabilities, we apply a policy-based reinforcement learning algorithm to GuanDan.
arXiv Detail & Related papers (2023-12-05T08:07:32Z)
- Diversifying AI: Towards Creative Chess with AlphaZero [22.169342583475938]
We study whether a team of diverse AI systems can outperform a single AI in challenging tasks by generating more ideas as a group and then selecting the best ones.
Our experiments suggest that AZ_db plays chess in diverse ways, solves more puzzles as a group and outperforms a more homogeneous team.
Our findings suggest that diversity bonuses emerge in teams of AI agents, just as they do in teams of humans.
arXiv Detail & Related papers (2023-08-17T20:27:33Z)
- DanZero: Mastering GuanDan Game with Reinforcement Learning [121.93690719186412]
Card game AI has always been a hot topic in the research of artificial intelligence.
In this paper, we are devoted to developing an AI program for a more complex card game, GuanDan.
We propose DanZero, the first AI program for GuanDan, using reinforcement learning techniques.
arXiv Detail & Related papers (2022-10-31T06:29:08Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- Learning to Identify Top Elo Ratings: A Dueling Bandits Approach [27.495132915328025]
We propose an efficient online match-scheduling algorithm to improve the sample efficiency of Elo evaluation for top players.
Specifically, we identify and match the top players through a dueling bandits framework and tailor the bandit algorithm to the gradient-based update of Elo.
Our algorithm has a regret guarantee $\tilde{O}(\sqrt{T})$, sublinear in the number of competition rounds, and has been extended to multidimensional Elo ratings.
arXiv Detail & Related papers (2022-01-12T13:57:29Z)
- Evaluation of Human-AI Teams for Learned and Rule-Based Agents in Hanabi [0.0]
We evaluate teams of humans and AI agents in the cooperative card game Hanabi with both rule-based and learning-based agents.
We find that humans have a clear preference toward a rule-based AI teammate over a state-of-the-art learning-based AI teammate.
arXiv Detail & Related papers (2021-07-15T22:19:15Z)
- ELO System for Skat and Other Games of Chance [1.3706331473063877]
The evaluation of player strength in trick-taking card games like Skat or Bridge is not obvious.
We propose a new ELO system for Skat to overcome these weaknesses.
arXiv Detail & Related papers (2021-04-07T08:30:01Z)
- Game Plan: What AI can do for Football, and What Football can do for AI [83.79507996785838]
Predictive and prescriptive football analytics require new developments and progress at the intersection of statistical learning, game theory, and computer vision.
We illustrate that football analytics is a game changer of tremendous value, in terms of not only changing the game of football itself, but also in terms of what this domain can mean for the field of AI.
arXiv Detail & Related papers (2020-11-18T10:26:02Z)
- Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork [54.309495231017344]
We argue that AI systems should be trained in a human-centered manner, directly optimized for team performance.
We study this proposal for a specific type of human-AI teaming, where the human overseer chooses to either accept the AI recommendation or solve the task themselves.
Our experiments with linear and non-linear models on real-world, high-stakes datasets show that the most accurate AI may not lead to the highest team performance.
arXiv Detail & Related papers (2020-04-27T19:06:28Z)
- Suphx: Mastering Mahjong with Deep Reinforcement Learning [114.68233321904623]
We design an AI for Mahjong, named Suphx, based on deep reinforcement learning with some newly introduced techniques.
Suphx has demonstrated stronger performance than most top human players in terms of stable rank.
This is the first time that a computer program outperforms most top human players in Mahjong.
arXiv Detail & Related papers (2020-03-30T16:18:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.