Towards Game-Playing AI Benchmarks via Performance Reporting Standards
- URL: http://arxiv.org/abs/2007.02742v1
- Date: Mon, 6 Jul 2020 13:27:00 GMT
- Title: Towards Game-Playing AI Benchmarks via Performance Reporting Standards
- Authors: Vanessa Volz and Boris Naujoks
- Abstract summary: We propose reporting guidelines for AI game-playing performance that, if followed, provide information suitable for unbiased comparisons between different AI approaches.
The vision we describe is to build benchmarks and competitions based on such guidelines in order to draw more general conclusions about the behaviour of different AI algorithms.
- Score: 0.9137554315375919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While games have been used extensively as milestones to evaluate game-playing
AI, there exists no standardised framework for reporting the obtained
observations. As a result, it remains difficult to draw general conclusions
about the strengths and weaknesses of different game-playing AI algorithms. In
this paper, we propose reporting guidelines for AI game-playing performance
that, if followed, provide information suitable for unbiased comparisons
between different AI approaches. The vision we describe is to build benchmarks
and competitions based on such guidelines in order to be able to draw more
general conclusions about the behaviour of different AI algorithms, as well as
the types of challenges different games pose.
Related papers
- Who is a Better Player: LLM against LLM [53.46608216197315]
We propose an adversarial benchmarking framework to assess the comprehensive performance of Large Language Models (LLMs) through board games competition. We introduce Qi Town, a specialized evaluation platform that supports 5 widely played games and involves 20 LLM-driven players.
arXiv Detail & Related papers (2025-08-05T06:41:47Z)
- FAIRGAME: a Framework for AI Agents Bias Recognition using Game Theory [51.96049148869987]
We present FAIRGAME, a Framework for AI Agents Bias Recognition using Game Theory.
We describe its implementation and usage, and we employ it to uncover biased outcomes in popular games among AI agents.
Overall, FAIRGAME allows users to reliably and easily simulate their desired games and scenarios.
arXiv Detail & Related papers (2025-04-19T15:29:04Z)
- Preference-conditioned Pixel-based AI Agent For Game Testing [1.5059676044537105]
Game-testing AI agents that learn by interaction with the environment have the potential to mitigate these challenges.
This paper proposes an agent design that mainly depends on pixel-based state observations while exploring the environment conditioned on a user's preference.
Our agent significantly outperforms state-of-the-art pixel-based game testing agents over exploration coverage and test execution quality when evaluated on a complex open-world environment resembling many aspects of real AAA games.
arXiv Detail & Related papers (2023-08-18T04:19:36Z)
- SPRING: Studying the Paper and Reasoning to Play Games [102.5587155284795]
We propose a novel approach, SPRING, to read the game's original academic paper and use the knowledge learned to reason and play the game through a large language model (LLM).
In experiments, we study the quality of in-context "reasoning" induced by different forms of prompts under the setting of the Crafter open-world environment.
Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories.
arXiv Detail & Related papers (2023-05-24T18:14:35Z)
- WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models [91.92346150646007]
In this work, we introduce WinoGAViL: an online game to collect vision-and-language associations.
We use the game to collect 3.5K instances, finding that they are intuitive for humans but challenging for state-of-the-art AI models.
Our analysis as well as the feedback we collect from players indicate that the collected associations require diverse reasoning skills.
arXiv Detail & Related papers (2022-07-25T23:57:44Z)
- Towards Objective Metrics for Procedurally Generated Video Game Levels [2.320417845168326]
We introduce two simulation-based evaluation metrics to measure the diversity and difficulty of generated levels.
We demonstrate that our diversity metric is more robust to changes in level size and representation than current methods.
The difficulty metric shows promise, as it correlates with existing estimates of difficulty in one of the tested domains, but it does face some challenges in the other domain.
arXiv Detail & Related papers (2022-01-25T14:13:50Z)
- Spatial State-Action Features for General Games [5.849736173068868]
We formulate a design and efficient implementation of spatial state-action features for general games.
These are patterns that can be trained to incentivise or disincentivise actions based on whether or not they match variables of the state in a local area.
We propose an efficient approach for evaluating active features for any given set of features.
arXiv Detail & Related papers (2022-01-17T13:34:04Z)
- CommonsenseQA 2.0: Exposing the Limits of AI through Gamification [126.85096257968414]
We construct benchmarks that test the abilities of modern natural language understanding models.
In this work, we propose gamification as a framework for data construction.
arXiv Detail & Related papers (2022-01-14T06:49:15Z)
- Revisiting Game Representations: The Hidden Costs of Efficiency in Sequential Decision-making Algorithms [0.6749750044497732]
Recent advancements in algorithms for sequential decision-making under imperfect information have shown remarkable success in large games.
These algorithms traditionally formalize the games using the extensive-form game formalism.
We show that a popular workaround involves using a specialized representation based on player specific information-state trees.
arXiv Detail & Related papers (2021-12-20T22:34:19Z)
- Contextual Games: Multi-Agent Learning with Side Information [57.76996806603094]
We formulate the novel class of contextual games driven by contextual information at each round.
By means of kernel-based regularity assumptions, we model the correlation between different contexts and game outcomes.
We propose a novel online (meta) algorithm that exploits such correlations to minimize the contextual regret of individual players.
arXiv Detail & Related papers (2021-07-13T18:37:37Z)
- Rinascimento: searching the behaviour space of Splendor [0.0]
This research aims to map the behavioural space (BSpace) of a game by using a general method.
In particular, the use of event-value functions has generally shown a remarkable improvement in the coverage of the BSpace compared to agents based on classic score-based reward signals.
arXiv Detail & Related papers (2021-06-15T18:46:57Z)
- Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes).
We show how it can be parameterized so a quality-diversity algorithm (MAP-Elites) is used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z)
- An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games [79.23847247132345]
This work investigates how well an artificial agent can benefit from playing guessing games when later asked to perform on novel NLP downstream tasks such as Visual Question Answering (VQA).
We propose two ways to exploit playing guessing games: 1) a supervised learning scenario in which the agent learns to mimic successful guessing games and 2) a novel way for an agent to play by itself, called Self-play via Iterated Experience Learning (SPIEL).
arXiv Detail & Related papers (2021-01-31T10:30:48Z)