Related papers: Towards Objective Metrics for Procedurally Generated Video Game Levels

URL: http://arxiv.org/abs/2201.10334v1
Date: Tue, 25 Jan 2022 14:13:50 GMT
Title: Towards Objective Metrics for Procedurally Generated Video Game Levels
Authors: Michael Beukman, Steven James and Christopher Cleghorn
Abstract summary: We introduce two simulation-based evaluation metrics to measure the diversity and difficulty of generated levels. We demonstrate that our diversity metric is more robust to changes in level size and representation than current methods. The difficulty metric shows promise, as it correlates with existing estimates of difficulty in one of the tested domains, but it does face some challenges in the other domain.
Score: 2.320417845168326
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With increasing interest in procedural content generation by academia and game developers alike, it is vital that different approaches can be compared fairly. However, evaluating procedurally generated video game levels is often difficult, due to the lack of standardised, game-independent metrics. In this paper, we introduce two simulation-based evaluation metrics that involve analysing the behaviour of an A* agent to measure the diversity and difficulty of generated levels in a general, game-independent manner. Diversity is calculated by comparing action trajectories from different levels using the edit distance, and difficulty is measured as how much exploration and expansion of the A* search tree is necessary before the agent can solve the level. We demonstrate that our diversity metric is more robust to changes in level size and representation than current methods and additionally measures factors that directly affect playability, instead of focusing on visual information. The difficulty metric shows promise, as it correlates with existing estimates of difficulty in one of the tested domains, but it does face some challenges in the other domain. Finally, to promote reproducibility, we publicly release our evaluation framework.
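The two metrics described in the abstract can be illustrated with a minimal sketch. This is not the authors' released framework: the grid world, the action alphabet (`U`, `D`, `L`, `R`), the function names, the normalisation by the longer trajectory, and the use of the raw expanded-node count as a difficulty proxy are all assumptions made for illustration.

```python
import heapq
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two action sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def diversity(trajectories):
    """Mean pairwise edit distance, divided by the longer trajectory's length.

    The normalisation is an assumption; the paper may normalise differently.
    """
    pairs = list(combinations(trajectories, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        longest = max(len(a), len(b))
        total += edit_distance(a, b) / longest if longest else 0.0
    return total / len(pairs)


def astar(grid, start, goal):
    """A* on a 4-connected grid where '#' marks a wall.

    Returns (action string or None, number of expanded nodes). The expansion
    count serves here as a crude, hypothetical difficulty proxy.
    """
    moves = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

    def h(p):
        # Manhattan distance: admissible and consistent for unit-cost moves.
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start, "")]
    best = {start: 0}
    expanded = 0
    while frontier:
        _, g, pos, path = heapq.heappop(frontier)
        if g > best.get(pos, float("inf")):
            continue  # stale queue entry, a cheaper route was found already
        expanded += 1
        if pos == goal:
            return path, expanded
        for action, (dr, dc) in moves.items():
            r, c = pos[0] + dr, pos[1] + dc
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if inside and grid[r][c] != "#" and g + 1 < best.get((r, c), float("inf")):
                best[(r, c)] = g + 1
                heapq.heappush(frontier, (g + 1 + h((r, c)), g + 1, (r, c), path + action))
    return None, expanded
```

In this sketch, each level would be solved with `astar`, the returned action strings passed to `diversity`, and the expansion counts compared across levels as a rough indication of how much search each one required.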

Related papers

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks [229.73714829399802]
This survey probes the core challenges that the rise of Large Language Models poses for evaluation. We identify and analyze two pivotal transitions: (i) from task-specific to capability-based evaluation, which reorganizes benchmarks around core competencies such as knowledge, reasoning, instruction following, multi-modal understanding, and safety. We will dissect this issue, along with the core challenges of the above two transitions, from the perspectives of methods, datasets, evaluators, and metrics.
arXiv Detail & Related papers (2025-04-26T07:48:52Z)
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models [84.27290155010533]
V-MAGE is a game-based evaluation framework designed to assess visual reasoning capabilities of MLLMs. We use V-MAGE to evaluate leading MLLMs, revealing significant challenges in their visual perception and reasoning.
arXiv Detail & Related papers (2025-04-08T15:43:01Z)
Perceptual Similarity for Measuring Decision-Making Style and Policy Diversity in Games [28.289135305943056]
Defining and measuring decision-making styles, also known as playstyles, is crucial in gaming. We introduce three enhancements to increase accuracy: multiscale analysis with varied state granularity, a perceptual kernel rooted in psychology, and the utilization of the intersection-over-union method for efficient evaluation. Our findings improve the measurement of end-to-end game analysis and the evolution of artificial intelligence for diverse playstyles.
arXiv Detail & Related papers (2024-08-12T10:55:42Z)
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Navigation [76.67608003501479]
We introduce and specify an evaluation protocol defining a range of domain-related metrics computed on the basis of the primary evaluation indicators. The results of such a comparison, which involves a variety of state-of-the-art MARL, search-based, and hybrid methods, are presented.
arXiv Detail & Related papers (2024-07-20T16:37:21Z)
A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods. The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics. We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
Preference-conditioned Pixel-based AI Agent For Game Testing [1.5059676044537105]
Game-testing AI agents that learn by interaction with the environment have the potential to mitigate these challenges. This paper proposes an agent design that mainly depends on pixel-based state observations while exploring the environment conditioned on a user's preference. Our agent significantly outperforms state-of-the-art pixel-based game testing agents over exploration coverage and test execution quality when evaluated on a complex open-world environment resembling many aspects of real AAA games.
arXiv Detail & Related papers (2023-08-18T04:19:36Z)
Self-similarity Driven Scale-invariant Learning for Weakly Supervised Person Search [66.95134080902717]
We propose a novel one-step framework, named Self-similarity driven Scale-invariant Learning (SSL). We introduce a Multi-scale Exemplar Branch to guide the network in concentrating on the foreground and learning scale-invariant features. Experiments on PRW and CUHK-SYSU databases demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-02-25T04:48:11Z)
Ordinal Regression for Difficulty Estimation of StepMania Levels [18.944506234623862]
We formalize and analyze the difficulty prediction task on StepMania levels as an ordinal regression (OR) task. We evaluate many competitive OR and non-OR models, demonstrating that neural network-based models significantly outperform the state of the art. We conclude with a user experiment showing our trained models' superiority over human labeling.
arXiv Detail & Related papers (2023-01-23T15:30:01Z)
Generating Game Levels of Diverse Behaviour Engagement [2.5739833468005595]
Experimental studies on Super Mario Bros. indicate that using the same evaluation metrics but agents with different personas can generate levels for a particular persona. It implies that, for simple games, using a game-playing agent of a specific player archetype as a level tester is probably all we need to generate levels of diverse behaviour engagement.
arXiv Detail & Related papers (2022-07-05T15:08:12Z)
Modeling Content Creator Incentives on Algorithm-Curated Platforms [76.53541575455978]
We study how algorithmic choices affect the existence and character of (Nash) equilibria in exposure games. We propose tools for numerically finding equilibria in exposure games, and illustrate results of an audit on the MovieLens and LastFM datasets.
arXiv Detail & Related papers (2022-06-27T08:16:59Z)
Procedural Content Generation using Neuroevolution and Novelty Search for Diverse Video Game Levels [2.320417845168326]
Procedurally generated video game content has the potential to drastically reduce the content creation budget of game developers and large studios. However, adoption is hindered by limitations such as slow generation, as well as low quality and diversity of content. We introduce an evolutionary search-based approach for evolving level generators using novelty search to procedurally generate diverse levels in real time.
arXiv Detail & Related papers (2022-04-14T12:54:32Z)
Uncertainty-aware Score Distribution Learning for Action Quality Assessment [91.05846506274881]
We propose an uncertainty-aware score distribution learning (USDL) approach for action quality assessment (AQA). Specifically, we regard an action as an instance associated with a score distribution, which describes the probability of different evaluated scores. Under the circumstance where fine-grained score labels are available, we devise a multi-path uncertainty-aware score distributions learning (MUSDL) method to explore the disentangled components of a score.
arXiv Detail & Related papers (2020-06-13T15:41:29Z)
Towards Universal Representation Learning for Deep Face Recognition [106.21744671876704]
We propose a universal representation learning framework that can deal with larger variation unseen in the given training data without leveraging target domain knowledge. Experiments show that our method achieves top performance on general face recognition datasets such as LFW and MegaFace.
arXiv Detail & Related papers (2020-02-26T23:29:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.