Ordinal Regression for Difficulty Estimation of StepMania Levels
- URL: http://arxiv.org/abs/2301.09485v1
- Date: Mon, 23 Jan 2023 15:30:01 GMT
- Title: Ordinal Regression for Difficulty Estimation of StepMania Levels
- Authors: Billy Joe Franks, Benjamin Dinkelmann, Sophie Fellenz and Marius Kloft
- Abstract summary: We formalize and analyze the difficulty prediction task on StepMania levels as an ordinal regression (OR) task.
We evaluate many competitive OR and non-OR models, demonstrating that neural network-based models significantly outperform the state of the art.
We conclude with a user experiment showing our trained models' superiority over human labeling.
- Score: 18.944506234623862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: StepMania is a popular open-source clone of a rhythm-based video game. As is
common in popular games, there is a large number of community-designed levels.
It is often difficult for players and level authors to determine the difficulty
level of such community contributions. In this work, we formalize and analyze
the difficulty prediction task on StepMania levels as an ordinal regression
(OR) task. We standardize a more extensive and diverse selection of this data,
resulting in five data sets, two of which are extensions of previous work. We
evaluate many competitive OR and non-OR models, demonstrating that neural
network-based models significantly outperform the state of the art and that
StepMania-level data makes for an excellent test bed for deep OR models. We
conclude with a user experiment showing our trained models' superiority over
human labeling.
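Because the paper frames difficulty prediction as ordinal regression, a brief illustration of what a deep OR output layer can look like may help. Below is a minimal CORAL-style head (one shared score plus ordered per-rank thresholds) in PyTorch; this is a generic sketch, not the paper's architecture, and the 64-dim features, the K=20 rank scale, and all identifiers are illustrative assumptions.
```python
# Minimal sketch of a deep ordinal-regression head in the CORAL style
# (shared score, K-1 ordered binary subtasks). PyTorch is assumed; the
# 64-dim features and K=20 ranks are illustrative, not from the paper.
import torch
import torch.nn as nn

class OrdinalHead(nn.Module):
    def __init__(self, in_features: int, num_classes: int):
        super().__init__()
        # One shared linear score for all ranks ...
        self.score = nn.Linear(in_features, 1, bias=False)
        # ... plus one bias per binary subtask "is y greater than rank k?"
        self.biases = nn.Parameter(torch.zeros(num_classes - 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Logits of P(y > k) for k = 0..K-2, shape (batch, K-1).
        return self.score(x) + self.biases

def ordinal_targets(y: torch.Tensor, num_classes: int) -> torch.Tensor:
    # Rank y=3 with K=5 becomes [1, 1, 1, 0]: three "y > k" subtasks hold.
    ks = torch.arange(num_classes - 1, device=y.device)
    return (y.unsqueeze(1) > ks).float()

# Usage: binary cross-entropy over the K-1 cumulative subtasks.
K = 20
head = OrdinalHead(in_features=64, num_classes=K)
feats = torch.randn(8, 64)            # e.g. chart features from a CNN
labels = torch.randint(0, K, (8,))    # difficulty ranks 0..K-1
loss = nn.functional.binary_cross_entropy_with_logits(
    head(feats), ordinal_targets(labels, K))
ranks = (torch.sigmoid(head(feats)) > 0.5).sum(dim=1)  # decoded ranks
```
Unlike plain K-way classification, this cumulative formulation penalizes a prediction more the further it lands from the true rank, which is the property that makes OR a natural fit for ordered difficulty labels.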
Related papers
- LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content [62.816876067499415]
We propose LiveXiv: a scalable evolving live benchmark based on scientific ArXiv papers.
LiveXiv accesses domain-specific manuscripts at any given timestamp and automatically generates visual question-answer pairs.
We benchmark multiple open and proprietary Large Multi-modal Models (LMMs) on the first version of our benchmark, showing its challenging nature and exposing the models' true abilities.
arXiv Detail & Related papers (2024-10-14T17:51:23Z)
- Improving Conditional Level Generation using Automated Validation in Match-3 Games [39.887603099741696]
This paper proposes Avalon, a novel method to improve models that learn from existing level designs.
We use a conditional variational autoencoder to generate layouts for match-3 levels.
We quantitatively evaluate our approach by comparing it to an ablated model without difficulty conditioning.
arXiv Detail & Related papers (2024-09-10T09:07:47Z)
- Difficulty Modelling in Mobile Puzzle Games: An Empirical Study on Different Methods to Combine Player Analytics and Simulated Data [0.0]
A common practice is to build difficulty metrics from data collected through player interactions with the content.
This allows for estimation only after the content is released and does not consider the characteristics of potential future players.
In this article, we present a number of potential solutions for the estimation of difficulty under such conditions.
arXiv Detail & Related papers (2024-01-30T20:51:42Z)
- The Unreasonable Effectiveness of Easy Training Data for Hard Tasks [84.30018805150607]
We present the surprising conclusion that current pretrained language models often generalize relatively well from easy to hard data.
We demonstrate this kind of easy-to-hard generalization using simple finetuning methods like in-context learning, linear heads, and QLoRA.
We conclude that easy-to-hard generalization in LMs is surprisingly strong for the tasks studied.
arXiv Detail & Related papers (2024-01-12T18:36:29Z)
- Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models [115.501751261878]
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice.
We investigate whether we can go beyond human data on tasks where we have access to scalar feedback.
We find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data.
arXiv Detail & Related papers (2023-12-11T18:17:43Z)
- Inverse Scaling: When Bigger Isn't Better [80.42834197416444]
Large language models (LMs) show predictable improvements to overall loss with increased scale.
We present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale.
arXiv Detail & Related papers (2023-06-15T20:11:23Z)
- Personalized Game Difficulty Prediction Using Factorization Machines [0.9558392439655011]
We contribute a new approach for personalized difficulty estimation of game levels, borrowing methods from content recommendation.
We are able to predict difficulty as the number of attempts a player requires to pass future game levels, based on observed attempt counts from earlier levels and levels played by others.
Our results suggest that FMs are a promising tool enabling game designers to both optimize player experience and learn more about their players and the game (a minimal FM sketch appears after this list).
arXiv Detail & Related papers (2022-09-06T08:03:46Z)
- Towards Objective Metrics for Procedurally Generated Video Game Levels [2.320417845168326]
We introduce two simulation-based evaluation metrics to measure the diversity and difficulty of generated levels.
We demonstrate that our diversity metric is more robust to changes in level size and representation than current methods.
The difficulty metric shows promise, as it correlates with existing estimates of difficulty in one of the tested domains, but it does face some challenges in the other domain.
arXiv Detail & Related papers (2022-01-25T14:13:50Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To combine the strengths of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Style Curriculum Learning for Robust Medical Image Segmentation [62.02435329931057]
Deep segmentation models often degrade due to distribution shifts in image intensities between the training and test data sets.
We propose a novel framework to ensure robust segmentation in the presence of such distribution shifts.
arXiv Detail & Related papers (2021-08-01T08:56:24Z)
- Statistical Modelling of Level Difficulty in Puzzle Games [0.0]
We formalise a model of level difficulty for puzzle games that goes beyond the classical probability of success.
The model is fitted and evaluated on a dataset collected from the game Lily's Garden by Tactile Games.
arXiv Detail & Related papers (2021-07-05T13:47:28Z)
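For the factorization-machine entry above, the following is a minimal sketch of second-order FM prediction in the spirit of Rendle's original formulation; encoding a (player, level) pair as a concatenated one-hot vector is an assumption about the setup, and all parameter values are placeholders rather than trained weights.
```python
# Minimal NumPy sketch of a second-order factorization machine (FM),
# in the spirit of Rendle (2010). Encoding a (player, level) pair as a
# concatenated one-hot vector is an assumption; weights are placeholders.
import numpy as np

def fm_predict(x: np.ndarray, w0: float, w: np.ndarray, V: np.ndarray) -> float:
    """Global bias + linear terms + factorized pairwise interactions.

    Uses the O(d*k) identity:
    sum_{i<j} <V_i, V_j> x_i x_j = 0.5 * sum_f ((V^T x)_f^2 - ((V^2)^T x^2)_f)
    """
    vx = V.T @ x                                    # (k,)
    pairwise = 0.5 * np.sum(vx ** 2 - (V ** 2).T @ (x ** 2))
    return float(w0 + w @ x + pairwise)

# Usage: predicted attempt count for (player 3, level 7) with 100 players
# and 50 levels; real weights would come from minimizing squared error on
# observed attempt counts.
rng = np.random.default_rng(0)
d, k = 100 + 50, 8
x = np.zeros(d)
x[3] = 1.0        # player one-hot
x[100 + 7] = 1.0  # level one-hot
w0, w, V = 1.0, 0.1 * rng.normal(size=d), 0.1 * rng.normal(size=(d, k))
print(fm_predict(x, w0, w, V))
```
The factorized pairwise term is what lets the model generalize to (player, level) pairs never observed together, since each player and level gets its own latent vector learned from all of its interactions.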