From Gameplay Traces to Game Mechanics: Causal Induction with Large Language Models
- URL: http://arxiv.org/abs/2602.00190v1
- Date: Fri, 30 Jan 2026 08:48:23 GMT
- Title: From Gameplay Traces to Game Mechanics: Causal Induction with Large Language Models
- Authors: Mohit Jiwatode, Alexander Dockhorn, Bodo Rosenhahn
- Abstract summary: We investigate Causal Induction: the ability to infer governing laws from observational data. We compare two approaches to VGDL generation: direct code generation from observations, and a two-stage method that first infers a structural causal model (SCM) and then translates it into VGDL. Results show that the SCM-based approach more often produces VGDL descriptions closer to the ground truth than direct generation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning agents can achieve high performance in complex game domains, often without understanding the underlying causal game mechanics. To address this, we investigate Causal Induction: the ability to infer governing laws from observational data, by tasking Large Language Models (LLMs) with reverse-engineering Video Game Description Language (VGDL) rules from gameplay traces. To reduce redundancy, we select nine representative games from the General Video Game AI (GVGAI) framework using semantic embeddings and clustering. We compare two approaches to VGDL generation: direct code generation from observations, and a two-stage method that first infers a structural causal model (SCM) and then translates it into VGDL. Both approaches are evaluated across multiple prompting strategies and controlled context regimes, varying the amount and form of information provided to the model, from raw gameplay observations alone to partial VGDL specifications. Results show that the SCM-based approach more often produces VGDL descriptions closer to the ground truth than direct generation, achieving preference win rates of up to 81% in blind evaluations and yielding fewer logically inconsistent rules. These learned SCMs can support downstream use cases such as causal reinforcement learning, interpretable agents, and procedurally generating novel but logically consistent games.
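The two-stage method in the abstract (gameplay traces → SCM → VGDL) can be sketched in miniature. The sketch below is illustrative only: the function names, trace format, and rule filter are hypothetical stand-ins, and the causal-induction step here is a simple co-occurrence count rather than the paper's LLM-based inference.

```python
# Hypothetical sketch of the two-stage pipeline described above.
# Stage 1: induce a structural causal model (SCM) from gameplay traces.
# Stage 2: translate SCM edges into VGDL-style InteractionSet rules.
# Names, trace schema, and the induction heuristic are assumptions,
# not the authors' implementation.

from collections import Counter

def induce_scm(traces):
    """Keep (event -> effect) pairs that co-occur on every observation,
    as a crude stand-in for LLM-based causal induction."""
    pair_counts = Counter()
    event_counts = Counter()
    for step in traces:
        event_counts[step["event"]] += 1
        pair_counts[(step["event"], step["effect"])] += 1
    # An edge survives only if the effect followed the event every time.
    return {
        (event, effect): count
        for (event, effect), count in pair_counts.items()
        if count == event_counts[event]
    }

def scm_to_vgdl(scm):
    """Render each SCM edge as a VGDL-style interaction rule."""
    lines = ["InteractionSet"]
    for (event, effect) in sorted(scm):
        actor, other = event.split("_hits_")
        lines.append(f"    {actor} {other} > {effect}")
    return "\n".join(lines)

traces = [
    {"event": "avatar_hits_wall", "effect": "stepBack"},
    {"event": "sword_hits_enemy", "effect": "killSprite"},
    {"event": "avatar_hits_wall", "effect": "stepBack"},
]
print(scm_to_vgdl(induce_scm(traces)))
```

Factoring the intermediate SCM out as an explicit data structure is what makes the second stage inspectable: inconsistent rules can be filtered before VGDL is ever emitted, which matches the paper's observation that the two-stage route yields fewer logically inconsistent rules.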
Related papers
- Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models
Reasoning over dynamic visual content remains a central challenge for large language models. We propose a reinforcement learning approach that enhances both temporal precision and reasoning consistency. The resulting model, Video R2, achieves consistently higher TAC, VAS, and accuracy across multiple benchmarks.
arXiv Detail & Related papers (2025-11-28T18:59:58Z) - Code World Models for General Game Playing
We use Large Language Models to translate natural language rules and game trajectories into a formal, executable world model represented as Python code. This generated model serves as a verifiable simulation engine for high-performance planning algorithms. We find that our method outperforms or matches Gemini 2.5 Pro in 9 out of the 10 considered games.
arXiv Detail & Related papers (2025-10-06T07:16:07Z) - Every Step Counts: Decoding Trajectories as Authorship Fingerprints of dLLMs
We show that the decoding mechanism of dLLMs can be used as a powerful tool for model attribution. We propose a novel information extraction scheme called the Directed Decoding Map (DDM), which captures structural relationships between decoding steps and better reveals model-specific behaviors.
arXiv Detail & Related papers (2025-10-02T06:25:10Z) - GVGAI-LLM: Evaluating Large Language Model Agents with Infinite Games
We introduce GVGAI-LLM, a video game benchmark for evaluating the reasoning and problem-solving capabilities of large language models (LLMs). Built on the General Video Game AI framework, it features a diverse collection of arcade-style games designed to test a model's ability to handle tasks that differ from most existing LLM benchmarks.
arXiv Detail & Related papers (2025-08-11T22:17:07Z) - Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
Vision-language reinforcement learning (RL) has primarily focused on narrow domains. We find video games inherently provide rich visual elements and mechanics that are easy to verify. To fully use the multimodal and verifiable reward in video games, we propose Game-RL.
arXiv Detail & Related papers (2025-05-20T03:47:44Z) - Grammar and Gameplay-aligned RL for Game Description Generation with LLMs
Game Description Generation (GDG) is the task of generating a game description written in a Game Description Language (GDL) from natural language text. We propose reinforcement learning-based fine-tuning of Large Language Models for GDG (RLGDG). Our training method simultaneously improves grammatical correctness and fidelity to game concepts.
arXiv Detail & Related papers (2025-03-20T01:47:33Z) - Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models
Video Anomaly Detection is crucial for applications such as security surveillance and autonomous driving.
Existing VAD methods provide little rationale behind detection, hindering public trust in real-world deployments.
We propose AnomalyRuler, a rule-based reasoning framework for VAD with Large Language Models.
arXiv Detail & Related papers (2024-07-14T19:23:12Z) - Harnessing Large Language Models for Training-free Video Anomaly Detection
Video anomaly detection (VAD) aims to temporally locate abnormal events in a video.
Training-based methods are prone to be domain-specific, thus being costly for practical deployment.
We propose LAnguage-based VAD (LAVAD), a method tackling VAD in a novel, training-free paradigm.
arXiv Detail & Related papers (2024-04-01T09:34:55Z) - DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation
Constrained decoding approaches aim to control the meaning or style of text generated by pre-trained language models (PLMs) for various tasks at inference time. These methods often guide plausible continuations by greedily and explicitly selecting targets. Inspired by cognitive dual-process theory, we propose a novel decoding framework, DECIDER.
arXiv Detail & Related papers (2024-03-04T11:49:08Z) - Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z) - Unsupervised Controllable Generation with Self-Training
Controllable generation with GANs remains a challenging research problem.
We propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training.
Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder.
arXiv Detail & Related papers (2020-07-17T21:50:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.