Knowledge Graph-enhanced Large Language Model for Incremental Game PlayTesting
- URL: http://arxiv.org/abs/2511.02534v1
- Date: Tue, 04 Nov 2025 12:40:46 GMT
- Title: Knowledge Graph-enhanced Large Language Model for Incremental Game PlayTesting
- Authors: Enhong Mu, Jinyu Cai, Yijun Lu, Mingyue Zhang, Kenji Tei, Jialong Li
- Abstract summary: This paper proposes a KLPEG framework to conduct precise and efficient testing tailored for incremental game updates. The framework constructs and maintains a Knowledge Graph (KG) to systematically model game elements, task dependencies, and causal relationships. Experiments in two representative game environments, Overcooked and Minecraft, demonstrate that KLPEG can more accurately locate functionalities affected by updates.
- Score: 10.112811020571774
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The rapid iteration and frequent updates of modern video games pose significant challenges to the efficiency and specificity of testing. Although automated playtesting methods based on Large Language Models (LLMs) have shown promise, they often lack structured knowledge accumulation mechanisms, making it difficult to conduct precise and efficient testing tailored for incremental game updates. To address this challenge, this paper proposes the KLPEG framework. The framework constructs and maintains a Knowledge Graph (KG) to systematically model game elements, task dependencies, and causal relationships, enabling knowledge accumulation and reuse across versions. Building on this foundation, the framework utilizes LLMs to parse natural language update logs and identify the scope of impact through multi-hop reasoning on the KG, enabling the generation of update-tailored test cases. Experiments in two representative game environments, Overcooked and Minecraft, demonstrate that KLPEG can more accurately locate functionalities affected by updates and complete tests in fewer steps, significantly improving both playtesting effectiveness and efficiency.
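The multi-hop impact analysis described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the graph, node names, and hop limit below are hypothetical, chosen to resemble Overcooked-style task dependencies. The idea is that once an LLM has mapped an update log entry to a changed element, a bounded traversal of the dependency KG yields the set of downstream functionalities to retest.

```python
from collections import deque

# Hypothetical toy knowledge graph (illustrative, not from the paper):
# each edge points from a game element to the tasks that depend on it.
KG = {
    "onion": ["chop_onion"],
    "chop_onion": ["make_soup"],
    "stove": ["cook_soup"],
    "make_soup": ["cook_soup"],
    "cook_soup": ["serve_soup"],
}

def impacted_nodes(kg, updated, max_hops=3):
    """Multi-hop impact scope: every node reachable from the updated
    elements within max_hops dependency edges (BFS with a depth bound)."""
    frontier = deque((node, 0) for node in updated)
    seen = set(updated)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted on this path
        for nxt in kg.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - set(updated)

# An update log entry touching "onion" flags these downstream tasks:
print(sorted(impacted_nodes(KG, {"onion"})))
# → ['chop_onion', 'cook_soup', 'make_soup']
```

Note how the hop bound keeps the test scope tight: `serve_soup` is four hops from `onion`, so it falls outside the default budget and would not trigger a regression test for this update.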
Related papers
- How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing [56.60465182650588]
We introduce a three-level interaction hierarchy that captures deictic grounding, morphological manipulation, and causal reasoning. We propose a robust LMM-as-a-judge evaluation framework with task-specific metrics to enable scalable and fine-grained assessment. We find that proprietary models exhibit early-stage visual instruction-following capabilities and consistently outperform open-source models.
arXiv Detail & Related papers (2026-02-02T09:24:45Z) - Synergizing Code Coverage and Gameplay Intent: Coverage-Aware Game Playtesting with LLM-Guided Reinforcement Learning [4.3706127838450035]
The "Games as a Service" model requires frequent content updates. Code-centric methods focus on structural coverage without understanding gameplay context. We propose SMART, a novel framework that synergizes structural verification and functional validation for game update testing.
arXiv Detail & Related papers (2025-12-14T14:18:18Z) - PuzzlePlex: Benchmarking Foundation Models on Reasoning and Planning with Puzzles [53.47227295854126]
This work investigates the reasoning and planning capabilities of foundation models and their scalability in complex, dynamic environments. We introduce PuzzlePlex, a benchmark designed to assess these capabilities through a diverse set of puzzles.
arXiv Detail & Related papers (2025-10-07T21:24:29Z) - Improving Large Language Models Function Calling and Interpretability via Guided-Structured Templates [56.73907811047611]
Large language models (LLMs) have demonstrated strong reasoning and tool-use capabilities. However, LLMs often fail in real-world tool interactions due to incorrect parameterization, poor tool selection, or misinterpretation of user intent. We introduce a curriculum-inspired framework that leverages structured reasoning templates to guide LLMs through more deliberate step-by-step instructions for generating function calls.
arXiv Detail & Related papers (2025-09-22T17:55:14Z) - Enhancing Large Language Model for Knowledge Graph Completion via Structure-Aware Alignment-Tuning [52.78024385391959]
Knowledge graph completion (KGC) aims to infer new knowledge and make predictions from knowledge graphs. Existing methods often ignore the inconsistent representation spaces between natural language and graph structures. We propose SAT, a novel framework that enhances LLMs for KGC via structure-aware alignment-tuning.
arXiv Detail & Related papers (2025-09-01T06:38:11Z) - GVGAI-LLM: Evaluating Large Language Model Agents with Infinite Games [8.640618631999173]
We introduce GVGAI-LLM, a video game benchmark for evaluating the reasoning and problem-solving capabilities of large language models (LLMs). Built on the General Video Game AI framework, it features a diverse collection of arcade-style games designed to test a model's ability to handle tasks that differ from most existing LLM benchmarks.
arXiv Detail & Related papers (2025-08-11T22:17:07Z) - ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search [53.40810298627443]
ReGUIDE is a framework for web grounding that enables MLLMs to learn data efficiently through self-generated reasoning and spatial-aware criticism. Our experiments demonstrate that ReGUIDE significantly advances web grounding performance across multiple benchmarks.
arXiv Detail & Related papers (2025-05-21T08:36:18Z) - Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation [80.69067017594709]
Large language models (LLMs) and their agentic counterparts struggle to retain reasoning from previous tasks. We propose a novel framework, log-augmented generation (LAG), that directly reuses prior computation and reasoning from past logs at test time. Our method significantly outperforms standard agentic systems that do not utilize logs.
arXiv Detail & Related papers (2025-05-20T14:14:38Z) - Knowledge Editing through Chain-of-Thought [31.230769348268282]
In-context editing is a technique that updates large language models (LLMs) with new information to maintain their world knowledge. Despite its potential, existing in-context knowledge editing methods are often task-specific. We propose EditCoT, a novel knowledge editing framework that flexibly and efficiently updates LLMs across various tasks without retraining.
arXiv Detail & Related papers (2024-12-23T17:17:50Z) - LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments [35.3938477255058]
This paper introduces Graph Memory-based Editing for Large Language Models (GMeLLo), a straightforward and effective method that merges the explicit knowledge representation of Knowledge Graphs with the linguistic flexibility of Large Language Models. Our results show that GMeLLo significantly surpasses current state-of-the-art knowledge editing methods on the multi-hop question answering benchmark MQuAKE.
arXiv Detail & Related papers (2024-08-28T16:15:45Z) - clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents [19.989503513817095]
Large Language Models can be prompted to "self-play" conversational games that probe certain capabilities.
We take one of the proposed frameworks for setting up such game-play environments, and test its usefulness as an evaluation instrument.
arXiv Detail & Related papers (2024-05-31T14:43:31Z) - Comparative Code Structure Analysis using Deep Learning for Performance Prediction [18.226950022938954]
This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure.
Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source code to discover latent representations and achieve up to 84% (individual problem) and 73% (combined dataset with multiple problems) accuracy in predicting the change in performance.
arXiv Detail & Related papers (2021-02-12T16:59:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.