Real-Time World Crafting: Generating Structured Game Behaviors from Natural Language with Large Language Models
- URL: http://arxiv.org/abs/2510.16952v1
- Date: Sun, 19 Oct 2025 18:09:44 GMT
- Title: Real-Time World Crafting: Generating Structured Game Behaviors from Natural Language with Large Language Models
- Authors: Austin Drake, Hang Dong
- Abstract summary: We present a novel architecture for safely integrating Large Language Models into interactive game engines. Our framework mitigates risks by using an LLM to translate commands into a constrained Domain-Specific Language. We evaluate this system in a 2D spell-crafting game prototype.
- Score: 0.8869777013253825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel architecture for safely integrating Large Language Models (LLMs) into interactive game engines, allowing players to "program" new behaviors using natural language. Our framework mitigates risks by using an LLM to translate commands into a constrained Domain-Specific Language (DSL), which configures a custom Entity-Component-System (ECS) at runtime. We evaluated this system in a 2D spell-crafting game prototype by experimentally assessing models from the Gemini, GPT, and Claude families with various prompting strategies. A validated LLM judge qualitatively rated the outputs, showing that while larger models better captured creative intent, the optimal prompting strategy is task-dependent: Chain-of-Thought improved creative alignment, while few-shot examples were necessary to generate more complex DSL scripts. This work offers a validated LLM-ECS pattern for emergent gameplay and a quantitative performance comparison for developers.
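The abstract's core safety pattern, translating LLM output into a constrained DSL that configures an ECS at runtime, can be illustrated with a minimal sketch. This is not the paper's implementation: the JSON-based DSL, the component whitelist, and all names here (`ALLOWED_COMPONENTS`, `validate_dsl`, `configure_entity`) are hypothetical stand-ins for the idea of validating model output against a closed schema before it touches the game state.

```python
import json

# Hypothetical constrained DSL: a spell is a JSON object whose keys are
# limited to a whitelist of ECS components with bounded parameter ranges.
ALLOWED_COMPONENTS = {
    "projectile": {"speed": (1.0, 30.0)},
    "area_effect": {"radius": (0.5, 5.0)},
    "damage": {"amount": (1, 50)},
}

def validate_dsl(script: str) -> dict:
    """Parse an LLM-emitted DSL script; reject anything outside the whitelist."""
    spell = json.loads(script)
    for component, params in spell.items():
        if component not in ALLOWED_COMPONENTS:
            raise ValueError(f"unknown component: {component}")
        for name, value in params.items():
            lo, hi = ALLOWED_COMPONENTS[component][name]  # KeyError on unknown params
            if not lo <= value <= hi:
                raise ValueError(f"{component}.{name}={value} outside [{lo}, {hi}]")
    return spell

def configure_entity(spell: dict) -> dict:
    """Attach the validated components to a fresh entity (ECS-style: data only)."""
    return {"entity_id": 1, "components": spell}

# Output an LLM might emit for a player command like "a fast fireball":
llm_output = '{"projectile": {"speed": 25.0}, "damage": {"amount": 10}}'
entity = configure_entity(validate_dsl(llm_output))
```

The key design point is that the LLM never executes code: it only emits data in a closed vocabulary, so a malformed or adversarial response fails validation instead of reaching the engine.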
Related papers
- Not All Tokens Matter: Data-Centric Optimization for Efficient Code Summarization [46.365359894614706]
We evaluate how system prompts affect ILMs and CLMs in code generation tasks. Our evaluation framework, spanning 120 model configurations, reveals that the influence of system prompts increases with model scale. Java shows greater sensitivity to system prompt variations than Python.
arXiv Detail & Related papers (2026-01-28T00:45:28Z) - Bridging the Knowledge Void: Inference-time Acquisition of Unfamiliar Programming Languages for Coding Tasks [22.908904483320953]
Large Language Models (LLMs) in coding tasks are often a reflection of their extensive pre-training corpora. We propose ILA-agent, a general ILA framework that equips LLMs with a set of behavioral primitives. We instantiate ILA-agent for Cangjie and evaluate its performance across code generation, translation, and program repair tasks.
arXiv Detail & Related papers (2026-01-16T09:06:47Z) - Automated Unity Game Template Generation from GDDs via NLP and Multi-Modal LLMs [0.0]
This paper presents a novel framework for automated game template generation using Natural Language Processing (NLP) and multi-modal Large Language Models (LLMs). We introduce an end-to-end system that parses Game Design Documents (GDDs) and extracts structured game specifications. The system synthesizes Unity-compatible C# code that implements the core mechanics, systems, and architecture defined in the design documentation.
arXiv Detail & Related papers (2025-09-07T21:53:37Z) - GVGAI-LLM: Evaluating Large Language Model Agents with Infinite Games [8.640618631999173]
We introduce GVGAI-LLM, a video game benchmark for evaluating the reasoning and problem-solving capabilities of large language models (LLMs). Built on the General Video Game AI framework, it features a diverse collection of arcade-style games designed to test a model's ability to handle tasks that differ from most existing LLM benchmarks.
arXiv Detail & Related papers (2025-08-11T22:17:07Z) - Monte Carlo Planning with Large Language Model for Text-Based Game Agents [27.385517721352368]
We introduce the Monte Carlo planning with Dynamic Memory-guided Large language model (MC-DML) algorithm. MC-DML leverages the language understanding and reasoning capabilities of Large Language Models (LLMs) alongside the exploratory advantages of tree search algorithms. Our results demonstrate that the MC-DML algorithm significantly enhances performance across various games at the initial planning phase.
arXiv Detail & Related papers (2025-04-23T16:23:15Z) - Evaluating Creativity and Deception in Large Language Models: A Simulation Framework for Multi-Agent Balderdash [6.65572931991284]
Large Language Models (LLMs) have shown impressive capabilities in complex tasks and interactive environments.
This paper introduces a simulation framework utilizing the game Balderdash to evaluate both the creativity and logical reasoning of LLMs.
arXiv Detail & Related papers (2024-11-15T18:42:48Z) - LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation [72.02635550088546]
This work explores how large language models (LLMs) can enhance CLIP's capability, especially for processing longer and more complex image captions. We introduce a caption-to-caption contrastive fine-tuning framework, significantly enhancing the discriminative quality of LLM outputs. Our approach outperforms LoRA-based methods, achieving nearly fourfold faster training with superior performance.
arXiv Detail & Related papers (2024-11-07T18:59:16Z) - clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents [19.989503513817095]
Large Language Models can be prompted to "self-play" conversational games that probe certain capabilities.
We take one of the proposed frameworks for setting up such game-play environments, and test its usefulness as an evaluation instrument.
arXiv Detail & Related papers (2024-05-31T14:43:31Z) - CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
CodeGRAG builds a graphical view of code blocks from their control flow and data flow to better interpret programming domain knowledge. CodeGRAG significantly improves the code generation ability of LLMs and can even offer performance gains for cross-lingual code generation.
arXiv Detail & Related papers (2024-05-03T02:48:55Z) - Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception [63.03288425612792]
We propose AnyRef, a general MLLM model that can generate pixel-wise object perceptions and natural language descriptions from multi-modality references.
Our model achieves state-of-the-art results across multiple benchmarks, including diverse modality referring segmentation and region-level referring expression generation.
arXiv Detail & Related papers (2024-03-05T13:45:46Z) - L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z) - Extrapolating Multilingual Understanding Models as Multilingual Generators [82.1355802012414]
This paper explores methods to equip multilingual understanding models with generation abilities, yielding a unified model.
We propose a Semantic-Guided Alignment-then-Denoising (SGA) approach to adapt an encoder into a multilingual generator with a small number of new parameters.
arXiv Detail & Related papers (2023-05-22T15:33:21Z) - Examining Scaling and Transfer of Language Model Architectures for Machine Translation [51.69212730675345]
Language models (LMs) process sequences in a single stack of layers, and encoder-decoder models (EncDec) utilize separate layer stacks for input and output processing.
In machine translation, EncDec has long been the favoured approach, but with few studies investigating the performance of LMs.
arXiv Detail & Related papers (2022-02-01T16:20:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.