Narrative-to-Scene Generation: An LLM-Driven Pipeline for 2D Game Environments
- URL: http://arxiv.org/abs/2509.04481v1
- Date: Sun, 31 Aug 2025 01:45:56 GMT
- Title: Narrative-to-Scene Generation: An LLM-Driven Pipeline for 2D Game Environments
- Authors: Yi-Chun Chen, Arnav Jhala
- Abstract summary: We present a lightweight pipeline that transforms short narrative prompts into a sequence of 2D tile-based game scenes. Given an LLM-generated narrative, our system identifies three key time frames, extracts spatial predicates, and retrieves visual assets. A layered terrain is generated using Cellular Automata, and objects are placed using spatial rules grounded in the predicate structure.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent advances in large language models (LLMs) enable compelling story generation, but connecting narrative text to playable visual environments remains an open challenge in procedural content generation (PCG). We present a lightweight pipeline that transforms short narrative prompts into a sequence of 2D tile-based game scenes, reflecting the temporal structure of stories. Given an LLM-generated narrative, our system identifies three key time frames, extracts spatial predicates in the form of "Object-Relation-Object" triples, and retrieves visual assets using affordance-aware semantic embeddings from the GameTileNet dataset. A layered terrain is generated using Cellular Automata, and objects are placed using spatial rules grounded in the predicate structure. We evaluated our system on ten diverse stories, analyzing tile-object matching, affordance-layer alignment, and spatial constraint satisfaction across frames. This prototype offers a scalable approach to narrative-driven scene generation and lays the foundation for future work on multi-frame continuity, symbolic tracking, and multi-agent coordination in story-centered PCG.
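The terrain and placement steps named in the abstract can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the grid size, random-fill probability, iteration count, and the `near` placement rule are all assumptions chosen for demonstration.

```python
# Minimal Cellular-Automata terrain pass plus a toy spatial rule for an
# ("object", "near", "anchor") predicate triple. Hypothetical parameters.
import random

W, H = 24, 16  # assumed tile-grid dimensions


def generate_terrain(fill_prob=0.45, steps=4, seed=0):
    """Classic CA cave smoothing: random fill, then majority-of-neighbors."""
    rng = random.Random(seed)
    grid = [[1 if rng.random() < fill_prob else 0 for _ in range(W)]
            for _ in range(H)]
    for _ in range(steps):
        nxt = [[0] * W for _ in range(H)]
        for y in range(H):
            for x in range(W):
                walls = 0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        if dx == 0 and dy == 0:
                            continue
                        nx_, ny_ = x + dx, y + dy
                        if 0 <= ny_ < H and 0 <= nx_ < W:
                            walls += grid[ny_][nx_]
                        else:
                            walls += 1  # treat out-of-bounds as wall
                nxt[y][x] = 1 if walls >= 5 else 0
        grid = nxt
    return grid  # 1 = wall tile, 0 = walkable tile


def place_near(grid, anchor, radius=2, seed=0):
    """Toy rule for a ("obj", "near", "anchor") triple: pick a walkable
    tile within `radius` of the anchor position, or None if none exists."""
    rng = random.Random(seed)
    ax, ay = anchor
    candidates = [
        (x, y)
        for y in range(max(0, ay - radius), min(H, ay + radius + 1))
        for x in range(max(0, ax - radius), min(W, ax + radius + 1))
        if grid[y][x] == 0 and (x, y) != (ax, ay)
    ]
    return rng.choice(candidates) if candidates else None
```

A real system would add the layered-terrain and affordance checks described in the paper; this sketch only shows the CA smoothing loop and a single predicate-driven placement.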
Related papers
- StoryTailor:A Zero-Shot Pipeline for Action-Rich Multi-Subject Visual Narratives [7.243114047801061]
We propose a zero-shot pipeline that produces temporally coherent, identity-preserving image sequences. StoryTailor delivers expressive interactions and evolving yet stable scenes.
arXiv Detail & Related papers (2026-02-24T16:07:02Z) - PSGS: Text-driven Panorama Sliding Scene Generation via Gaussian Splatting [18.048020748522312]
We propose PSGS, a framework for high-fidelity panoramic scene generation. First, a novel two-layer optimization architecture generates semantically coherent panoramas. Second, our panorama sliding mechanism initializes globally consistent 3D Gaussian Splatting point clouds.
arXiv Detail & Related papers (2026-01-31T02:34:46Z) - Agentic 3D Scene Generation with Spatially Contextualized VLMs [67.31920821192323]
We introduce a new paradigm that enables vision-language models to generate, understand, and edit complex 3D environments. We develop an agentic 3D scene generation pipeline in which the VLM iteratively reads from and updates the spatial context. Results show that our framework can handle diverse and challenging inputs, achieving a level of generalization not observed in prior work.
arXiv Detail & Related papers (2025-05-26T15:28:17Z) - StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation [1.0312968200748118]
Visual storytelling systems struggle to maintain character identity across frames and link actions to appropriate subjects. We propose StoryReasoning, a dataset containing 4,178 stories derived from 52,016 movie images. Compared to a non-fine-tuned model, we show a reduction in hallucinations per story from 4.06 to 3.56 (-12.3%) on average and an improvement in creativity from 2.58 to 3.38 (+31.0%).
arXiv Detail & Related papers (2025-05-15T13:42:14Z) - HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation [50.206100327643284]
HiScene is a novel hierarchical framework that bridges the gap between 2D image generation and 3D object generation. We generate 3D content that aligns with 2D representations while maintaining compositional structure.
arXiv Detail & Related papers (2025-04-17T16:33:39Z) - DreamRunner: Fine-Grained Compositional Story-to-Video Generation with Retrieval-Augmented Motion Adaptation [60.07447565026327]
We propose DreamRunner, a novel story-to-video generation method. We structure the input script using a large language model (LLM) to facilitate both coarse-grained scene planning and fine-grained object-level layout and motion planning. DreamRunner presents retrieval-augmented test-time adaptation to capture target motion priors for objects in each scene, supporting diverse motion customization based on retrieved videos.
arXiv Detail & Related papers (2024-11-25T18:41:56Z) - Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models [57.30913211264333]
We present Story3D-Agent, a pioneering approach that transforms provided narratives into 3D-rendered visualizations.
By integrating procedural modeling, our approach enables precise control over multi-character actions and motions, as well as diverse decorative elements.
We have thoroughly evaluated our Story3D-Agent to validate its effectiveness, offering a basic framework to advance 3D story representation.
arXiv Detail & Related papers (2024-08-21T17:43:15Z) - See It All: Contextualized Late Aggregation for 3D Dense Captioning [38.14179122810755]
3D dense captioning is a task to localize objects in a 3D scene and generate descriptive sentences for each object.
Recent approaches in 3D dense captioning have adopted transformer encoder-decoder frameworks from object detection to build an end-to-end pipeline without hand-crafted components.
We introduce SIA (See-It-All), a transformer pipeline that engages in 3D dense captioning with a novel paradigm called late aggregation.
arXiv Detail & Related papers (2024-08-14T16:19:18Z) - Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases [13.126239167800652]
We present a system for generating indoor scenes in response to text prompts.
The prompts are not limited to a fixed vocabulary of scene descriptions.
The objects in generated scenes are not restricted to a fixed set of object categories.
arXiv Detail & Related papers (2024-02-05T01:59:31Z) - TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes [67.5351491691866]
We present a novel framework, dubbed TeMO, to parse multi-object 3D scenes and edit their styles.
Our method can synthesize high-quality stylized content and outperform the existing methods over a wide range of multi-object 3D meshes.
arXiv Detail & Related papers (2023-12-07T12:10:05Z) - Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state of the art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.