Narrative-to-Scene Generation: An LLM-Driven Pipeline for 2D Game Environments
- URL: http://arxiv.org/abs/2509.04481v1
- Date: Sun, 31 Aug 2025 01:45:56 GMT
- Title: Narrative-to-Scene Generation: An LLM-Driven Pipeline for 2D Game Environments
- Authors: Yi-Chun Chen, Arnav Jhala
- Abstract summary: We present a lightweight pipeline that transforms short narrative prompts into a sequence of 2D tile-based game scenes. Given an LLM-generated narrative, our system identifies three key time frames, extracts spatial predicates, and retrieves visual assets. A layered terrain is generated using Cellular Automata, and objects are placed using spatial rules grounded in the predicate structure.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent advances in large language models (LLMs) enable compelling story generation, but connecting narrative text to playable visual environments remains an open challenge in procedural content generation (PCG). We present a lightweight pipeline that transforms short narrative prompts into a sequence of 2D tile-based game scenes, reflecting the temporal structure of stories. Given an LLM-generated narrative, our system identifies three key time frames, extracts spatial predicates in the form of "Object-Relation-Object" triples, and retrieves visual assets using affordance-aware semantic embeddings from the GameTileNet dataset. A layered terrain is generated using Cellular Automata, and objects are placed using spatial rules grounded in the predicate structure. We evaluated our system on ten diverse stories, analyzing tile-object matching, affordance-layer alignment, and spatial constraint satisfaction across frames. This prototype offers a scalable approach to narrative-driven scene generation and lays the foundation for future work on multi-frame continuity, symbolic tracking, and multi-agent coordination in story-centered PCG.
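The terrain and placement steps named in the abstract can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the grid size, random-fill probability, iteration count, and the `near` placement rule are all assumptions chosen for demonstration.

```python
# Minimal Cellular-Automata terrain pass plus a toy spatial rule for an
# ("object", "near", "anchor") predicate triple. Hypothetical parameters.
import random

W, H = 24, 16  # assumed tile-grid dimensions


def generate_terrain(fill_prob=0.45, steps=4, seed=0):
    """Classic CA cave smoothing: random fill, then majority-of-neighbors."""
    rng = random.Random(seed)
    grid = [[1 if rng.random() < fill_prob else 0 for _ in range(W)]
            for _ in range(H)]
    for _ in range(steps):
        nxt = [[0] * W for _ in range(H)]
        for y in range(H):
            for x in range(W):
                walls = 0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        if dx == 0 and dy == 0:
                            continue
                        nx_, ny_ = x + dx, y + dy
                        if 0 <= ny_ < H and 0 <= nx_ < W:
                            walls += grid[ny_][nx_]
                        else:
                            walls += 1  # treat out-of-bounds as wall
                nxt[y][x] = 1 if walls >= 5 else 0
        grid = nxt
    return grid  # 1 = wall tile, 0 = walkable tile


def place_near(grid, anchor, radius=2, seed=0):
    """Toy rule for a ("obj", "near", "anchor") triple: pick a walkable
    tile within `radius` of the anchor position, or None if none exists."""
    rng = random.Random(seed)
    ax, ay = anchor
    candidates = [
        (x, y)
        for y in range(max(0, ay - radius), min(H, ay + radius + 1))
        for x in range(max(0, ax - radius), min(W, ax + radius + 1))
        if grid[y][x] == 0 and (x, y) != (ax, ay)
    ]
    return rng.choice(candidates) if candidates else None
```

A real system would add the layered-terrain and affordance checks described in the paper; this sketch only shows the CA smoothing loop and a single predicate-driven placement.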
Related papers
- StoryTailor:A Zero-Shot Pipeline for Action-Rich Multi-Subject Visual Narratives [7.243114047801061]
We propose a zero-shot pipeline that produces temporally coherent, identity-preserving image sequences. StoryTailor delivers expressive interactions and evolving yet stable scenes.
arXiv Detail & Related papers (2026-02-24T16:07:02Z) - PSGS: Text-driven Panorama Sliding Scene Generation via Gaussian Splatting [18.048020748522312]
We propose PSGS, a framework for high-fidelity panoramic scene generation. First, a novel two-layer optimization architecture generates semantically coherent panoramas. Second, our panorama sliding mechanism initializes globally consistent 3D Gaussian Splatting point clouds.
arXiv Detail & Related papers (2026-01-31T02:34:46Z) - Agentic 3D Scene Generation with Spatially Contextualized VLMs [67.31920821192323]
We introduce a new paradigm that enables vision-language models to generate, understand, and edit complex 3D environments. We develop an agentic 3D scene generation pipeline in which the VLM iteratively reads from and updates the spatial context. Results show that our framework can handle diverse and challenging inputs, achieving a level of generalization not observed in prior work.
arXiv Detail & Related papers (2025-05-26T15:28:17Z) - StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation [1.0312968200748118]
Visual storytelling systems struggle to maintain character identity across frames and link actions to appropriate subjects. We propose StoryReasoning, a dataset containing 4,178 stories derived from 52,016 movie images. Compared to a non-fine-tuned model, we show a reduction in hallucinations per story from 4.06 to 3.56 (-12.3%) on average and an improvement in creativity from 2.58 to 3.38 (+31.0%).
arXiv Detail & Related papers (2025-05-15T13:42:14Z) - HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation [50.206100327643284]
HiScene is a novel hierarchical framework that bridges the gap between 2D image generation and 3D object generation. We generate 3D content that aligns with 2D representations while maintaining compositional structure.
arXiv Detail & Related papers (2025-04-17T16:33:39Z) - DreamRunner: Fine-Grained Compositional Story-to-Video Generation with Retrieval-Augmented Motion Adaptation [60.07447565026327]
We propose DreamRunner, a novel story-to-video generation method. We structure the input script using a large language model (LLM) to facilitate both coarse-grained scene planning and fine-grained object-level layout and motion planning. DreamRunner presents retrieval-augmented test-time adaptation to capture target motion priors for objects in each scene, supporting diverse motion customization based on retrieved videos.
arXiv Detail & Related papers (2024-11-25T18:41:56Z) - Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models [57.30913211264333]
We present Story3D-Agent, a pioneering approach that transforms provided narratives into 3D-rendered visualizations.
By integrating procedural modeling, our approach enables precise control over multi-character actions and motions, as well as diverse decorative elements.
We have thoroughly evaluated our Story3D-Agent to validate its effectiveness, offering a basic framework to advance 3D story representation.
arXiv Detail & Related papers (2024-08-21T17:43:15Z) - See It All: Contextualized Late Aggregation for 3D Dense Captioning [38.14179122810755]
3D dense captioning is a task to localize objects in a 3D scene and generate descriptive sentences for each object.
Recent approaches in 3D dense captioning have adopted transformer encoder-decoder frameworks from object detection to build an end-to-end pipeline without hand-crafted components.
We introduce SIA (See-It-All), a transformer pipeline that engages in 3D dense captioning with a novel paradigm called late aggregation.
arXiv Detail & Related papers (2024-08-14T16:19:18Z) - Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases [13.126239167800652]
We present a system for generating indoor scenes in response to text prompts.
The prompts are not limited to a fixed vocabulary of scene descriptions.
The objects in generated scenes are not restricted to a fixed set of object categories.
arXiv Detail & Related papers (2024-02-05T01:59:31Z) - TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes [67.5351491691866]
We present a novel framework, dubbed TeMO, to parse multi-object 3D scenes and edit their styles.
Our method can synthesize high-quality stylized content and outperform the existing methods over a wide range of multi-object 3D meshes.
arXiv Detail & Related papers (2023-12-07T12:10:05Z) - Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state of the art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.