Word2Minecraft: Generating 3D Game Levels through Large Language Models
- URL: http://arxiv.org/abs/2503.16536v1
- Date: Tue, 18 Mar 2025 18:38:38 GMT
- Title: Word2Minecraft: Generating 3D Game Levels through Large Language Models
- Authors: Shuo Huang, Muhammad Umair Nasir, Steven James, Julian Togelius
- Abstract summary: We present Word2Minecraft, a system that generates playable game levels in Minecraft based on structured stories. We introduce a flexible framework that allows for the customization of story complexity, enabling dynamic level generation. We show that GPT-4-Turbo outperforms GPT-4o-Mini in most areas, including story coherence and objective enjoyment.
- Score: 6.037493811943889
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Word2Minecraft, a system that leverages large language models to generate playable game levels in Minecraft based on structured stories. The system transforms narrative elements, such as protagonist goals, antagonist challenges, and environmental settings, into game levels with both spatial and gameplay constraints. We introduce a flexible framework that allows for the customization of story complexity, enabling dynamic level generation. The system employs a scaling algorithm to maintain spatial consistency while adapting key game elements. We evaluate Word2Minecraft using both metric-based and human-based methods. Our results show that GPT-4-Turbo outperforms GPT-4o-Mini in most areas, including story coherence and objective enjoyment, while the latter excels in aesthetic appeal. We also demonstrate the system's ability to generate levels with high map enjoyment, offering a promising step forward in the intersection of story generation and game design. We open-source the code at https://github.com/JMZ-kk/Word2Minecraft/tree/word2mc_v0
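The abstract does not detail the scaling algorithm, but a minimal sketch of one plausible approach, proportionally rescaling the positions of key story elements onto the target build region, might look like the following (all names and the clamping rule are illustrative assumptions, not the authors' code):

```python
# Hypothetical sketch: proportionally rescale story-level key elements onto a
# Minecraft build region while keeping their relative positions consistent.
# Names (LevelElement, scale_layout) are illustrative, not from the paper.
from dataclasses import dataclass

@dataclass
class LevelElement:
    name: str  # e.g. "spawn", "goal", "antagonist"
    x: int     # position on the logical story grid
    z: int

def scale_layout(elements, src_size, dst_size):
    """Map element coordinates from a src_size x src_size logical grid
    onto a dst_size x dst_size Minecraft region, rounding to block cells."""
    factor = dst_size / src_size
    scaled = []
    for e in elements:
        # Clamp so rounded coordinates stay inside the target region.
        sx = min(int(e.x * factor), dst_size - 1)
        sz = min(int(e.z * factor), dst_size - 1)
        scaled.append(LevelElement(e.name, sx, sz))
    return scaled

layout = [LevelElement("spawn", 0, 0), LevelElement("goal", 9, 7)]
print(scale_layout(layout, src_size=10, dst_size=64))
```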
Related papers
- Model as a Game: On Numerical and Spatial Consistency for Generative Games [117.36098212829766]
We revisit the paradigm of generative games to explore what truly constitutes a Model as a Game (MaaG) with a well-developed mechanism.
Based on the DiT architecture, we design two specialized modules: (1) a numerical module that integrates a LogicNet to determine event triggers, with calculations processed externally as conditions for image generation; and (2) a spatial module that maintains a map of explored areas, retrieving location-specific information during generation and linking new observations to ensure continuity.
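As a rough, hypothetical illustration of the spatial module's core idea, caching what was generated at each explored location so revisits stay consistent (class and method names are assumptions, not from the paper):

```python
# Hypothetical sketch of a spatial-consistency cache: remember what was
# generated at each explored location and reuse it on revisits, so the
# generator is conditioned on prior observations rather than re-imagining them.
class ExploredMap:
    def __init__(self):
        self._cells = {}  # (x, y) -> generated observation

    def observe(self, pos, observation):
        self._cells[pos] = observation

    def recall(self, pos):
        """Return the stored observation for a revisited cell, if any."""
        return self._cells.get(pos)

world = ExploredMap()
world.observe((3, 5), "oak forest with a small pond")
# On revisit, retrieve the stored context instead of generating anew.
assert world.recall((3, 5)) == "oak forest with a small pond"
```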
arXiv Detail & Related papers (2025-03-27T05:46:15Z)
- SynCity: Training-Free Generation of 3D Worlds [107.69875149880679]
We propose SynCity, a training- and optimization-free approach to generating 3D worlds from textual descriptions. We show how 3D and 2D generators can be combined to generate ever-expanding scenes.
arXiv Detail & Related papers (2025-03-20T17:59:40Z)
- DreamRunner: Fine-Grained Compositional Story-to-Video Generation with Retrieval-Augmented Motion Adaptation [60.07447565026327]
We propose DreamRunner, a novel story-to-video generation method. We structure the input script using a large language model (LLM) to facilitate both coarse-grained scene planning and fine-grained object-level layout and motion planning. DreamRunner presents retrieval-augmented test-time adaptation to capture target motion priors for objects in each scene, supporting diverse motion customization based on retrieved videos.
arXiv Detail & Related papers (2024-11-25T18:41:56Z)
- DreamCraft: Text-Guided Generation of Functional 3D Environments in Minecraft [19.9639990460142]
We present a method for generating functional 3D artifacts from free-form text prompts in the open-world game Minecraft.
Our method, DreamCraft, trains quantized Neural Radiance Fields (NeRFs) to represent artifacts that, when viewed in-game, match given text descriptions.
We show how this can be leveraged to generate 3D structures that match a target distribution or obey certain adjacency rules over the block types.
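As an illustrative sketch of what checking adjacency rules over block types could look like (the rule set and function names are assumptions, not DreamCraft's actual constraint format):

```python
# Hypothetical sketch: validate that a 3D block grid obeys simple adjacency
# rules over block types (e.g. water may not touch lava). The rule set and
# helper names are illustrative, not DreamCraft's actual constraint format.
FORBIDDEN_PAIRS = {frozenset({"water", "lava"}), frozenset({"fire", "leaves"})}

NEIGHBOR_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                    (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def violations(grid):
    """Yield pairs of adjacent coordinates whose block types are forbidden.

    grid: dict mapping (x, y, z) -> block type string."""
    for (x, y, z), block in grid.items():
        for dx, dy, dz in NEIGHBOR_OFFSETS:
            neighbor = grid.get((x + dx, y + dy, z + dz))
            if neighbor and frozenset({block, neighbor}) in FORBIDDEN_PAIRS:
                yield (x, y, z), (x + dx, y + dy, z + dz)

grid = {(0, 0, 0): "water", (1, 0, 0): "lava"}
print(list(violations(grid)))  # reports both orderings of the offending pair
```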
arXiv Detail & Related papers (2024-04-23T21:57:14Z)
- Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application [5.431779602239565]
Our method can generate face-focused images for texture mapping, tailored to 3D virtual characters with a cube-manifold geometry. These images can be manipulated with text guidance using StyleGAN and StyleCLIP.
arXiv Detail & Related papers (2024-02-08T07:01:00Z)
- GPT4Point: A Unified Framework for Point-Language Understanding and Generation [76.61439685940272]
GPT4Point is a groundbreaking point-language multimodal model for unified 3D object understanding and generation within the MLLM framework.
As a powerful 3D MLLM, GPT4Point can seamlessly execute a variety of point-text reference tasks, such as point-cloud captioning and Q&A. It also achieves high-quality results from low-quality point-text features while preserving geometric shapes and colors.
arXiv Detail & Related papers (2023-12-05T18:59:55Z)
- Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models [68.85478477006178]
We present a Promptable Game Model (PGM) for neural video game simulators.
It allows a user to play the game by prompting it with high- and low-level action sequences.
Most captivatingly, our PGM unlocks the director's mode, where the game is played by specifying goals for the agents in the form of a prompt.
Our method significantly outperforms existing neural video game simulators in terms of rendering quality and unlocks applications beyond the capabilities of the current state of the art.
arXiv Detail & Related papers (2023-03-23T17:43:17Z)
- SceneHGN: Hierarchical Graph Networks for 3D Indoor Scene Generation with Fine-Grained Geometry [92.24144643757963]
3D indoor scenes are widely used in computer graphics, with applications ranging from interior design to gaming to virtual and augmented reality.
High-quality 3D indoor scenes are in high demand, yet designing them manually requires expertise and is time-consuming.
We propose SCENEHGN, a hierarchical graph network for 3D indoor scenes that takes into account the full hierarchy from the room level to the object level, then finally to the object part level.
For the first time, our method is able to directly generate plausible 3D room content, including furniture objects with fine-grained geometry.
arXiv Detail & Related papers (2023-02-16T15:31:59Z)
- MarioGPT: Open-Ended Text2Level Generation through Large Language Models [20.264940262622282]
Procedural Content Generation (PCG) is a technique to generate complex and diverse environments in an automated way.
Here, we introduce MarioGPT, a fine-tuned GPT2 model trained to generate tile-based game levels.
arXiv Detail & Related papers (2023-02-12T19:12:24Z)
- Infusing Commonsense World Models with Graph Knowledge [89.27044249858332]
We study the setting of generating narratives in an open world text adventure game.
A graph representation of the underlying game state can be used to train models that consume and output both grounded graph representations and natural language descriptions and actions.
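A minimal, assumed illustration of such a grounded graph state, encoded as triples and flattened to text for a language model (the triple format is an assumption, not the paper's exact encoding):

```python
# Hypothetical sketch: a game state as (subject, relation, object) triples,
# flattened to text so a language model can consume it alongside the narrative.
def graph_to_text(triples):
    return ". ".join(f"{s} {r} {o}" for s, r, o in triples) + "."

state = [
    ("player", "is_in", "throne room"),
    ("rusty key", "is_in", "throne room"),
    ("player", "carries", "lantern"),
]
print(graph_to_text(state))
# -> "player is_in throne room. rusty key is_in throne room. player carries lantern."
```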
arXiv Detail & Related papers (2023-01-13T19:58:27Z)
- Automated Isovist Computation for Minecraft [0.0]
We develop a new set of automated metrics, motivated by ideas from architecture, namely isovists and space syntax.
These metrics can be computed for a specific game state, from the player's perspective, and take into account their embodiment in the game world.
We show how to apply these metrics to the 3D blockworld of Minecraft.
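As a hedged sketch of the isovist idea, the set of cells visible from the player's position, approximated here by ray casting on a single horizontal slice (not the paper's implementation; all names are illustrative):

```python
# Hypothetical sketch: approximate a 2D isovist (the area visible from the
# player's position) on one horizontal slice of a blockworld by casting rays
# and stopping at the first solid block.
import math

def isovist_area(solid, origin, max_dist=20.0, n_rays=360):
    """Count distinct visible cells from `origin`.

    solid: set of (x, z) cells occupied by opaque blocks."""
    visible = set()
    for i in range(n_rays):
        angle = 2 * math.pi * i / n_rays
        dx, dz = math.cos(angle), math.sin(angle)
        step = 0.0
        while step < max_dist:
            cell = (int(origin[0] + dx * step), int(origin[1] + dz * step))
            if cell in solid:
                break  # ray blocked by an opaque block
            visible.add(cell)
            step += 0.5
    return len(visible)

walls = {(x, 3) for x in range(-5, 6)}  # a wall north of the player
print(isovist_area(walls, origin=(0.5, 0.5)))
```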
arXiv Detail & Related papers (2022-04-07T21:41:06Z)
- World-GAN: a Generative Model for Minecraft Worlds [27.221938979891384]
This work introduces World-GAN, the first method to perform data-driven Procedural Content Generation via Machine Learning in Minecraft.
Based on a 3D Generative Adversarial Network (GAN) architecture, we are able to create arbitrarily sized world snippets from a given sample.
arXiv Detail & Related papers (2021-06-18T14:45:39Z)