3D Building Generation in Minecraft via Large Language Models
- URL: http://arxiv.org/abs/2406.08751v1
- Date: Thu, 13 Jun 2024 02:21:07 GMT
- Title: 3D Building Generation in Minecraft via Large Language Models
- Authors: Shiying Hu, Zengrong Huang, Chengpeng Hu, Jialin Liu
- Abstract summary: This paper explores how large language models (LLMs) contribute to the generation of 3D buildings in a sandbox game, Minecraft.
We propose a Text to Building in Minecraft (T2BM) model, which involves refining prompts, decoding interlayer representation and repairing.
- Score: 1.9670700129679104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, procedural content generation has exhibited considerable advancements in the domain of 2D game level generation such as Super Mario Bros. and Sokoban through large language models (LLMs). To further validate the capabilities of LLMs, this paper explores how LLMs contribute to the generation of 3D buildings in a sandbox game, Minecraft. We propose a Text to Building in Minecraft (T2BM) model, which involves refining prompts, decoding interlayer representation and repairing. Facade, indoor scene and functional blocks like doors are supported in the generation. Experiments are conducted to evaluate the completeness and satisfaction of buildings generated via LLMs. It shows that LLMs hold significant potential for 3D building generation. Given appropriate prompts, LLMs can generate correct buildings in Minecraft with complete structures and incorporate specific building blocks such as windows and beds, meeting the specified requirements of human users.
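The abstract outlines a three-stage pipeline: refining the prompt, decoding an interlayer (layer-by-layer) representation, and repairing the result. As a rough illustration only, the Python sketch below shows what such a pipeline could look like; the llm_complete callable, the JSON layer schema, and the padding-based repair rule are assumptions made for this sketch, not the authors' implementation.

```python
# Hypothetical sketch of a T2BM-style pipeline: refine the user prompt,
# ask an LLM for a layer-by-layer block representation, then repair it.
# llm_complete() and the JSON schema are illustrative assumptions.
import json

def refine_prompt(user_prompt: str) -> str:
    """Wrap the raw user request with explicit format and constraint hints."""
    return (
        f"Design a Minecraft building for: {user_prompt}. "
        "Answer with JSON: a list of layers (bottom to top), each layer a 2D "
        "grid of block names such as 'stone', 'glass', 'door', or 'air'."
    )

def decode_layers(llm_output: str) -> list[list[list[str]]]:
    """Parse the LLM answer into a layer-wise grid of block names."""
    return json.loads(llm_output)

def repair(layers: list[list[list[str]]]) -> list[list[list[str]]]:
    """Naive repair: pad ragged rows and layers with 'air' so every layer is a full rectangle."""
    width = max(len(row) for layer in layers for row in layer)
    depth = max(len(layer) for layer in layers)
    fixed = []
    for layer in layers:
        rows = [row + ["air"] * (width - len(row)) for row in layer]
        rows += [["air"] * width] * (depth - len(rows))
        fixed.append(rows)
    return fixed

def text_to_building(user_prompt: str, llm_complete) -> list[list[list[str]]]:
    """End-to-end: prompt -> LLM -> decoded layers -> repaired layers."""
    raw = llm_complete(refine_prompt(user_prompt))
    return repair(decode_layers(raw))
```

Placing the resulting block grid in a world would be a separate step, e.g. via a Minecraft scripting interface; that part is intentionally left out of the sketch.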
Related papers
- SynCity: Training-Free Generation of 3D Worlds [107.69875149880679]
We propose SynCity, a training- and optimization-free approach to generating 3D worlds from textual descriptions.
We show how 3D and 2D generators can be combined to generate ever-expanding scenes.
arXiv Detail & Related papers (2025-03-20T17:59:40Z)
- Cube: A Roblox View of 3D Intelligence [67.43543266278154]
Foundation models trained on vast amounts of data have demonstrated remarkable reasoning and generation capabilities.
We show how our tokenization scheme can be used in applications for text-to-shape generation, shape-to-text generation and text-to-scene generation.
We conclude with a discussion outlining our path to building a fully unified foundation model for 3D intelligence.
arXiv Detail & Related papers (2025-03-19T17:52:17Z)
- Proc-GS: Procedural Building Generation for City Assembly with 3D Gaussians [65.09942210464747]
Building asset creation is labor-intensive and requires specialized skills to develop design rules.
Recent generative models for building creation often overlook these patterns, leading to low visual fidelity and limited scalability.
By manipulating procedural code, we can streamline this process and generate an infinite variety of buildings.
arXiv Detail & Related papers (2024-12-10T16:45:32Z)
- LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models [62.85566496673856]
This work explores expanding the capabilities of large language models (LLMs) pretrained on text to generate 3D meshes within a unified model.
A primary challenge is effectively tokenizing 3D mesh data into discrete tokens that LLMs can process seamlessly.
Our work is the first to demonstrate that LLMs can be fine-tuned to acquire complex spatial knowledge for 3D mesh generation in a text-based format.
arXiv Detail & Related papers (2024-11-14T17:08:23Z)
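The LLaMA-Mesh entry above hinges on representing a mesh as plain text that a language model can read and emit. The snippet below is a minimal, assumed illustration of that idea using OBJ-style vertex and face lines with quantized coordinates; it is not the paper's actual tokenizer, and the 0..63 grid size is an arbitrary choice for the sketch.

```python
# Illustrative only: serialize a triangle mesh to OBJ-style text so a
# text-only model can consume or produce it. Quantizing coordinates to a
# small integer grid is an assumption made here to keep token counts low.
def mesh_to_text(vertices, faces, grid=64):
    """vertices: list of (x, y, z) floats in [0, 1]; faces: list of vertex index triples."""
    lines = []
    for x, y, z in vertices:
        qx, qy, qz = (min(int(c * grid), grid - 1) for c in (x, y, z))
        lines.append(f"v {qx} {qy} {qz}")
    for a, b, c in faces:
        lines.append(f"f {a + 1} {b + 1} {c + 1}")  # OBJ indices are 1-based
    return "\n".join(lines)

# Example: a single triangle becomes three 'v' lines and one 'f' line.
print(mesh_to_text([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [(0, 1, 2)]))
```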
- Retrieval-Augmented Code Generation for Situated Action Generation: A Case Study on Minecraft [18.256529559741075]
In the Minecraft Collaborative Building Task, two players collaborate: an Architect (A) provides instructions to a Builder (B) to assemble a specified structure using 3D blocks.
We investigate the use of large language models (LLMs) to predict the sequence of actions taken by the Builder.
arXiv Detail & Related papers (2024-06-25T13:43:24Z)
- VP-LLM: Text-Driven 3D Volume Completion with Large Language Models through Patchification [56.211321810408194]
Large language models (LLMs) have shown great potential in multi-modal understanding and generation tasks.
We present Volume Patch LLM (VP-LLM), which leverages LLMs to perform conditional 3D completion in a single-forward pass.
Our results demonstrate a strong ability of LLMs to interpret complex text instructions and understand 3D objects, surpassing state-of-the-art diffusion-based 3D completion models in generation quality.
arXiv Detail & Related papers (2024-06-08T18:17:09Z)
- When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models [113.18524940863841]
This survey provides a comprehensive overview of the methodologies enabling large language models to process, understand, and generate 3D data.
Our investigation spans various 3D data representations, from point clouds to Neural Radiance Fields (NeRFs).
It examines their integration with LLMs for tasks such as 3D scene understanding, captioning, question-answering, and dialogue.
arXiv Detail & Related papers (2024-05-16T16:59:58Z)
- Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation [96.0573187419543]
Chain-of-Thought (CoT) guides large language models to reason step-by-step, and can motivate their logical reasoning ability.
We explore the Leap-of-Thought (LoT) abilities within large language models (LLMs).
LoT is a non-sequential, creative paradigm involving strong associations and knowledge leaps.
arXiv Detail & Related papers (2023-12-05T02:41:57Z)
- Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory [97.87093169454431]
Ghost in the Minecraft (GITM) is a novel framework that integrates Large Language Models (LLMs) with text-based knowledge and memory.
We develop a set of structured actions and leverage LLMs to generate action plans for the agents to execute.
The resulting LLM-based agent markedly surpasses previous methods, achieving a remarkable improvement of +47.5% in success rate.
arXiv Detail & Related papers (2023-05-25T17:59:49Z)
- Towards Language-guided Interactive 3D Generation: LLMs as Layout Interpreter with Generative Feedback [20.151147653552155]
Large Language Models (LLMs) have demonstrated impressive reasoning, conversational, and zero-shot generation abilities.
We propose a novel language-guided interactive 3D generation system, dubbed LI3D, that integrates LLMs as a 3D layout interpreter.
Our system also incorporates LLaVA, a large language and vision assistant, to provide generative feedback from the visual aspect for improving the visual quality of generated content.
arXiv Detail & Related papers (2023-05-25T07:43:39Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
- Level Generation Through Large Language Models [3.620115940532283]
Large Language Models (LLMs) are powerful tools capable of leveraging their training on natural language to write stories, generate code, and answer questions.
But can they generate functional video game levels?
Game levels, with their complex functional constraints and spatial relationships in more than one dimension, are very different from the kinds of data an LLM typically sees during training.
We investigate the use of LLMs to generate levels for the game Sokoban, finding that LLMs are indeed capable of doing so, and that their performance scales dramatically with dataset size.
arXiv Detail & Related papers (2023-02-11T23:34:42Z)
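The Sokoban entry above treats a game level as just another text sequence. The sketch below shows one assumed way to prompt a model for an ASCII level and run a couple of basic well-formedness checks; the llm_complete callable, the prompt wording, and the symbol set are illustrative assumptions rather than the paper's setup.

```python
# Hypothetical sketch: ask a text model for an ASCII Sokoban level and run
# minimal sanity checks (exactly one player, matching box and goal counts).
SOKOBAN_PROMPT = (
    "Generate a small Sokoban level as ASCII art. Use '#' for walls, "
    "'@' for the player, '$' for boxes, '.' for goals, and ' ' for floor. "
    "Output only the level grid."
)

def is_plausible_level(level: str) -> bool:
    chars = "".join(level.splitlines())
    return chars.count("@") == 1 and chars.count("$") == chars.count(".") > 0

def generate_level(llm_complete, retries: int = 3):
    for _ in range(retries):
        candidate = llm_complete(SOKOBAN_PROMPT)
        if is_plausible_level(candidate):
            return candidate
    return None  # these checks say nothing about solvability, only basic well-formedness
```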
- World-GAN: a Generative Model for Minecraft Worlds [27.221938979891384]
This work introduces World-GAN, the first method to perform data-driven Procedural Content Generation via Machine Learning in Minecraft.
Based on a 3D Generative Adversarial Network (GAN) architecture, we are able to create arbitrarily sized world snippets from a given sample.
arXiv Detail & Related papers (2021-06-18T14:45:39Z)
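The World-GAN summary above describes a 3D GAN that produces arbitrarily sized world snippets from a single sample. The following is a small, assumed PyTorch-style sketch of a fully convolutional 3D generator whose output size follows the input noise volume; it illustrates that general idea only and is not the World-GAN architecture.

```python
# Illustrative 3D convolutional generator (not the actual World-GAN network).
# Because it is fully convolutional, feeding a larger noise volume yields a
# larger block-label volume, mirroring the "arbitrarily sized snippets" idea.
import torch
import torch.nn as nn

class TinyVoxelGenerator(nn.Module):
    def __init__(self, noise_channels: int = 8, num_block_types: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(noise_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, num_block_types, kernel_size=3, padding=1),
        )

    def forward(self, noise: torch.Tensor) -> torch.Tensor:
        # Per-voxel logits over block types, same spatial size as the input noise.
        return self.net(noise)

# A 16x16x16 noise volume produces a 16x16x16 grid of block-type logits.
generator = TinyVoxelGenerator()
logits = generator(torch.randn(1, 8, 16, 16, 16))
blocks = logits.argmax(dim=1)  # integer block ids per voxel
```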
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.