DreamCraft: Text-Guided Generation of Functional 3D Environments in Minecraft
- URL: http://arxiv.org/abs/2404.15538v1
- Date: Tue, 23 Apr 2024 21:57:14 GMT
- Title: DreamCraft: Text-Guided Generation of Functional 3D Environments in Minecraft
- Authors: Sam Earle, Filippos Kokkinos, Yuhe Nie, Julian Togelius, Roberta Raileanu
- Abstract summary: We present a method for generating functional 3D artifacts from free-form text prompts in the open-world game Minecraft.
Our method, DreamCraft, trains quantized Neural Radiance Fields (NeRFs) to represent artifacts that, when viewed in-game, match given text descriptions.
We show how this can be leveraged to generate 3D structures that match a target distribution or obey certain adjacency rules over the block types.
- Score: 19.9639990460142
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Procedural Content Generation (PCG) algorithms enable the automatic generation of complex and diverse artifacts. However, they don't provide high-level control over the generated content and typically require domain expertise. In contrast, text-to-3D methods allow users to specify desired characteristics in natural language, offering a high amount of flexibility and expressivity. But unlike PCG, such approaches cannot guarantee functionality, which is crucial for certain applications like game design. In this paper, we present a method for generating functional 3D artifacts from free-form text prompts in the open-world game Minecraft. Our method, DreamCraft, trains quantized Neural Radiance Fields (NeRFs) to represent artifacts that, when viewed in-game, match given text descriptions. We find that DreamCraft produces more aligned in-game artifacts than a baseline that post-processes the output of an unconstrained NeRF. Thanks to the quantized representation of the environment, functional constraints can be integrated using specialized loss terms. We show how this can be leveraged to generate 3D structures that match a target distribution or obey certain adjacency rules over the block types. DreamCraft inherits a high degree of expressivity and controllability from the NeRF, while still being able to incorporate functional constraints through domain-specific objectives.
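The abstract states only that functional constraints enter through specialized loss terms over the quantized block representation. Below is a minimal sketch, assuming a soft per-voxel block-type distribution, of how a distribution-matching term and an adjacency penalty could be written; the tensor shapes, block types, and forbidden-pair encoding are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the authors' code): differentiable functional losses over a
# soft block-type grid, reflecting the distribution and adjacency constraints
# described in the abstract. Shapes and the forbidden-pair encoding are illustrative.
import torch
import torch.nn.functional as F

def distribution_loss(block_logits, target_dist):
    """KL divergence between the structure's aggregate block-type histogram
    and a target distribution. block_logits: (X, Y, Z, num_block_types)."""
    probs = F.softmax(block_logits, dim=-1)        # soft block assignment per voxel
    histogram = probs.mean(dim=(0, 1, 2))          # expected block-type frequencies
    return F.kl_div(histogram.log(), target_dist, reduction="sum")

def adjacency_loss(block_logits, forbidden_pairs):
    """Penalize the expected number of forbidden vertical neighbours.
    forbidden_pairs: (num_block_types, num_block_types) with 1 where block
    type i may not sit directly above block type j (an assumed convention)."""
    probs = F.softmax(block_logits, dim=-1)
    upper, lower = probs[:, 1:, :, :], probs[:, :-1, :, :]   # neighbours along Y
    # Expected probability of each (upper, lower) type pair at every boundary.
    pair_probs = torch.einsum("xyzi,xyzj->xyzij", upper, lower)
    return (pair_probs * forbidden_pairs).sum() / upper[..., 0].numel()

if __name__ == "__main__":
    num_types = 4                                  # e.g. air, dirt, stone, water
    logits = torch.randn(8, 8, 8, num_types, requires_grad=True)
    target = torch.tensor([0.7, 0.15, 0.1, 0.05])  # mostly-air target histogram
    forbidden = torch.zeros(num_types, num_types)
    forbidden[3, 0] = 1.0                          # water may not float above air
    loss = distribution_loss(logits, target) + adjacency_loss(logits, forbidden)
    loss.backward()                                # gradients flow to the block logits
    print(float(loss))
```

Because both terms operate on softmax probabilities rather than hard block IDs, they stay differentiable and could, in principle, be added to the rendering objective of a quantized NeRF.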
Related papers
- Word2Minecraft: Generating 3D Game Levels through Large Language Models [6.037493811943889]
We present Word2Minecraft, a system that generates playable game levels in Minecraft based on structured stories.
We introduce a flexible framework that allows for the customization of story complexity, enabling dynamic level generation.
We show that GPT-4-Turbo outperforms GPT-4o-Mini in most areas, including story coherence and objective enjoyment.
arXiv Detail & Related papers (2025-03-18T18:38:38Z)
- SceneCraft: Layout-Guided 3D Scene Generation [29.713491313796084]
SceneCraft is a novel method for generating detailed indoor scenes that adhere to textual descriptions and spatial layout preferences.
Our method significantly outperforms existing approaches in complex indoor scene generation with diverse textures, consistent geometry, and realistic visual quality.
arXiv Detail & Related papers (2024-10-11T17:59:58Z)
- Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application [5.431779602239565]
Our method generates face-focused images for texture mapping, tailored to 3D virtual characters with a cube manifold.
The images can be manipulated with text guidance using StyleGAN and StyleCLIP.
arXiv Detail & Related papers (2024-02-08T07:01:00Z)
- GO-NeRF: Generating Virtual Objects in Neural Radiance Fields [75.13534508391852]
GO-NeRF is capable of utilizing scene context for high-quality and harmonious 3D object generation within an existing NeRF.
Our method employs a compositional rendering formulation that allows the generated 3D objects to be seamlessly composited into the scene.
arXiv Detail & Related papers (2024-01-11T08:58:13Z)
- CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting [57.14748263512924]
CG3D is a method for compositionally generating scalable 3D assets.
Gaussian radiance fields, parameterized to allow for compositions of objects, enable semantically and physically consistent scenes.
arXiv Detail & Related papers (2023-11-29T18:55:38Z)
- IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts [90.49024750432139]
We present IPDreamer, a novel method that captures intricate appearance features from complex $\textbf{I}$mage $\textbf{P}$rompts and aligns the synthesized 3D object with these extracted features.
Our experiments demonstrate that IPDreamer consistently generates high-quality 3D objects that align with both the textual and complex image prompts.
arXiv Detail & Related papers (2023-10-09T03:11:08Z)
- TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields [98.62319447738332]
We introduce a conditional 3D generative model, namely TextField3D.
Rather than using the text prompts as input directly, we propose injecting dynamic noise into the latent space of the given text prompts.
To guide the conditional generation in both geometry and texture, multi-modal discrimination is constructed with a text-3D discriminator and a text-2.5D discriminator.
arXiv Detail & Related papers (2023-09-29T12:14:41Z)
- TADA! Text to Animatable Digital Avatars [57.52707683788961]
TADA takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures.
We derive an optimizable high-resolution body model from SMPL-X with 3D displacements and a texture map.
We render normals and RGB images of the generated character and exploit their latent embeddings in the SDS training process.
arXiv Detail & Related papers (2023-08-21T17:59:10Z)
- Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation [45.69270771487455]
We propose Fantasia3D, a new method for high-quality text-to-3D content creation.
Key to Fantasia3D is the disentangled modeling and learning of geometry and appearance.
Our framework is more compatible with popular graphics engines, supporting relighting, editing, and physical simulation of the generated 3D assets.
arXiv Detail & Related papers (2023-03-24T09:30:09Z)
- TEGLO: High Fidelity Canonical Texture Mapping from Single-View Images [1.4502611532302039]
We propose TEGLO (Textured EG3D-GLO) for learning 3D representations from single-view, in-the-wild image collections.
We accomplish this by training a conditional Neural Radiance Field (NeRF) without any explicit 3D supervision.
We demonstrate that such mapping enables texture transfer and texture editing without requiring meshes with shared topology.
arXiv Detail & Related papers (2023-03-24T01:52:03Z)
- Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures [72.44361273600207]
We adapt the score distillation to the publicly available, and computationally efficient, Latent Diffusion Models.
Latent Diffusion Models apply the entire diffusion process in a compact latent space of a pretrained autoencoder.
We show that latent score distillation can be successfully applied directly on 3D meshes.
arXiv Detail & Related papers (2022-11-14T18:25:24Z)
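The Latent-NeRF summary above describes score distillation applied in the compact latent space of a pretrained autoencoder. Below is a simplified, self-contained sketch of such a latent score-distillation step; the tiny denoiser, the noise schedule, and the omitted timestep weighting are placeholder assumptions standing in for the pretrained Latent Diffusion Model, not the paper's actual setup.

```python
# Simplified sketch of latent score distillation: noise a rendered latent, ask a
# denoiser for the predicted noise, and use the residual as a gradient on the latent.
# The toy denoiser and linear schedule are placeholders for a pretrained LDM.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for a text-conditioned latent-diffusion U-Net (assumption)."""
    def __init__(self, channels=4):
        super().__init__()
        self.net = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, noisy_latent, t):
        # A real denoiser would also condition on the timestep and text embedding.
        return self.net(noisy_latent)

def sds_loss(rendered_latent, denoiser, alphas_cumprod):
    """One score-distillation step on a latent rendering of shape (B, C, H, W)."""
    t = torch.randint(0, len(alphas_cumprod), (1,))
    alpha_bar = alphas_cumprod[t].view(1, 1, 1, 1)
    noise = torch.randn_like(rendered_latent)
    noisy = alpha_bar.sqrt() * rendered_latent + (1 - alpha_bar).sqrt() * noise
    pred_noise = denoiser(noisy, t)
    # SDS treats (pred_noise - noise) as the gradient and detaches it from the denoiser.
    grad = (pred_noise - noise).detach()
    return (grad * rendered_latent).sum()

if __name__ == "__main__":
    latent = torch.randn(1, 4, 16, 16, requires_grad=True)  # e.g. a latent-space render
    alphas_cumprod = torch.linspace(0.999, 0.01, steps=1000)
    loss = sds_loss(latent, ToyDenoiser(), alphas_cumprod)
    loss.backward()                                          # gradient reaches the render
```

In the full method the gradient flows on from the latent rendering into the parameters of the 3D representation (a NeRF or a textured mesh), which is what makes shape-guided generation possible.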