3D-GPT: Procedural 3D Modeling with Large Language Models
- URL: http://arxiv.org/abs/2310.12945v2
- Date: Wed, 29 May 2024 01:35:15 GMT
- Title: 3D-GPT: Procedural 3D Modeling with Large Language Models
- Authors: Chunyi Sun, Junlin Han, Weijian Deng, Xinlong Wang, Zishan Qin, Stephen Gould
- Abstract summary: We introduce 3D-GPT, a framework utilizing large language models (LLMs) for instruction-driven 3D modeling.
3D-GPT positions LLMs as proficient problem solvers, decomposing procedural 3D modeling tasks into manageable segments and assigning the appropriate agent to each.
Our empirical investigations confirm that 3D-GPT not only interprets and executes instructions, delivering reliable results, but also collaborates effectively with human designers.
- Score: 47.72968643115063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the pursuit of efficient automated content creation, procedural generation, which leverages modifiable parameters and rule-based systems, emerges as a promising approach. Nonetheless, it can be a demanding endeavor, as its intricate nature requires a deep understanding of rules, algorithms, and parameters. To reduce this workload, we introduce 3D-GPT, a framework utilizing large language models (LLMs) for instruction-driven 3D modeling. 3D-GPT positions LLMs as proficient problem solvers, decomposing procedural 3D modeling tasks into manageable segments and assigning the appropriate agent to each. 3D-GPT integrates three core agents: the task dispatch agent, the conceptualization agent, and the modeling agent. Together they achieve two objectives. First, they enhance concise initial scene descriptions, evolving them into detailed forms while dynamically adapting the text to subsequent instructions. Second, they integrate procedural generation, extracting parameter values from the enriched text to interface effortlessly with 3D software for asset creation. Our empirical investigations confirm that 3D-GPT not only interprets and executes instructions, delivering reliable results, but also collaborates effectively with human designers. Furthermore, it integrates seamlessly with Blender, unlocking expanded manipulation possibilities. Our work highlights the potential of LLMs in 3D modeling, offering a basic framework for future advancements in scene generation and animation.
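To make the pipeline concrete, the sketch below traces one instruction through the agents and into Blender. The agent functions and the parameter schema are hypothetical stand-ins for LLM calls (the abstract does not specify this interface); only the `bpy` operators are real Blender Python API, so the script runs inside Blender's Python console.

```python
# A minimal sketch of the 3D-GPT flow described above: enrich the instruction,
# extract procedural parameters, then drive Blender. Agent internals are assumed.
import bpy

def conceptualization_agent(description: str) -> str:
    """Hypothetical stand-in for an LLM call that enriches a terse scene
    description with concrete, modelable detail (sizes, positions)."""
    return description + ": a 4 m trunk of radius 0.15 m, crown of radius 1.2 m"

def modeling_agent(enriched: str) -> list[dict]:
    """Hypothetical stand-in for an LLM call that maps enriched text onto
    procedural-generator parameters. Hard-coded here for illustration."""
    return [
        {"op": "cylinder", "radius": 0.15, "depth": 4.0, "location": (0, 0, 2.0)},
        {"op": "sphere", "radius": 1.2, "location": (0, 0, 4.6)},
    ]

def build_in_blender(params: list[dict]) -> None:
    """Interface with the 3D software: each parameter dict becomes one bpy call."""
    for p in params:
        if p["op"] == "cylinder":
            bpy.ops.mesh.primitive_cylinder_add(
                radius=p["radius"], depth=p["depth"], location=p["location"])
        elif p["op"] == "sphere":
            bpy.ops.mesh.primitive_uv_sphere_add(
                radius=p["radius"], location=p["location"])

# Task dispatch reduced to a straight pipeline for this single instruction.
build_in_blender(modeling_agent(conceptualization_agent("a tree")))
```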
Related papers
- LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models [62.85566496673856]
This work explores expanding the capabilities of large language models (LLMs) pretrained on text to generate 3D meshes within a unified model.
A primary challenge is effectively tokenizing 3D mesh data into discrete tokens that LLMs can process seamlessly.
Our work is the first to demonstrate that LLMs can be fine-tuned to acquire complex spatial knowledge for 3D mesh generation in a text-based format.
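One plausible way to picture that tokenization step is an OBJ-style plain-text serialization with quantized coordinates, sketched below; the quantization grid is an assumption for illustration, not necessarily the paper's exact scheme.

```python
# Serialize a mesh as OBJ-style text that a text-only LLM can read and emit.
# Quantizing coordinates to a small integer grid (assumed here) keeps the
# set of numeric tokens small and discrete.
def mesh_to_text(vertices, faces, bins=64):
    lines = []
    for x, y, z in vertices:
        # Map each coordinate from [-1, 1] onto integers 0 .. bins - 1.
        qx, qy, qz = (int((c + 1.0) / 2.0 * (bins - 1)) for c in (x, y, z))
        lines.append(f"v {qx} {qy} {qz}")
    for a, b, c in faces:
        lines.append(f"f {a + 1} {b + 1} {c + 1}")  # OBJ indices are 1-based
    return "\n".join(lines)

# One triangle rendered as LLM-readable text:
print(mesh_to_text([(-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (0.0, 1.0, 0.0)],
                   [(0, 1, 2)]))
```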
arXiv Detail & Related papers (2024-11-14T17:08:23Z)
- Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models [57.30913211264333]
We present Story3D-Agent, a pioneering approach that transforms provided narratives into 3D-rendered visualizations.
By integrating procedural modeling, our approach enables precise control over multi-character actions and motions, as well as diverse decorative elements.
We have thoroughly evaluated our Story3D-Agent to validate its effectiveness, offering a basic framework to advance 3D story representation.
arXiv Detail & Related papers (2024-08-21T17:43:15Z)
- Interactive3D: Create What You Want by Interactive 3D Generation [13.003964182554572]
We introduce Interactive3D, an innovative framework for interactive 3D generation that grants users precise control over the generative process.
Our experiments demonstrate that Interactive3D markedly improves the controllability and quality of 3D generation.
arXiv Detail & Related papers (2024-04-25T11:06:57Z)
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning [24.162598399141785]
Scene-LLM is a 3D-visual-language model that enhances embodied agents' abilities in interactive 3D indoor environments.
Our experiments with Scene-LLM demonstrate its strong capabilities in dense captioning, question answering, and interactive planning.
arXiv Detail & Related papers (2024-03-18T01:18:48Z)
- 3D-PreMise: Can Large Language Models Generate 3D Shapes with Sharp Features and Parametric Control? [8.893200442359518]
We introduce a framework that employs Large Language Models to generate text-driven 3D shapes.
We present 3D-PreMise, a dataset specifically tailored for 3D parametric modeling of industrial shapes.
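The "parametric control" question can be pictured with a self-contained shape program whose named parameters an LLM would fill in from text; this generic sketch is not the paper's actual CAD pipeline or API.

```python
# A parametric shape program: sharp-edged geometry from a few named parameters.
def parametric_box(width=2.0, height=1.0, depth=1.0):
    """Return (vertices, faces) for an axis-aligned box with sharp edges."""
    w, h, d = width / 2, height / 2, depth / 2
    vertices = [(sx * w, sy * h, sz * d)
                for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
    faces = [(0, 1, 3, 2), (4, 6, 7, 5), (0, 4, 5, 1),
             (2, 3, 7, 6), (0, 2, 6, 4), (1, 5, 7, 3)]
    return vertices, faces

# "a flat industrial plate, 4 units wide": the LLM supplies the parameters.
vertices, faces = parametric_box(width=4.0, height=0.2, depth=2.0)
```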
arXiv Detail & Related papers (2024-01-12T08:07:52Z)
- CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting [57.14748263512924]
CG3D is a method for compositionally generating scalable 3D assets.
Gaussian radiance fields, parameterized to allow for compositions of objects, enable semantically and physically consistent scenes.
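A rough sketch of why a Gaussian parameterization composes, using a simplified attribute set (means and opacities only) rather than a full 3D Gaussian Splatting representation:

```python
import numpy as np

def place(gaussians, rotation, translation):
    """Rigidly transform one object's Gaussian means into scene coordinates."""
    out = dict(gaussians)
    out["means"] = gaussians["means"] @ rotation.T + translation
    return out

def compose(*objects):
    """The scene is simply the union of each object's Gaussians."""
    return {k: np.concatenate([o[k] for o in objects]) for k in objects[0]}

chair = {"means": np.random.randn(100, 3), "opacity": np.random.rand(100)}
table = {"means": np.random.randn(200, 3), "opacity": np.random.rand(200)}
scene = compose(place(chair, np.eye(3), np.array([1.0, 0.0, 0.0])),
                place(table, np.eye(3), np.array([-1.0, 0.0, 0.0])))
```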
arXiv Detail & Related papers (2023-11-29T18:55:38Z)
- GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images [72.15855070133425]
We introduce GET3D, a Generative model that directly generates Explicit Textured 3D meshes with complex topology, rich geometric details, and high-fidelity textures.
GET3D is able to generate high-quality 3D textured meshes, ranging from cars, chairs, animals, motorbikes and human characters to buildings.
arXiv Detail & Related papers (2022-09-22T17:16:19Z)