WorldCraft: Photo-Realistic 3D World Creation and Customization via LLM Agents
- URL: http://arxiv.org/abs/2502.15601v2
- Date: Fri, 28 Feb 2025 08:49:29 GMT
- Title: WorldCraft: Photo-Realistic 3D World Creation and Customization via LLM Agents
- Authors: Xinhang Liu, Chi-Keung Tang, Yu-Wing Tai
- Abstract summary: We introduce WorldCraft, a system where large language model (LLM) agents leverage procedural generation to create scenes populated with objects. In our framework, a coordinator agent manages the overall process and works with two specialized LLM agents to complete the scene creation. Our pipeline incorporates a trajectory control agent, allowing users to animate the scene and operate the camera through natural language interactions.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Constructing photorealistic virtual worlds has applications across various fields, but it often requires the extensive labor of highly trained professionals to operate conventional 3D modeling software. To democratize this process, we introduce WorldCraft, a system where large language model (LLM) agents leverage procedural generation to create indoor and outdoor scenes populated with objects, allowing users to control individual object attributes and the scene layout using intuitive natural language commands. In our framework, a coordinator agent manages the overall process and works with two specialized LLM agents to complete the scene creation: ForgeIt, which integrates an ever-growing manual through auto-verification to enable precise customization of individual objects, and ArrangeIt, which formulates hierarchical optimization problems to achieve a layout that balances ergonomic and aesthetic considerations. Additionally, our pipeline incorporates a trajectory control agent, allowing users to animate the scene and operate the camera through natural language interactions. Our system is also compatible with off-the-shelf deep 3D generators to enrich scene assets. Through evaluations and comparisons with state-of-the-art methods, we demonstrate the versatility of WorldCraft, ranging from single-object customization to intricate, large-scale interior and exterior scene designs. This system empowers non-professionals to bring their creative visions to life.
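The coordinator/specialist split described in the abstract is a common agent pattern; the sketch below mocks it with plain Python classes. All class names, methods, and the command format are hypothetical stand-ins for illustration only (the real system drives LLMs and a procedural-generation backend):

```python
# Minimal sketch of a coordinator dispatching to specialist agents,
# in the spirit of WorldCraft's ForgeIt/ArrangeIt split.
# Every name here is a hypothetical illustration, not the paper's API.
from dataclasses import dataclass, field

@dataclass
class Scene:
    objects: dict = field(default_factory=dict)  # name -> attribute dict
    layout: dict = field(default_factory=dict)   # name -> (x, y, z) position

class ForgeIt:
    """Customizes individual objects from a parsed user request."""
    def customize(self, scene, name, attrs):
        scene.objects.setdefault(name, {}).update(attrs)
        return scene

class ArrangeIt:
    """Places objects; the real agent solves a hierarchical layout optimization."""
    def arrange(self, scene):
        # Trivial placeholder: line objects up along the x-axis.
        for i, name in enumerate(sorted(scene.objects)):
            scene.layout[name] = (float(i), 0.0, 0.0)
        return scene

class Coordinator:
    """Routes user commands to the appropriate specialist agent."""
    def __init__(self):
        self.forge, self.arranger = ForgeIt(), ArrangeIt()

    def handle(self, scene, command):
        kind, payload = command
        if kind == "customize":
            name, attrs = payload
            return self.forge.customize(scene, name, attrs)
        if kind == "arrange":
            return self.arranger.arrange(scene)
        raise ValueError(f"unknown command: {kind}")

scene = Scene()
coord = Coordinator()
scene = coord.handle(scene, ("customize", ("lamp", {"height_m": 1.5})))
scene = coord.handle(scene, ("arrange", None))
print(scene.layout["lamp"])  # → (0.0, 0.0, 0.0)
```

The point of the pattern is that the coordinator holds no domain logic itself; each specialist can be swapped or extended without touching the routing.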
Related papers
- HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation [50.206100327643284]
HiScene is a novel hierarchical framework that bridges the gap between 2D image generation and 3D object generation.
We generate 3D content that aligns with 2D representations while maintaining compositional structure.
arXiv Detail & Related papers (2025-04-17T16:33:39Z) - Proc-GS: Procedural Building Generation for City Assembly with 3D Gaussians [65.09942210464747]
Building asset creation is labor-intensive and requires specialized skills to develop design rules.
Recent generative models for building creation often overlook these patterns, leading to low visual fidelity and limited scalability.
By manipulating procedural code, we can streamline this process and generate an infinite variety of buildings.
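The premise above, that a small parameterized program can emit an unlimited variety of buildings, can be sketched as follows. The parameters and value ranges are illustrative assumptions, not taken from Proc-GS:

```python
# Toy procedural building generator: a few random parameters suffice to
# emit endless distinct building specs. Parameter names and ranges are
# illustrative only, not Proc-GS's actual design rules.
import random

def make_building(rng):
    floors = rng.randint(2, 30)
    width = round(rng.uniform(8.0, 40.0), 1)   # footprint width in meters
    depth = round(rng.uniform(8.0, 40.0), 1)   # footprint depth in meters
    return {
        "floors": floors,
        "footprint_m": (width, depth),
        "height_m": round(floors * 3.2, 1),    # assume 3.2 m per floor
    }

rng = random.Random(42)  # fixed seed for reproducibility
city_block = [make_building(rng) for _ in range(4)]
for building in city_block:
    print(building)
```

Changing the seed, or the sampling rules, yields a new city block; that is the scalability argument procedural approaches make against per-asset manual modeling.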
arXiv Detail & Related papers (2024-12-10T16:45:32Z) - Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting [47.014044892025346]
Architect is a generative framework that creates complex and realistic 3D embodied environments leveraging diffusion-based 2D image inpainting.
Our pipeline is further extended to a hierarchical and iterative inpainting process that continuously generates placements of large furniture and small objects to enrich the scene.
arXiv Detail & Related papers (2024-11-14T22:15:48Z) - BlenderAlchemy: Editing 3D Graphics with Vision-Language Models [4.852796482609347]
A vision-based edit generator and state evaluator work together to find the correct sequence of actions to achieve the goal.
Inspired by the role of visual imagination in the human design process, we supplement the visual reasoning capabilities of Vision-Language Models with "imagined" reference images.
arXiv Detail & Related papers (2024-04-26T19:37:13Z) - SceneX: Procedural Controllable Large-scale Scene Generation [52.4743878200172]
We introduce SceneX, which can automatically produce high-quality procedural models according to designers' textual descriptions. The proposed method comprises two components, PCGHub and PCGPlanner. The latter aims to generate executable actions for Blender to produce controllable and precise 3D assets guided by the user's instructions.
arXiv Detail & Related papers (2024-03-23T03:23:29Z) - Style-Consistent 3D Indoor Scene Synthesis with Decoupled Objects [84.45345829270626]
Controllable 3D indoor scene synthesis stands at the forefront of technological progress.
Current methods for scene stylization are limited to applying styles to the entire scene.
We introduce a unique pipeline designed for synthesizing 3D indoor scenes.
arXiv Detail & Related papers (2024-01-24T03:10:36Z) - Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models [12.965144877139393]
We introduce Dream2Real, a robotics framework which integrates vision-language models (VLMs) trained on 2D data into a 3D object rearrangement pipeline.
This is achieved by the robot autonomously constructing a 3D representation of the scene, where objects can be rearranged virtually and an image of the resulting arrangement rendered.
These renders are evaluated by a VLM, so that the arrangement which best satisfies the user instruction is selected and recreated in the real world with pick-and-place.
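The render-and-score loop described above can be sketched in a few lines. The renderer and scorer here are deliberately trivial stand-ins; the real system renders a reconstructed 3D scene and queries a vision-language model:

```python
# Sketch of a select-by-render loop: propose candidate arrangements,
# render each one, score the renders against the user's instruction,
# and keep the best. render() and vlm_score() are placeholder stand-ins.
import random

def render(arrangement):
    # Placeholder renderer: the "image" is just the arrangement itself.
    return arrangement

def vlm_score(image, instruction):
    # Stand-in scorer: rewards arrangements closest to left-to-right
    # order when asked for a tidy line-up (a real VLM judges the render).
    if instruction == "line the objects up left to right":
        return -sum(abs(a - b) for a, b in zip(image, sorted(image)))
    return 0.0

def best_arrangement(candidates, instruction):
    renders = [render(c) for c in candidates]
    scores = [vlm_score(r, instruction) for r in renders]
    return candidates[scores.index(max(scores))]

rng = random.Random(0)
candidates = [rng.sample(range(5), 5) for _ in range(20)]
candidates.append(list(range(5)))  # ensure one fully ordered candidate
best = best_arrangement(candidates, "line the objects up left to right")
print(best)  # → [0, 1, 2, 3, 4]
```

The structure is what matters: generation, rendering, and evaluation are decoupled, so the VLM never needs 3D supervision, only judgments over 2D renders.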
arXiv Detail & Related papers (2023-12-07T18:51:19Z) - 3D-GPT: Procedural 3D Modeling with Large Language Models [47.72968643115063]
We introduce 3D-GPT, a framework utilizing large language models (LLMs) for instruction-driven 3D modeling.
3D-GPT positions LLMs as proficient problem solvers, dissecting the procedural 3D modeling tasks into accessible segments and appointing the apt agent for each task.
Our empirical investigations confirm that 3D-GPT not only interprets and executes instructions, delivering reliable results, but also collaborates effectively with human designers.
arXiv Detail & Related papers (2023-10-19T17:41:48Z) - UrbanGIRAFFE: Representing Urban Scenes as Compositional Generative Neural Feature Fields [22.180286908121946]
We propose UrbanGIRAFFE, which uses a coarse 3D panoptic prior to guide a 3D-aware generative model.
Our model is compositional and controllable as it breaks down the scene into stuff, objects, and sky.
With proper loss functions, our approach facilitates photorealistic 3D-aware image synthesis with diverse controllability.
arXiv Detail & Related papers (2023-03-24T17:28:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.