SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code
- URL: http://arxiv.org/abs/2403.01248v1
- Date: Sat, 2 Mar 2024 16:16:26 GMT
- Title: SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code
- Authors: Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David A. Ross, Cordelia Schmid, Alireza Fathi
- Abstract summary: SceneCraft is a Large Language Model (LLM) Agent converting text descriptions into Blender-executable Python scripts.
SceneCraft renders complex scenes with up to a hundred 3D assets.
The spatial planning and arrangement challenges are tackled through a combination of advanced abstraction, strategic planning, and library learning.
- Score: 76.22337677728109
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces SceneCraft, a Large Language Model (LLM) Agent
converting text descriptions into Blender-executable Python scripts which
render complex scenes with up to a hundred 3D assets. This process requires
complex spatial planning and arrangement. We tackle these challenges through a
combination of advanced abstraction, strategic planning, and library learning.
SceneCraft first models a scene graph as a blueprint, detailing the spatial
relationships among assets in the scene. SceneCraft then writes Python scripts
based on this graph, translating relationships into numerical constraints for
asset layout. Next, SceneCraft leverages the perceptual strengths of
vision-language foundation models like GPT-V to analyze rendered images and
iteratively refine the scene. On top of this process, SceneCraft features a
library learning mechanism that compiles common script functions into a
reusable library, facilitating continuous self-improvement without expensive
LLM parameter tuning. Our evaluation demonstrates that SceneCraft surpasses
existing LLM-based agents in rendering complex scenes, as shown by its
adherence to constraints and favorable human assessments. We also showcase the
broader application potential of SceneCraft by reconstructing detailed 3D
scenes from the Sintel movie and guiding a video generative model with
generated scenes as an intermediary control signal.
Related papers
- SceneCraft: Layout-Guided 3D Scene Generation [29.713491313796084]
SceneCraft is a novel method for generating detailed indoor scenes that adhere to textual descriptions and spatial layout preferences.
Our method significantly outperforms existing approaches in complex indoor scene generation with diverse textures, consistent geometry, and realistic visual quality.
arXiv Detail & Related papers (2024-10-11T17:59:58Z)
- MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling [21.1274747033854]
Character video synthesis aims to produce realistic videos of animatable characters within lifelike scenes.
MIMO is a novel framework which can synthesize character videos with controllable attributes.
MIMO achieves advanced scalability to arbitrary characters, generality to novel 3D motions, and applicability to interactive real-world scenes.
arXiv Detail & Related papers (2024-09-24T15:00:07Z)
- 3D scene generation from scene graphs and self-attention [51.49886604454926]
We present a variant of the conditional variational autoencoder (cVAE) model to synthesize 3D scenes from scene graphs and floor plans.
We exploit the properties of self-attention layers to capture high-level relationships between objects in a scene.
arXiv Detail & Related papers (2024-04-02T12:26:17Z)
- SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model [7.707324214953882]
We introduce SceneScript, a method that produces full scene models as a sequence of structured language commands.
Our method infers the set of structured language commands directly from encoded visual data.
Our method gives state-of-the-art results in architectural layout estimation, and competitive results in 3D object detection.
arXiv Detail & Related papers (2024-03-19T18:01:29Z)
- Disentangled 3D Scene Generation with Layout Learning [109.03233745767062]
We introduce a method to generate 3D scenes that are disentangled into their component objects.
Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene.
We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects.
arXiv Detail & Related papers (2024-02-26T18:54:15Z)
- Blocks2World: Controlling Realistic Scenes with Editable Primitives [5.541644538483947]
We present Blocks2World, a novel method for 3D scene rendering and editing.
Our technique begins by extracting 3D parallelepipeds from various objects in a given scene using convex decomposition.
The next stage involves training a conditioned model that learns to generate images from the 2D-rendered convex primitives.
arXiv Detail & Related papers (2023-07-07T21:38:50Z)
- DORSal: Diffusion for Object-centric Representations of Scenes et al [28.181157214966493]
Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes.
We propose DORSal, which adapts a video diffusion architecture for 3D scene generation conditioned on frozen object-centric slot-based representations of scenes.
arXiv Detail & Related papers (2023-06-13T18:32:35Z)
- CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
- Set-the-Scene: Global-Local Training for Generating Controllable NeRF Scenes [68.14127205949073]
We propose a novel Global-Local training framework for synthesizing a 3D scene using object proxies.
We show that using proxies allows a wide variety of editing options, such as adjusting the placement of each independent object.
Our results show that Set-the-Scene offers a powerful solution for scene synthesis and manipulation.
arXiv Detail & Related papers (2023-03-23T17:17:29Z)
- Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation [58.16911861917018]
We present a novel method for performing flexible, 3D-aware image content manipulation while enabling high-quality novel view synthesis.
Our model couples learnt scene-specific feature volumes with a scene agnostic neural rendering network.
We demonstrate various scene manipulations, including mixing scenes, deforming objects and inserting objects into scenes, while still producing photo-realistic results.
arXiv Detail & Related papers (2022-04-22T17:57:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.