Scene Co-pilot: Procedural Text to Video Generation with Human in the Loop
- URL: http://arxiv.org/abs/2411.18644v1
- Date: Tue, 26 Nov 2024 19:21:57 GMT
- Title: Scene Co-pilot: Procedural Text to Video Generation with Human in the Loop
- Authors: Zhaofang Qian, Abolfazl Sharifi, Tucker Carroll, Ser-Nam Lim
- Abstract summary: Scene Copilot is a framework combining large language models (LLMs) with a procedural 3D scene generator.
Scene Codex is designed to translate textual user input into commands understandable by the 3D scene generator.
BlenderGPT provides users with an intuitive and direct way to precisely control the generated 3D scene and the final output video.
- Abstract: Video generation has achieved impressive quality, but it still suffers from artifacts such as temporal inconsistency and violations of physical laws. Leveraging 3D scenes can fundamentally resolve these issues by providing precise control over scene entities. To facilitate easy generation of diverse photorealistic scenes, we propose Scene Copilot, a framework combining large language models (LLMs) with a procedural 3D scene generator. Specifically, Scene Copilot consists of Scene Codex, BlenderGPT, and a human in the loop. Scene Codex translates textual user input into commands understandable by the 3D scene generator, while BlenderGPT provides users with an intuitive and direct way to precisely control the generated 3D scene and the final output video. Users can also use the Blender UI to receive instant visual feedback. In addition, we have curated a procedural dataset of objects in code format to further enhance the system's capabilities. These components work together seamlessly to support users in generating the desired 3D scenes. Extensive experiments demonstrate our framework's ability to customize 3D scenes and generate video.
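The abstract describes a loop in which an LLM turns free-form text into scene-generator commands and a human reviews the result in Blender before it is applied. Below is a minimal sketch of how such a loop might look, assuming an OpenAI-style chat API; the prompt text, function names, and review step are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the text-to-Blender-command loop; not the paper's code.
import bpy  # available only inside Blender's bundled Python interpreter
from openai import OpenAI

# Assumed prompt: the actual Scene Codex prompting scheme is not public here.
SYSTEM_PROMPT = (
    "Translate the user's scene description into Blender Python (bpy) "
    "commands. Reply with executable code only."
)

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def translate_to_bpy(user_request: str) -> str:
    """LLM step: map free-form text to bpy commands (Scene Codex analogue)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_request},
        ],
    )
    return response.choices[0].message.content

def apply_with_review(code: str) -> None:
    """Human-in-the-loop step: show the generated commands before running them."""
    print("Proposed commands:\n", code)
    if input("Execute in Blender? [y/N] ").lower() == "y":
        # The Blender UI then provides the instant visual feedback the paper mentions.
        exec(code, {"bpy": bpy})

apply_with_review(translate_to_bpy("Add a reflective sphere two meters above the floor"))
```

In practice the review step would be a UI panel rather than a console prompt; the point of the sketch is that the human approves or edits the generated commands before the 3D scene (and hence the rendered video) is modified.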
Related papers
- PaintScene4D: Consistent 4D Scene Generation from Text Prompts [29.075849524496707]
PaintScene4D is a novel text-to-4D scene generation framework.
It harnesses video generative models trained on diverse real-world datasets.
It produces realistic 4D scenes that can be viewed from arbitrary trajectories.
arXiv Detail & Related papers (2024-12-05T18:59:57Z)
- Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches [50.51643519253066]
3D Content Generation is at the heart of many computer graphics applications, including video gaming, film-making, virtual and augmented reality, etc.
This paper proposes a novel deep-learning based approach for automatically generating interactive and playable 3D game scenes.
arXiv Detail & Related papers (2024-08-08T16:27:37Z)
- iControl3D: An Interactive System for Controllable 3D Scene Generation [57.048647153684485]
iControl3D is a novel interactive system that empowers users to generate and render customizable 3D scenes with precise control.
We leverage 3D meshes as an intermediary proxy to iteratively merge individual 2D diffusion-generated images into a cohesive and unified 3D scene representation.
Our neural rendering interface enables users to build a radiance field of their scene online and navigate the entire scene.
arXiv Detail & Related papers (2024-08-03T06:35:09Z)
- SceneTeller: Language-to-3D Scene Generation [15.209079637302905]
Given a prompt in natural language describing the object placement in the room, our method produces a high-quality 3D scene corresponding to it.
Our turnkey pipeline produces state-of-the-art 3D scenes, while being easy to use even for novices.
arXiv Detail & Related papers (2024-07-30T10:45:28Z)
- 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting [100.94916668527544]
Existing methods focus solely on either individual 2D objects or global 3D scene editing.
We propose 3DitScene, a novel and unified scene editing framework.
It enables seamless editing from 2D to 3D, allowing precise control over scene composition and individual objects.
arXiv Detail & Related papers (2024-05-28T17:59:01Z)
- Disentangled 3D Scene Generation with Layout Learning [109.03233745767062]
We introduce a method to generate 3D scenes that are disentangled into their component objects.
Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene.
We show that despite its simplicity, our approach successfully generates 3D scenes disentangled into individual objects.
arXiv Detail & Related papers (2024-02-26T18:54:15Z)
- GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting [52.150502668874495]
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation.
GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing.
arXiv Detail & Related papers (2024-02-11T13:40:08Z)
- DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis [90.32352050266104]
DisCoScene is a 3D-aware generative model for high-quality and controllable scene synthesis.
It disentangles the whole scene into object-centric generative fields by learning on only 2D images with global-local discrimination.
We demonstrate state-of-the-art performance on many scene datasets, including a challenging outdoor dataset.
arXiv Detail & Related papers (2022-12-22T18:59:59Z)