SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model
- URL: http://arxiv.org/abs/2403.13064v1
- Date: Tue, 19 Mar 2024 18:01:29 GMT
- Title: SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model
- Authors: Armen Avetisyan, Christopher Xie, Henry Howard-Jenkins, Tsun-Yi Yang, Samir Aroudj, Suvam Patra, Fuyang Zhang, Duncan Frost, Luke Holland, Campbell Orme, Jakob Engel, Edward Miller, Richard Newcombe, Vasileios Balntas
- Abstract summary: We introduce SceneScript, a method that produces full scene models as a sequence of structured language commands.
Our method infers the set of structured language commands directly from encoded visual data.
Our method gives state-of-the-art results in architectural layout estimation, and competitive results in 3D object detection.
- Score: 7.707324214953882
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce SceneScript, a method that directly produces full scene models as a sequence of structured language commands using an autoregressive, token-based approach. Our proposed scene representation is inspired by recent successes in transformers and LLMs, and departs from more traditional methods which commonly describe scenes as meshes, voxel grids, point clouds or radiance fields. Our method infers the set of structured language commands directly from encoded visual data using a scene language encoder-decoder architecture. To train SceneScript, we generate and release a large-scale synthetic dataset called Aria Synthetic Environments consisting of 100k high-quality indoor scenes, with photorealistic and ground-truth annotated renders of egocentric scene walkthroughs. Our method gives state-of-the-art results in architectural layout estimation, and competitive results in 3D object detection. Lastly, we explore an advantage of SceneScript: the ability to readily adapt to new commands via simple additions to the structured language, which we illustrate for tasks such as coarse 3D object part reconstruction.
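To make the "sequence of structured language commands" representation concrete, here is a minimal sketch of what parsing such a command sequence might look like. The command names (`make_wall`, `make_door`) and parameter names are illustrative assumptions, not the paper's exact grammar, and the parser is a toy: the actual method decodes these commands token by token with an autoregressive model rather than reading them as text.

```python
from dataclasses import dataclass

@dataclass
class Command:
    """One structured scene command, e.g. make_wall(...) with numeric parameters."""
    name: str
    params: dict

def parse_scene_script(lines):
    """Parse textual commands of the form name(k1=v1, k2=v2, ...) into Command objects."""
    commands = []
    for line in lines:
        head, _, arg_str = line.partition("(")
        params = {}
        for pair in arg_str.rstrip(")").split(","):
            if "=" in pair:
                key, value = pair.split("=")
                params[key.strip()] = float(value)
        commands.append(Command(head.strip(), params))
    return commands

# Hypothetical two-command scene: one wall segment and a door attached to it.
script = [
    "make_wall(a_x=0.0, a_y=0.0, b_x=4.0, b_y=0.0, height=2.7)",
    "make_door(wall_id=0, position_x=1.2, width=0.9, height=2.0)",
]
scene = parse_scene_script(script)
```

A representation like this is what makes the extensibility claim at the end of the abstract plausible: supporting a new entity type only requires adding one more command name and its parameter list to the vocabulary, rather than changing the output geometry format.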
Related papers
- SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code [76.22337677728109]
SceneCraft is a Large Language Model (LLM) Agent converting text descriptions into Blender-executable Python scripts.
SceneCraft renders complex scenes with up to a hundred 3D assets.
We tackle these challenges through a combination of advanced abstraction, strategic planning, and library learning.
arXiv Detail & Related papers (2024-03-02T16:16:26Z) - InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior [27.773451301040424]
InstructScene is a novel generative framework that integrates a semantic graph prior and a layout decoder.
We show that the proposed method surpasses existing state-of-the-art approaches by a large margin.
arXiv Detail & Related papers (2024-02-07T10:09:00Z) - Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases [13.126239167800652]
We present a system for generating indoor scenes in response to text prompts.
The prompts are not limited to a fixed vocabulary of scene descriptions.
The objects in generated scenes are not restricted to a fixed set of object categories.
arXiv Detail & Related papers (2024-02-05T01:59:31Z) - SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z) - CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z) - Set-the-Scene: Global-Local Training for Generating Controllable NeRF Scenes [68.14127205949073]
We propose a novel Global-Local training framework for synthesizing a 3D scene using object proxies.
We show that using proxies allows a wide variety of editing options, such as adjusting the placement of each independent object.
Our results show that Set-the-Scene offers a powerful solution for scene synthesis and manipulation.
arXiv Detail & Related papers (2023-03-23T17:17:29Z) - DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis [90.32352050266104]
DisCoScene is a 3D-aware generative model for high-quality and controllable scene synthesis.
It disentangles the whole scene into object-centric generative fields by learning on only 2D images with the global-local discrimination.
We demonstrate state-of-the-art performance on many scene datasets, including the challenging outdoor dataset.
arXiv Detail & Related papers (2022-12-22T18:59:59Z) - Semantic Scene Completion via Integrating Instances and Scene in-the-Loop [73.11401855935726]
Semantic Scene Completion aims at reconstructing a complete 3D scene with precise voxel-wise semantics from a single-view depth or RGBD image.
We present Scene-Instance-Scene Network (SISNet), which takes advantage of both instance-level and scene-level semantic information.
Our method is capable of inferring fine-grained shape details as well as nearby objects whose semantic categories are easily mixed up.
arXiv Detail & Related papers (2021-04-08T09:50:30Z) - Static and Animated 3D Scene Generation from Free-form Text Descriptions [1.102914654802229]
We study a new pipeline that aims to generate static as well as animated 3D scenes from different types of free-form textual scene descriptions.
In the first stage, we encode the free-form text using an encoder-decoder neural architecture.
In the second stage, we generate a 3D scene based on the generated encoding.
arXiv Detail & Related papers (2020-10-04T11:31:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.