Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases
- URL: http://arxiv.org/abs/2403.09675v1
- Date: Mon, 5 Feb 2024 01:59:31 GMT
- Title: Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases
- Authors: Rio Aguina-Kang, Maxim Gumin, Do Heon Han, Stewart Morris, Seung Jean Yoo, Aditya Ganeshan, R. Kenny Jones, Qiuhong Anna Wei, Kailiang Fu, Daniel Ritchie
- Abstract summary: We present a system for generating indoor scenes in response to text prompts.
The prompts are not limited to a fixed vocabulary of scene descriptions.
The objects in generated scenes are not restricted to a fixed set of object categories.
- Score: 13.126239167800652
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a system for generating indoor scenes in response to text prompts. The prompts are not limited to a fixed vocabulary of scene descriptions, and the objects in generated scenes are not restricted to a fixed set of object categories -- we call this setting open-universe indoor scene generation. Unlike most prior work on indoor scene generation, our system does not require a large training dataset of existing 3D scenes. Instead, it leverages the world knowledge encoded in pre-trained large language models (LLMs) to synthesize programs in a domain-specific layout language that describe objects and spatial relations between them. Executing such a program produces a specification of a constraint satisfaction problem, which the system solves using a gradient-based optimization scheme to produce object positions and orientations. To produce object geometry, the system retrieves 3D meshes from a database. Unlike prior work which uses databases of category-annotated, mutually-aligned meshes, we develop a pipeline using vision-language models (VLMs) to retrieve meshes from massive databases of un-annotated, inconsistently-aligned meshes. Experimental evaluations show that our system outperforms generative models trained on 3D data for traditional, closed-universe scene generation tasks; it also outperforms a recent LLM-based layout generation method on open-universe scene generation.
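To make the described pipeline concrete, the sketch below mimics the abstract's flow with a toy layout "program" of objects and pairwise spatial relations, compiled into differentiable penalties and solved by gradient descent for 2D object positions. The relation names (`next_to`, `away_from`), loss forms, and solver settings are illustrative assumptions, not the paper's actual layout language, constraint set, or optimization scheme; orientations and VLM-based mesh retrieval are omitted.

```python
# Minimal sketch: a toy layout "program" (objects + spatial relations) is
# compiled into differentiable penalties, then object positions are found by
# gradient-based optimization. All names and losses here are assumptions.
import torch

program = {
    "objects": ["bed", "nightstand", "desk"],
    "relations": [
        ("next_to", "nightstand", "bed", 0.5),   # target gap in meters
        ("away_from", "desk", "bed", 2.0),       # minimum separation
    ],
}

def constraint_loss(program, pos):
    """Turn each relation into a differentiable penalty on object positions."""
    idx = {name: i for i, name in enumerate(program["objects"])}
    losses = []
    for rel, a, b, d in program["relations"]:
        dist = torch.norm(pos[idx[a]] - pos[idx[b]])
        if rel == "next_to":          # keep the pair roughly d apart
            losses.append((dist - d) ** 2)
        elif rel == "away_from":      # penalize being closer than d
            losses.append(torch.relu(d - dist) ** 2)
    return torch.stack(losses).sum()

# Gradient-based solve: optimize free 2D positions until constraints are met.
pos = torch.randn(len(program["objects"]), 2, requires_grad=True)
opt = torch.optim.Adam([pos], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    loss = constraint_loss(program, pos)
    loss.backward()
    opt.step()

print(pos.detach())
```

Running this drives the nightstand toward the bed while keeping the desk at least two meters away, which illustrates the constraint-satisfaction-by-optimization idea the abstract describes, albeit on a far simpler constraint set.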
Related papers
- ROOT: VLM based System for Indoor Scene Understanding and Beyond [83.71252153660078]
ROOT is a VLM-based system designed to enhance the analysis of indoor scenes.
ROOT facilitates indoor scene understanding and proves effective in diverse downstream applications, such as 3D scene generation and embodied AI.
arXiv Detail & Related papers (2024-11-24T04:51:24Z)
- Teaching VLMs to Localize Specific Objects from In-context Examples [56.797110842152]
Vision-Language Models (VLMs) have shown remarkable capabilities across diverse visual tasks.
Current VLMs lack a fundamental cognitive ability: learning to localize objects in a scene by taking into account the context.
This work is the first to explore and benchmark personalized few-shot localization for VLMs.
arXiv Detail & Related papers (2024-11-20T13:34:22Z)
- LLplace: The 3D Indoor Scene Layout Generation and Editing via Large Language Model [58.24851949945434]
LLplace is a novel 3D indoor scene layout designer based on a lightweight, fine-tuned open-source LLM (Llama3).
LLplace circumvents the need for spatial relationship priors and in-context exemplars, enabling efficient and credible room layout generation.
Our approach demonstrates that LLplace can effectively generate and edit 3D indoor layouts interactively and outperform existing methods in delivering high-quality 3D design solutions.
arXiv Detail & Related papers (2024-06-06T08:53:01Z) - Grounded 3D-LLM with Referent Tokens [58.890058568493096]
We propose Grounded 3D-LLM to consolidate various 3D vision tasks within a unified generative framework.
The model uses scene referent tokens as special noun phrases to reference 3D scenes.
Per-task instruction-following templates are employed to ensure naturalness and diversity when translating 3D vision tasks into language formats.
arXiv Detail & Related papers (2024-05-16T18:03:41Z) - SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model [7.707324214953882]
We introduce SceneScript, a method that produces full scene models as a sequence of structured language commands.
Our method infers the set of structured language commands directly from encoded visual data.
Our method gives state-of-the-art results in architectural layout estimation, and competitive results in 3D object detection.
arXiv Detail & Related papers (2024-03-19T18:01:29Z) - Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers [65.51132104404051]
We introduce the use of object identifiers and object-centric representations to interact with scenes at the object level.
Our model significantly outperforms existing methods on benchmarks including ScanRefer, Multi3DRefer, Scan2Cap, ScanQA, and SQA3D.
arXiv Detail & Related papers (2023-12-13T14:27:45Z) - LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language
Model as an Agent [23.134180979449823]
3D visual grounding is a critical skill for household robots, enabling them to navigate, manipulate objects, and answer questions based on their environment.
We propose LLM-Grounder, a novel zero-shot, open-vocabulary, Large Language Model (LLM)-based 3D visual grounding pipeline.
Our findings indicate that LLMs significantly improve the grounding capability, especially for complex language queries.
arXiv Detail & Related papers (2023-09-21T17:59:45Z) - CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph
Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.