Related papers: Programmable-Room: Interactive Textured 3D Room Meshes Generation Empowered by Large Language Models

Programmable-Room: Interactive Textured 3D Room Meshes Generation Empowered by Large Language Models

URL: http://arxiv.org/abs/2506.17707v1
Date: Sat, 21 Jun 2025 13:00:06 GMT
Title: Programmable-Room: Interactive Textured 3D Room Meshes Generation Empowered by Large Language Models
Authors: Jihyun Kim, Junho Park, Kyeongbo Kong, Suk-Ju Kang,
Abstract summary: Programmable-Room is a framework which interactively generates and edits a 3D room mesh, given natural language instructions.<n>For precise control of a room's each attribute, we decompose the challenging task into simpler steps such as creating plausible 3D coordinates for room meshes.<n>To support the various decomposed tasks with a unified framework, we incorporate visual programming (VP)
Score: 16.828694984680553
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present Programmable-Room, a framework which interactively generates and edits a 3D room mesh, given natural language instructions. For precise control of a room's each attribute, we decompose the challenging task into simpler steps such as creating plausible 3D coordinates for room meshes, generating panorama images for the texture, constructing 3D meshes by integrating the coordinates and panorama texture images, and arranging furniture. To support the various decomposed tasks with a unified framework, we incorporate visual programming (VP). VP is a method that utilizes a large language model (LLM) to write a Python-like program which is an ordered list of necessary modules for the various tasks given in natural language. We develop most of the modules. Especially, for the texture generating module, we utilize a pretrained large-scale diffusion model to generate panorama images conditioned on text and visual prompts (i.e., layout, depth, and semantic map) simultaneously. Specifically, we enhance the panorama image generation quality by optimizing the training objective with a 1D representation of a panorama scene obtained from bidirectional LSTM. We demonstrate Programmable-Room's flexibility in generating and editing 3D room meshes, and prove our framework's superiority to an existing model quantitatively and qualitatively. Project page is available in https://jihyun0510.github.io/Programmable_Room_Page/.

Related papers

Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation [56.862552362223425]
This report presents a comprehensive framework for generating high-quality 3D shapes and textures from diverse input prompts.<n>The framework consists of 3D shape generation and texture generation.<n>This report details the system architecture, experimental results, and potential future directions to improve and expand the framework.
arXiv Detail & Related papers (2025-02-20T04:22:30Z)
Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting [47.014044892025346]
Architect is a generative framework that creates complex and realistic 3D embodied environments leveraging diffusion-based 2D image inpainting. Our pipeline is further extended to a hierarchical and iterative inpainting process to continuously generate placement of large furniture and small objects to enrich the scene.
arXiv Detail & Related papers (2024-11-14T22:15:48Z)
DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion [60.45000652592418]
We propose a novel text-driven panoramic generation framework, DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation. We show that DiffPano can generate consistent, diverse panoramic images with given unseen text descriptions and camera poses.
arXiv Detail & Related papers (2024-10-31T17:57:02Z)
Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models [12.965144877139393]
We introduce Dream2Real, a robotics framework which integrates vision-language models (VLMs) trained on 2D data into a 3D object rearrangement pipeline. This is achieved by the robot autonomously constructing a 3D representation of the scene, where objects can be rearranged virtually and an image of the resulting arrangement rendered. These renders are evaluated by a VLM, so that the arrangement which best satisfies the user instruction is selected and recreated in the real world with pick-and-place.
arXiv Detail & Related papers (2023-12-07T18:51:19Z)
3D-GPT: Procedural 3D Modeling with Large Language Models [47.72968643115063]
We introduce 3D-GPT, a framework utilizing large language models(LLMs) for instruction-driven 3D modeling. 3D-GPT positions LLMs as proficient problem solvers, dissecting the procedural 3D modeling tasks into accessible segments and appointing the apt agent for each task. Our empirical investigations confirm that 3D-GPT not only interprets and executes instructions, delivering reliable results but also collaborates effectively with human designers.
arXiv Detail & Related papers (2023-10-19T17:41:48Z)
Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints [35.073500525250346]
We present Ctrl-Room, which can generate convincing 3D rooms with designer-style layouts and high-fidelity textures from just a text prompt.<n> Ctrl-Room enables versatile interactive editing operations such as resizing or moving individual furniture items.
arXiv Detail & Related papers (2023-10-05T15:29:52Z)
Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models. Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models [98.81962282674151]
Large Language Models (LLMs) can serve as visual planners by generating layouts from text conditions. We propose LayoutGPT, a method to compose in-context visual demonstrations in style sheet language.
arXiv Detail & Related papers (2023-05-24T17:56:16Z)
DreamStone: Image as Stepping Stone for Text-Guided 3D Shape Generation [105.97545053660619]
We present a new text-guided 3D shape generation approach DreamStone. It uses images as a stepping stone to bridge the gap between text and shape modalities for generating 3D shapes without requiring paired text and 3D data. Our approach is generic, flexible, and scalable, and it can be easily integrated with various SVR models to expand the generative space and improve the generative fidelity.
arXiv Detail & Related papers (2023-03-24T03:56:23Z)
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models [21.622420436349245]
We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input. We leverage pre-trained 2D text-to-image models to synthesize a sequence of images from different poses. In order to lift these outputs into a consistent 3D scene representation, we combine monocular depth estimation with a text-conditioned inpainting model.
arXiv Detail & Related papers (2023-03-21T16:21:02Z)
SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation [89.47132156950194]
We present a novel framework built to simplify 3D asset generation for amateur users. Our method supports a variety of input modalities that can be easily provided by a human. Our model can combine all these tasks into one swiss-army-knife tool.
arXiv Detail & Related papers (2022-12-08T18:59:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.