AnyHome: Open-Vocabulary Generation of Structured and Textured 3D Homes
- URL: http://arxiv.org/abs/2312.06644v3
- Date: Mon, 29 Jul 2024 00:09:46 GMT
- Title: AnyHome: Open-Vocabulary Generation of Structured and Textured 3D Homes
- Authors: Rao Fu, Zehao Wen, Zichen Liu, Srinath Sridhar
- Abstract summary: We introduce AnyHome, a framework that translates any text into well-structured and textured indoor scenes at a house-scale.
By prompting Large Language Models (LLMs) with designed templates, our approach converts provided textual narratives into amodal structured representations.
A Score Distillation Sampling process is then employed to refine the geometry, followed by an egocentric inpainting process that adds lifelike textures to it.
- Score: 10.482201110770584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by cognitive theories, we introduce AnyHome, a framework that translates any text into well-structured and textured indoor scenes at a house-scale. By prompting Large Language Models (LLMs) with designed templates, our approach converts provided textual narratives into amodal structured representations. These representations guarantee consistent and realistic spatial layouts by directing the synthesis of a geometry mesh within defined constraints. A Score Distillation Sampling process is then employed to refine the geometry, followed by an egocentric inpainting process that adds lifelike textures to it. AnyHome stands out with its editability, customizability, diversity, and realism. The structured representations for scenes allow for extensive editing at varying levels of granularity. Capable of interpreting texts ranging from simple labels to detailed narratives, AnyHome generates detailed geometries and textures that outperform existing methods in both quantitative and qualitative measures.
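As a reading aid, here is a minimal Python sketch of the four-stage pipeline the abstract describes (text to structured representation, structure to mesh, SDS refinement, egocentric texture inpainting). Every name and data layout below is a hypothetical placeholder, not AnyHome's actual API:

```python
# Hypothetical sketch of an AnyHome-style pipeline; the stage functions
# are stubs standing in for the LLM, mesh, SDS, and inpainting components.
from dataclasses import dataclass, field

@dataclass
class StructuredScene:
    """Amodal structured representation: rooms plus object placements."""
    rooms: list = field(default_factory=list)      # e.g. labeled floor polygons
    objects: list = field(default_factory=list)    # e.g. boxes + relations

def text_to_structure(narrative: str) -> StructuredScene:
    """Stage 1: prompt an LLM with designed templates (stubbed here).
    In practice this would wrap `narrative` in a template and parse the
    LLM's structured (e.g. JSON) reply."""
    return StructuredScene(rooms=[{"label": "living room", "polygon": []}])

def synthesize_mesh(scene: StructuredScene) -> dict:
    """Stage 2: instantiate a geometry mesh within the layout constraints."""
    return {"vertices": [], "faces": [], "scene": scene}  # placeholder mesh

def refine_with_sds(mesh: dict, steps: int = 500) -> dict:
    """Stage 3: refine geometry by backpropagating Score Distillation
    Sampling gradients from a pretrained diffusion model (stubbed)."""
    return mesh

def egocentric_inpaint(mesh: dict) -> dict:
    """Stage 4: render egocentric views and inpaint lifelike textures."""
    return mesh

def generate_home(narrative: str) -> dict:
    scene = text_to_structure(narrative)   # text -> amodal structure
    mesh = synthesize_mesh(scene)          # structure -> constrained mesh
    mesh = refine_with_sds(mesh)           # geometry refinement
    return egocentric_inpaint(mesh)        # lifelike textures

home = generate_home("A cozy two-bedroom cottage with a sunlit reading nook.")
```

Note how stages 2-4 consume only the structured representation; per the abstract, it is this intermediate structure that makes the scenes editable at varying levels of granularity.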
Related papers
- The Scene Language: Representing Scenes with Programs, Words, and Embeddings [23.707974056165042]
We introduce the Scene Language, a visual scene representation that concisely and precisely describes the structure, semantics, and identity of visual scenes.
It represents a scene with three key components: a program that specifies the hierarchical and relational structure of entities in the scene, words in natural language that summarize the semantic class of each entity, and embeddings that capture the visual identity of each entity.
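As a concrete (and deliberately simplified) illustration, the three components could be held in a data structure like the hypothetical one below; note that in the paper the structural component is an actual program rather than a static record:

```python
# Hypothetical stand-in for the Scene Language's three components:
# a program-like hierarchy (structure), words (semantics), and
# per-entity embeddings (visual identity).
from dataclasses import dataclass, field

@dataclass
class Entity:
    word: str                  # natural-language semantic class, e.g. "chair"
    embedding: list            # visual-identity embedding for this entity
    children: list = field(default_factory=list)  # hierarchical structure

@dataclass
class Scene:
    root: Entity                                   # entity hierarchy
    relations: list = field(default_factory=list)  # relational structure

desk = Entity(word="desk", embedding=[0.1] * 8)
lamp = Entity(word="lamp", embedding=[0.4] * 8)
scene = Scene(root=Entity("room", [0.0] * 8, [desk, lamp]),
              relations=[("lamp", "on-top-of", "desk")])
```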
arXiv Detail & Related papers (2024-10-22T07:40:20Z)
- Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior [43.14168074750301]
We introduce a compositional 3D layout representation into the text-to-3D paradigm, serving as an additional prior.
It comprises a set of semantic primitives with simple geometric structures and explicit arrangement relationships.
We also present various scene editing demonstrations, showing the power of steerable urban scene generation.
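A toy sketch of what such a primitive-plus-relations layout could look like; the class, the relation function, and the numbers are illustrative assumptions, not the paper's API:

```python
# Hypothetical semantic primitives (simple boxes) with one explicit
# arrangement relation ("beside") applied procedurally.
from dataclasses import dataclass

@dataclass
class Primitive:
    label: str      # semantic class, e.g. "road", "building"
    center: tuple   # (x, y, z) position
    size: tuple     # (w, h, d) box extents

def place_beside(a: Primitive, b: Primitive, gap: float = 1.0) -> Primitive:
    """Arrangement relation: re-center `b` so it abuts `a` along +x."""
    x = a.center[0] + a.size[0] / 2 + gap + b.size[0] / 2
    return Primitive(b.label, (x, b.center[1], b.center[2]), b.size)

road = Primitive("road", (0.0, 0.0, 0.0), (8.0, 0.1, 40.0))
house = place_beside(road, Primitive("building", (0.0, 3.0, 0.0), (10.0, 6.0, 10.0)))
```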
arXiv Detail & Related papers (2024-04-10T06:41:30Z)
- Style-Consistent 3D Indoor Scene Synthesis with Decoupled Objects [84.45345829270626]
Controllable 3D indoor scene synthesis stands at the forefront of technological progress.
Current methods for scene stylization are limited to applying styles to the entire scene.
We introduce a unique pipeline designed for synthesizing 3D indoor scenes.
arXiv Detail & Related papers (2024-01-24T03:10:36Z)
- TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion [64.49276500129092]
TextureDreamer is an image-guided texture synthesis method.
It can transfer relightable textures from a small number of input images to target 3D shapes across arbitrary categories.
arXiv Detail & Related papers (2024-01-17T18:55:49Z)
- DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation [31.353409149640605]
In this paper, we propose a novel framework to generate 3D textures for immersive VR experiences.
To cope with cluttered geometries during texture propagation, we inpaint textures in confident regions and learn an implicit imitating network to synthesize textures in occluded areas.
arXiv Detail & Related papers (2023-10-19T19:29:23Z)
- Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints [35.073500525250346]
We present Ctrl-Room, which can generate convincing 3D rooms with designer-style layouts and high-fidelity textures from just a text prompt.
Ctrl-Room enables versatile interactive editing operations such as resizing or moving individual furniture items.
arXiv Detail & Related papers (2023-10-05T15:29:52Z)
- Text2Scene: Text-driven Indoor Scene Stylization with Part-aware Details [12.660352353074012]
We propose Text2Scene, a method to automatically create realistic textures for virtual scenes composed of multiple objects.
Our pipeline adds detailed texture on labeled 3D geometries in the room such that the generated colors respect the hierarchical structure of semantic parts that are often composed of similar materials.
arXiv Detail & Related papers (2023-08-31T17:37:23Z)
- TADA! Text to Animatable Digital Avatars [57.52707683788961]
TADA takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures.
We derive an optimizable high-resolution body model from SMPL-X with 3D displacements and a texture map.
We render normals and RGB images of the generated character and exploit their latent embeddings in the SDS training process.
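For context on the SDS step: the standard Score Distillation Sampling gradient from DreamFusion, which text-driven 3D optimization pipelines like this build on, is

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\, \frac{\partial x}{\partial \theta} \right]
```

where x = g(θ) is a differentiable render of the parameters θ (here, the normal or RGB images of the avatar), x_t its noised version at timestep t, ε̂_φ the diffusion model's noise prediction conditioned on the text prompt y, and w(t) a timestep weighting.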
arXiv Detail & Related papers (2023-08-21T17:59:10Z)
- RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture [80.0643976406225]
We propose "RoomDreamer", which leverages powerful natural language to synthesize a new room with a different style.
Our work addresses the challenge of synthesizing both geometry and texture aligned to the input scene structure and prompt simultaneously.
To validate the proposed method, real indoor scenes scanned with smartphones are used for extensive experiments.
arXiv Detail & Related papers (2023-05-18T22:57:57Z)
- TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision [114.56048848216254]
We present a novel framework, TAPS3D, to train a text-guided 3D shape generator with pseudo captions.
Based on rendered 2D images, we retrieve relevant words from the CLIP vocabulary and construct pseudo captions using templates.
Our constructed captions provide high-level semantic supervision for generated 3D shapes.
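A hedged sketch of that retrieval step using the public openai/CLIP package: score a candidate vocabulary against a rendered view and fill a caption template with the top-scoring words. The vocabulary, template, top-k choice, and file path below are illustrative assumptions, not TAPS3D's actual configuration:

```python
# Score candidate words against a rendered image with CLIP, then build
# a pseudo caption from the best matches (illustrative template).
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

vocabulary = ["wooden", "red", "leather", "chair", "table", "sofa"]
image = preprocess(Image.open("render.png")).unsqueeze(0).to(device)  # rendered 2D view
tokens = clip.tokenize(vocabulary).to(device)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(tokens)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)
    txt_feat /= txt_feat.norm(dim=-1, keepdim=True)
    scores = (img_feat @ txt_feat.T).squeeze(0)   # cosine similarity per word

top = [vocabulary[i] for i in scores.topk(2).indices.tolist()]
pseudo_caption = f"a photo of a {top[0]} {top[1]}"  # template fill
print(pseudo_caption)
```

The resulting captions then serve as the high-level semantic supervision mentioned above.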
arXiv Detail & Related papers (2023-03-23T13:53:16Z)
- Intelligent Home 3D: Automatic 3D-House Design from Linguistic Descriptions Only [55.3363844662966]
We formulate it as a language-conditioned visual content generation problem that is divided into a floor plan generation task and an interior texture synthesis task.
To train and evaluate our model, we build the first Text-to-3D House Model dataset.
arXiv Detail & Related papers (2020-03-01T04:28:48Z)