HouseLLM: LLM-Assisted Two-Phase Text-to-Floorplan Generation
- URL: http://arxiv.org/abs/2411.12279v2
- Date: Wed, 20 Nov 2024 05:05:48 GMT
- Title: HouseLLM: LLM-Assisted Two-Phase Text-to-Floorplan Generation
- Authors: Ziyang Zong, Zhaohuan Zhan, Guang Tan
- Abstract summary: This paper proposes a two-phase text-to-floorplan generation method, which guides a Large Language Model (LLM) to generate an initial layout.
We incorporate a Chain-of-Thought approach to prompt the LLM based on user text specifications, enabling a more user-friendly and intuitive house layout design.
Experimental results demonstrate that our approach achieves state-of-the-art performance across all metrics, validating its effectiveness in practical home design applications.
- Score: 4.242755827806053
- Abstract: This paper proposes a two-phase text-to-floorplan generation method, which guides a Large Language Model (LLM) to generate an initial layout (Layout-LLM) and refines it into the final floorplan through a conditional diffusion model. We incorporate a Chain-of-Thought approach to prompt the LLM based on user text specifications, enabling a more user-friendly and intuitive house layout design. This method allows users to describe their needs in natural language, enhancing accessibility and providing clearer geometric constraints. The final floorplans, generated by the Layout-LLM and refined by the conditional diffusion model, are more accurate and better meet user requirements. Experimental results demonstrate that our approach achieves state-of-the-art performance across all metrics, validating its effectiveness in practical home design applications. We plan to release our code for public use.
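The abstract includes no code, but the two-phase idea is easy to sketch. Below is a minimal, hypothetical illustration: `llm_call` and `diffusion_refine` are placeholder callables, and the prompt wording and JSON schema are invented for the example rather than taken from HouseLLM.

```python
import json

# Phase 1 (hypothetical): a Chain-of-Thought prompt asks the LLM to reason
# about the request before emitting a machine-readable room layout.
COT_PROMPT = """You are a floorplan designer.
User request: {request}
First, reason step by step about room count, sizes, and adjacency.
Then output ONLY a JSON list of rooms, each as
{{"type": ..., "x": ..., "y": ..., "w": ..., "h": ...}} on a 256x256 canvas."""

def generate_initial_layout(request: str, llm_call) -> list:
    """Phase 1: obtain a coarse layout from the LLM (the Layout-LLM step)."""
    reply = llm_call(COT_PROMPT.format(request=request))
    # Assumes the JSON list follows the reasoning text in the reply.
    return json.loads(reply[reply.index("["):])

def refine_layout(coarse_layout: list, diffusion_refine):
    """Phase 2: condition a diffusion model on the coarse layout so it
    denoises the boxes into a geometrically consistent floorplan."""
    return diffusion_refine(condition=coarse_layout)
```

Keeping the LLM's output in JSON is what gives the diffusion stage explicit geometric constraints to refine.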
Related papers
- GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a VLM-based framework that generates content-aware text logo layouts.
We introduce two model techniques to reduce the computation for processing multiple glyph images simultaneously.
To support instruction tuning of our model, we construct two extensive text logo datasets, which are 5x larger than the existing public dataset.
arXiv Detail & Related papers (2024-11-18T10:04:10Z)
- MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time [50.41806216615488]
Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from large text corpora.
To make LLMs more usable, aligning them with human preferences is essential.
We propose an effective method, MetaAlign, which aims to help LLMs dynamically align with various explicit or implicit preferences specified at inference time.
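As a rough, hypothetical illustration of what "preferences specified at inference time" means in practice, the template below states the preference alongside the query; a MetaAlign-style model is trained so that whatever preference appears in this slot is honoured without retraining. The wording and field names are invented, not MetaAlign's actual format.

```python
def aligned_prompt(query: str, preference: str) -> str:
    # The preference is ordinary prompt text, so it can change per request
    # while the model weights stay fixed.
    return (f"Preference: {preference}\n"
            f"Question: {query}\n"
            "Answer in a way that satisfies the stated preference.")

print(aligned_prompt("Explain quantum tunnelling.",
                     "concise, no equations, high-school level"))
```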
arXiv Detail & Related papers (2024-10-18T05:31:13Z)
- ChatHouseDiffusion: Prompt-Guided Generation and Editing of Floor Plans [10.82348603357201]
This paper introduces ChatHouseDiffusion, which leverages large language models (LLMs) to interpret natural language input.
It also employs graphormer to encode topological relationships, and uses diffusion models to flexibly generate and edit floor plans.
Compared to existing models, ChatHouseDiffusion achieves higher Intersection over Union (IoU) scores, permitting precise, localized adjustments without the need for complete regeneration.
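The summary names two concrete pieces: a graph of room-topology relationships and a diffusion generator conditioned on its encoding. As a minimal sketch (with invented room names, and without the actual graphormer or diffusion code), the adjacency structure such an encoder consumes can be built like this:

```python
import numpy as np

ROOMS = ["living", "kitchen", "bedroom", "bath"]          # parsed from text
EDGES = [("living", "kitchen"), ("living", "bedroom"), ("bedroom", "bath")]

def adjacency_matrix(rooms, edges):
    """Dense adjacency matrix; a Graphormer-style encoder would augment
    this with degree and shortest-path encodings before attention."""
    idx = {r: i for i, r in enumerate(rooms)}
    a = np.zeros((len(rooms), len(rooms)), dtype=np.float32)
    for u, v in edges:
        a[idx[u], idx[v]] = a[idx[v], idx[u]] = 1.0
    return a

condition = adjacency_matrix(ROOMS, EDGES)  # conditioning input (stubbed)
```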
arXiv Detail & Related papers (2024-10-15T02:41:46Z)
- ULLME: A Unified Framework for Large Language Model Embeddings with Generation-Augmented Learning [72.90823351726374]
We introduce the Unified framework for Large Language Model Embedding (ULLME), a flexible, plug-and-play implementation that enables bidirectional attention across various LLMs.
We also propose Generation-augmented Representation Learning (GRL), a novel fine-tuning method to boost LLMs for text embedding tasks.
To showcase our framework's flexibility and effectiveness, we release three pre-trained models from ULLME with different backbone architectures.
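ULLME's core change, enabling bidirectional attention inside decoder-only LLMs, happens in the model internals and is not reproduced here. What can be shown compactly is the standard pooling step such embedding frameworks apply afterwards, sketched below as a generic implementation, not ULLME's own code:

```python
import torch

def mean_pool(hidden_states: torch.Tensor,
              attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors into one text embedding, ignoring padding.
    hidden_states: (batch, seq, dim); attention_mask: (batch, seq) of 0/1."""
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)
    summed = (hidden_states * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid divide-by-zero
    return summed / counts
```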
arXiv Detail & Related papers (2024-08-06T18:53:54Z)
- PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.
Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts.
We conduct extensive experiments and achieve state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks.
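The key design choice here is representing layouts as structured text so an LLM can emit them token by token. Below is a hypothetical example of such a JSON layout, with invented field names, plus the sanity check a consumer might run on it:

```python
import json

layout = json.loads("""{
  "canvas": {"width": 1024, "height": 768},
  "elements": [
    {"type": "title", "box": [64, 40, 896, 120]},
    {"type": "image", "box": [64, 200, 560, 420]},
    {"type": "text",  "box": [672, 200, 288, 420]}
  ]
}""")

for el in layout["elements"]:  # validate that every box fits the canvas
    x, y, w, h = el["box"]
    assert 0 <= x and x + w <= layout["canvas"]["width"]
    assert 0 <= y and y + h <= layout["canvas"]["height"]
```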
arXiv Detail & Related papers (2024-06-05T03:05:52Z)
- Embedding-Aligned Language Models [23.28201682160259]
We propose a novel approach for training large language models (LLMs) to adhere to objectives defined within a latent embedding space.
Our embedding-aligned guided language (EAGLE) agent is trained to iteratively steer the LLM's generation towards optimal regions of the latent embedding space.
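EAGLE trains an RL agent to do the steering; as a much simpler, hypothetical stand-in, the greedy step below keeps whichever candidate continuation embeds closest to a target region of the latent space. `embed` and the candidate generator are placeholders.

```python
import numpy as np

def steer_step(candidates, embed, target: np.ndarray) -> str:
    """Greedy approximation of embedding-space steering: keep the
    candidate whose embedding is nearest the target point."""
    vecs = np.stack([embed(c) for c in candidates])   # shape (k, dim)
    dists = np.linalg.norm(vecs - target, axis=1)
    return candidates[int(dists.argmin())]
```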
arXiv Detail & Related papers (2024-05-24T06:11:17Z)
- LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding [21.916774808384893]
The proposed layout instruction tuning strategy consists of two components: layout-aware pre-training and layout-aware supervised fine-tuning.
Experiments on standard benchmarks show that the proposed LayoutLLM significantly outperforms existing methods that adopt open-source 7B LLMs/MLLMs for document understanding.
arXiv Detail & Related papers (2024-04-08T06:40:28Z)
- GenSERP: Large Language Models for Whole Page Presentation [22.354349023665538]
GenSERP is a framework that leverages large language models with vision in a few-shot setting to dynamically organize intermediate search results.
Our approach has three main stages: information gathering, answer generation, and scoring.
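Hypothetically, the three stages compose as a simple pipeline; `gather`, `generate`, and `score` stand in for components the abstract does not detail.

```python
def genserp_page(query, gather, generate, score, n_candidates=3):
    """Three-stage sketch: gather intermediate results, draft candidate
    page layouts with the vision-capable LLM, keep the best-scoring one."""
    items = gather(query)                                          # stage 1
    pages = [generate(query, items) for _ in range(n_candidates)]  # stage 2
    return max(pages, key=score)                                   # stage 3
```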
arXiv Detail & Related papers (2024-02-22T05:41:24Z)
- A Picture is Worth a Thousand Words: Language Models Plan from Pixels [53.85753597586226]
Planning is an important capability of artificial agents that perform long-horizon tasks in real-world environments.
In this work, we explore the use of pre-trained language models (PLMs) to reason about plan sequences from text instructions in embodied visual environments.
arXiv Detail & Related papers (2023-03-16T02:02:18Z)
- Guiding Large Language Models via Directional Stimulus Prompting [114.84930073977672]
We introduce Directional Stimulus Prompting, a novel framework for guiding black-box large language models (LLMs) toward specific desired outputs.
Instead of directly adjusting LLMs, our method employs a small tunable policy model to generate an auxiliary directional stimulus prompt for each input instance.
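A hedged sketch of the mechanism as described: the small tunable policy model emits a per-instance "directional stimulus" (for instance, hint keywords) that is spliced into the black-box LLM's prompt. The summarization framing and field names below are illustrative assumptions.

```python
def build_prompt(article: str, policy_model) -> str:
    # The policy model is the only trainable part; the black-box LLM that
    # receives this prompt is never updated.
    hints = policy_model(article)  # e.g. "Hint keywords: merger, Q3 revenue"
    return (f"Article: {article}\n"
            f"{hints}\n"
            "Summarize the article, covering the hinted keywords.")
```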
arXiv Detail & Related papers (2023-02-22T17:44:15Z)