Text Semantics to Flexible Design: A Residential Layout Generation Method Based on Stable Diffusion Model
- URL: http://arxiv.org/abs/2501.09279v1
- Date: Thu, 16 Jan 2025 03:57:38 GMT
- Title: Text Semantics to Flexible Design: A Residential Layout Generation Method Based on Stable Diffusion Model
- Authors: Zijin Qiu, Jiepeng Liu, Yi Xia, Hongtuo Qi, Pengkun Liu
- Abstract summary: We propose a cross-modal design approach based on the Stable Diffusion model for generating flexible residential layouts.
The method offers multiple input types for learning objectives, allowing users to specify both boundaries and layouts.
We also present a scheme that encapsulates design expertise within a knowledge graph and translates it into natural language.
- Score: 0.6990493129893112
- License:
- Abstract: Flexibility in AI-based residential layout design remains a significant challenge, as traditional methods like rule-based heuristics and graph-based generation often lack flexibility and require substantial design knowledge from users. To address these limitations, we propose a cross-modal design approach based on the Stable Diffusion model for generating flexible residential layouts. The method offers multiple input types for learning objectives, allowing users to specify both boundaries and layouts. It incorporates natural language as design constraints and introduces ControlNet to enable stable layout generation through two distinct pathways. We also present a scheme that encapsulates design expertise within a knowledge graph and translates it into natural language, providing an interpretable representation of design knowledge. This comprehensibility and diversity of input options enable both professionals and non-professionals to directly express design requirements, enhancing flexibility and controllability. Finally, experiments verify that the proposed method offers greater flexibility under multimodal constraints than state-of-the-art models, even when specific semantic information about room areas or connections is incomplete.
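As a concrete illustration of the abstract's two main ideas, the sketch below first verbalizes a small design knowledge graph into a natural-language constraint, then conditions a Stable Diffusion + ControlNet pipeline on a boundary image and that prompt. This is a minimal sketch, not the paper's implementation: the triples, relation templates, file names, and public checkpoints (`runwayml/stable-diffusion-v1-5`, `lllyasviel/sd-controlnet-canny`) are assumptions standing in for the paper's layout-trained models, expressed with the Hugging Face `diffusers` API.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# 1. Verbalize a design knowledge graph into a natural-language constraint.
#    The triples and templates are hypothetical, not the paper's schema.
triples = [
    ("living room", "adjacent_to", "kitchen"),
    ("master bedroom", "area_sqm", "15"),
    ("bathroom", "connects_to", "hallway"),
]
templates = {
    "adjacent_to": "the {h} is adjacent to the {t}",
    "area_sqm": "the {h} has an area of about {t} square meters",
    "connects_to": "the {h} connects to the {t}",
}
constraint_text = "; ".join(templates[r].format(h=h, t=t) for h, r, t in triples)
prompt = f"residential floor plan where {constraint_text}"

# 2. Generate a layout conditioned on a boundary image via ControlNet.
#    The canny ControlNet below is a public placeholder for the paper's
#    layout-trained conditioning weights.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")  # assumes a CUDA GPU

boundary = load_image("boundary.png")  # hypothetical residential-boundary image
layout = pipe(prompt, image=boundary, num_inference_steps=30).images[0]
layout.save("layout.png")
```

The abstract describes two distinct ControlNet conditioning pathways (boundary- and layout-based); the single edge-conditioned pathway above stands in for both.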
Related papers
- GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a VLM-based framework that generates content-aware text logo layouts.
We introduce two modeling techniques to reduce the computation for processing multiple glyph images simultaneously.
To support instruction tuning of our model, we construct two extensive text logo datasets, which are 5x larger than the existing public dataset.
arXiv Detail & Related papers (2024-11-18T10:04:10Z)
- Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts [68.48103545146127]
This paper proposes a novel framework for unsupervised exploration of diffusion latent spaces.
We directly leverage natural language prompts and image captions to map latent directions.
Our method provides a more scalable and interpretable understanding of the semantic knowledge encoded within diffusion models.
arXiv Detail & Related papers (2024-10-25T21:44:51Z)
- ChatHouseDiffusion: Prompt-Guided Generation and Editing of Floor Plans [10.82348603357201]
This paper introduces ChatHouseDiffusion, which leverages large language models (LLMs) to interpret natural language input.
It also employs Graphormer to encode topological relationships and uses diffusion models to flexibly generate and edit floor plans.
Compared to existing models, ChatHouseDiffusion achieves higher Intersection over Union (IoU) scores, permitting precise, localized adjustments without the need for complete regeneration (a minimal IoU sketch follows this list).
arXiv Detail & Related papers (2024-10-15T02:41:46Z)
- PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.
Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts.
We develop an automated text-to-poster system that generates editable posters based on users' design intentions.
arXiv Detail & Related papers (2024-06-05T03:05:52Z)
- CoLay: Controllable Layout Generation through Multi-conditional Latent Diffusion [21.958752304572553]
Existing models face two main challenges that limit their adoption in practice.
Most existing models focus on generating labels and coordinates, while real layouts contain a range of style properties.
We propose a novel framework, CoLay, that integrates multiple condition types and generates complex layouts with diverse style properties.
arXiv Detail & Related papers (2024-05-18T17:30:48Z)
- Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints [53.66698106829144]
We propose LACE, a unified model that handles a broad range of layout generation tasks.
The model is based on continuous diffusion models.
Experimental results show that LACE produces high-quality layouts.
arXiv Detail & Related papers (2024-02-07T11:12:41Z)
- Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization [56.12990759116612]
Pick-and-Draw is a training-free semantic guidance approach to boost identity consistency and generative diversity for personalization methods.
The proposed approach can be applied to any personalized diffusion model and requires only a single reference image.
arXiv Detail & Related papers (2024-01-30T05:56:12Z)
- Controlled Text Generation with Natural Language Instructions [74.88938055638636]
InstructCTG is a controlled text generation framework that incorporates different constraints.
We first extract the underlying constraints of natural texts through a combination of off-the-shelf NLP tools and simple verbalizers.
By prepending natural language descriptions of the constraints and a few demonstrations, we fine-tune a pre-trained language model to incorporate various types of constraints.
arXiv Detail & Related papers (2023-04-27T15:56:34Z)
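Several entries above, notably ChatHouseDiffusion, report Intersection over Union (IoU) as the layout-quality metric. Below is a minimal sketch of that metric, assuming floor plans rasterized to binary NumPy masks; the papers' exact evaluation protocols may differ.

```python
import numpy as np

def layout_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two same-shape binary room masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(intersection) / float(union) if union else 1.0

# Hypothetical 4x4 masks: three of the four predicted cells match ground truth.
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
gt = np.array([[1, 1, 0, 0],
               [1, 0, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0]])
print(layout_iou(pred, gt))  # intersection 3 / union 4 = 0.75
```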