Text Semantics to Image Generation: A method of building facades design based on Stable Diffusion model
- URL: http://arxiv.org/abs/2303.12755v3
- Date: Fri, 7 Apr 2023 10:22:34 GMT
- Title: Text Semantics to Image Generation: A method of building facades design based on Stable Diffusion model
- Authors: Haoran Ma
- Abstract summary: A multi-network combined text-to-building facade image generating method is proposed in this work.
We first fine-tuned the Stable Diffusion model on the CMP Facades dataset using the LoRA approach.
The addition of the ControlNet model increases the controllability of text-to-building-facade image generation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Stable Diffusion model has been extensively employed in the study of
architectural image generation, but there remains an opportunity to enhance
the controllability of the generated image content. A multi-network combined
text-to-building facade image generation method is proposed in this work. We
first fine-tuned the Stable Diffusion model on the CMP Facades dataset using
the LoRA (Low-Rank Adaptation) approach, then applied the ControlNet model to
further control the output. Finally, we contrasted the facade generation
outcomes under various architectural style text contents and control
strategies. The results demonstrate that the LoRA training approach
significantly lowers the barrier to fine-tuning the large Stable Diffusion
model, and the addition of the ControlNet model increases the controllability
of text-to-building-facade image generation. This provides a foundation for
subsequent studies on the generation of architectural images.
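
The two-stage setup the abstract describes can be approximated with off-the-shelf tooling. Below is a minimal sketch using Hugging Face diffusers, assuming a Canny-edge ControlNet as the control strategy; the model IDs, the LoRA weight path, and the input file names are illustrative assumptions, since the paper does not publish code.

```python
# Minimal sketch of the paper's two-stage setup with Hugging Face diffusers.
# Model IDs, the LoRA weight path, and file names are illustrative assumptions.
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# A Canny-edge ControlNet approximates the structural control applied to facades.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# LoRA weights fine-tuned on the CMP Facades dataset (hypothetical local path).
pipe.load_lora_weights("./lora-cmp-facades")

# An edge map of a facade layout serves as the control image (hypothetical file).
control_image = load_image("facade_edges.png")
result = pipe(
    "a photo of a baroque building facade",
    image=control_image,
    num_inference_steps=30,
).images[0]
result.save("facade_out.png")
```

Swapping the control image or the style keywords in the prompt reproduces the kind of comparison across architectural styles and control strategies the abstract describes.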
Related papers
- Pro-DG: Procedural Diffusion Guidance for Architectural Facade Generation
Pro-DG is a framework for procedurally controllable photo-realistic facade generation.
Starting from a single input image, it reconstructs the facade layout using grammar rules, then edits that structure through user-defined transformations.
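As a toy illustration of the grammar-rule idea, a facade layout can be represented as an editable structure that user-defined transformations rewrite before re-synthesis. The Floor/Facade types and the add_floor transformation below are invented for illustration and are not Pro-DG's actual grammar.

```python
# Toy stand-in for grammar-based layout editing; invented for illustration.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Floor:
    windows: int                       # window cells on this floor

@dataclass
class Facade:
    floors: List[Floor] = field(default_factory=list)

def add_floor(facade: Facade, windows: int) -> Facade:
    """User-defined transformation: rewrite the layout with one extra floor."""
    return Facade(floors=facade.floors + [Floor(windows)])

layout = Facade([Floor(4), Floor(4), Floor(3)])  # layout recovered from an image
edited = add_floor(layout, 4)                    # structure passed to the generator
```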
arXiv Detail & Related papers (2025-04-02T10:16:19Z)
- HRR: Hierarchical Retrospection Refinement for Generated Image Detection
We propose a diffusion model-based generated-image detection framework termed Hierarchical Retrospection Refinement (HRR).
The HRR framework consistently delivers significant performance improvements, outperforming state-of-the-art methods on the generated-image detection task.
arXiv Detail & Related papers (2025-02-25T05:13:44Z)
- Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step
Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle complex understanding tasks.
We provide the first comprehensive investigation of the potential of CoT reasoning to enhance autoregressive image generation.
We propose the Potential Assessment Reward Model (PARM) and PARM++, specialized for autoregressive image generation.
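At its simplest, a reward model like PARM can select among candidate generations. The best-of-N loop below is a hedged sketch of that idea; generate_candidate and reward_model are hypothetical stand-ins, not the paper's components.

```python
# Best-of-N selection with a reward model; all callables are hypothetical.
def best_of_n(prompt, generate_candidate, reward_model, n=8):
    best_img, best_score = None, float("-inf")
    for _ in range(n):
        img = generate_candidate(prompt)      # one autoregressive generation
        score = reward_model(prompt, img)     # scalar quality/potential score
        if score > best_score:
            best_img, best_score = img, score
    return best_img, best_score
```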
arXiv Detail & Related papers (2025-01-23T18:59:43Z)
- Structured Pattern Expansion with Diffusion Models
Recent advances in diffusion models have significantly improved the synthesis of materials, textures, and 3D shapes.
In this paper, we address the synthesis of structured, stationary patterns, where diffusion models are generally less reliable and, more importantly, less controllable.
The proposed method enables users to exercise direct control over the synthesis by expanding a partially hand-drawn pattern into a larger design while preserving the structure and details of the input.
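A rough analogue of such pattern expansion can be improvised with a stock inpainting pipeline: place the hand-drawn patch on a larger canvas and let the model fill the masked remainder. This is an illustrative approximation, not the paper's tailored method; the file names are hypothetical.

```python
# Pattern expansion improvised as outpainting with a stock inpainting model.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

canvas = Image.new("RGB", (512, 512), "white")
patch = Image.open("pattern_patch.png").convert("RGB")  # partial hand-drawn pattern
canvas.paste(patch, (0, 0))

mask = Image.new("L", (512, 512), 255)             # white = regions to synthesize
mask.paste(Image.new("L", patch.size, 0), (0, 0))  # black = keep the input patch

expanded = pipe(
    "a seamless repeating geometric pattern",
    image=canvas,
    mask_image=mask,
).images[0]
expanded.save("expanded_pattern.png")
```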
arXiv Detail & Related papers (2024-11-12T18:39:23Z)
- Coherent and Multi-modality Image Inpainting via Latent Space Optimization
PILOT (inPainting vIa Latent OpTimization) is an optimization approach grounded on a novel semantic centralization and background preservation loss.
Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
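In schematic form, latent-space optimization of this kind is gradient descent on a latent code under a prompt-fidelity term plus a background-preservation term. The loop below is a sketch with hypothetical placeholders (decode, prompt_loss, background_loss) standing in for PILOT's actual losses and decoder.

```python
# Schematic latent optimization; decode and both losses are placeholders.
import torch

def optimize_latent(z_init, decode, prompt_loss, background_loss,
                    steps=200, lr=0.05, bg_weight=1.0):
    z = z_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        img = decode(z)  # differentiable latent -> image
        loss = prompt_loss(img) + bg_weight * background_loss(img)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```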
arXiv Detail & Related papers (2024-07-10T19:58:04Z)
- SODA: Bottleneck Diffusion Models for Representation Learning
We introduce SODA, a self-supervised diffusion model, designed for representation learning.
The model incorporates an image encoder, which distills a source view into a compact representation, that guides the generation of related novel views.
We show that by imposing a tight bottleneck between the encoder and a denoising decoder, we can turn diffusion models into strong representation learners.
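A skeletal version of that encoder-bottleneck-denoiser layout might look as follows; the module sizes and layers are illustrative assumptions, not SODA's architecture.

```python
# Skeletal encoder-bottleneck-denoiser; sizes are illustrative, not SODA's.
import torch
import torch.nn as nn

class BottleneckDiffusion(nn.Module):
    def __init__(self, z_dim=128):
        super().__init__()
        # distill a 3x32x32 source view into a compact vector z
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, z_dim),
        )
        # stand-in for a z-conditioned denoising U-Net
        self.denoiser = nn.Sequential(
            nn.Linear(z_dim + 3 * 32 * 32, 512), nn.ReLU(),
            nn.Linear(512, 3 * 32 * 32),
        )

    def forward(self, source, noisy_target):
        z = self.encoder(source)                       # the tight bottleneck
        h = torch.cat([z, noisy_target.flatten(1)], dim=1)
        return self.denoiser(h).view_as(noisy_target)  # predicted noise
```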
arXiv Detail & Related papers (2023-11-29T18:53:34Z)
- Image Inpainting via Tractable Steering of Diffusion Models
This paper proposes to exploit the ability of Tractable Probabilistic Models (TPMs) to exactly and efficiently compute the constrained posterior.
Specifically, this paper adopts a class of expressive TPMs termed Probabilistic Circuits (PCs).
We show that our approach can consistently improve the overall quality and semantic coherence of inpainted images with only 10% additional computational overhead.
arXiv Detail & Related papers (2023-11-28T21:14:02Z)
- RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model
Text-to-image generation (TTI) refers to models that process text input and generate high-fidelity images based on text descriptions.
Diffusion models are one prominent type of generative model used for image generation through the systematic introduction of noise over repeated steps.
In the era of large models, scaling up model size and the integration with large language models have further improved the performance of TTI models.
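The "systematic introduction of noise" is the standard DDPM forward process, q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I). A minimal sketch, assuming a linear beta schedule:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # assumed linear schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative signal retention

def add_noise(x0, t):
    """Sample x_t ~ q(x_t | x_0) for integer timestep t."""
    eps = torch.randn_like(x0)
    return alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps
```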
arXiv Detail & Related papers (2023-09-02T03:27:20Z)
- ControlCom: Controllable Image Composition using Diffusion Model
We propose a controllable image composition method that unifies four tasks in one diffusion model.
We also propose a local enhancement module to enhance the foreground details in the diffusion model.
The proposed method is evaluated on both a public benchmark and real-world data.
arXiv Detail & Related papers (2023-08-19T14:56:44Z)
- PRedItOR: Text Guided Image Editing with Diffusion Prior
Existing approaches to text guided image editing require compute-intensive optimization of text embeddings or fine-tuning of the model weights.
Our architecture consists of a diffusion prior model that generates a CLIP image embedding conditioned on a text prompt, and a custom Latent Diffusion Model trained to generate images conditioned on the CLIP image embedding.
We combine this with structure preserving edits on the image decoder using existing approaches such as reverse DDIM to perform text guided image editing.
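Reverse DDIM runs the deterministic DDIM update backwards to map an image to a noise latent that preserves its structure. A schematic sketch, with eps_model and alpha_bar as placeholders:

```python
# Deterministic DDIM inversion; eps_model and alpha_bar are placeholders.
import torch

@torch.no_grad()
def ddim_invert(x0, eps_model, alpha_bar, num_steps=50):
    x = x0
    ts = torch.linspace(0, len(alpha_bar) - 1, num_steps).long()
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        eps = eps_model(x, t_cur)  # noise predicted at the current step
        a_cur, a_next = alpha_bar[t_cur], alpha_bar[t_next]
        x0_pred = (x - (1 - a_cur).sqrt() * eps) / a_cur.sqrt()
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    return x  # noise latent that re-generates the image under DDIM sampling
```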
arXiv Detail & Related papers (2023-02-15T22:58:11Z)
- A Generic Approach for Enhancing GANs by Regularized Latent Optimization
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z)
- Normalizing Flows with Multi-Scale Autoregressive Priors
We introduce channel-wise dependencies in the latent space of normalizing flows through multi-scale autoregressive priors (mAR).
Our mAR prior for models with split coupling flow layers (mAR-SCF) can better capture dependencies in complex multimodal data.
We show that mAR-SCF allows for improved image generation quality, with gains in FID and Inception scores compared to state-of-the-art flow-based models.
arXiv Detail & Related papers (2020-04-08T09:07:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.