UrbanGenAI: Reconstructing Urban Landscapes using Panoptic Segmentation
and Diffusion Models
- URL: http://arxiv.org/abs/2401.14379v1
- Date: Thu, 25 Jan 2024 18:30:46 GMT
- Title: UrbanGenAI: Reconstructing Urban Landscapes using Panoptic Segmentation
and Diffusion Models
- Authors: Timo Kapsalis
- Abstract summary: This paper presents a novel workflow encapsulated within a prototype application, designed to leverage the synergies between advanced image segmentation and diffusion models for a comprehensive approach to urban design.
Validation results indicated a high degree of performance by the prototype application, showcasing significant accuracy in both object detection and text-to-image generation.
Preliminary testing included utilising UrbanGenAI as an educational tool enhancing the learning experience in design pedagogy, and as a participatory instrument facilitating community-driven urban planning.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In contemporary design practices, the integration of computer vision and
generative artificial intelligence (genAI) represents a transformative shift
towards more interactive and inclusive processes. These technologies offer new
dimensions of image analysis and generation, which are particularly relevant in
the context of urban landscape reconstruction. This paper presents a novel
workflow encapsulated within a prototype application, designed to leverage the
synergies between advanced image segmentation and diffusion models for a
comprehensive approach to urban design. Our methodology encompasses the
OneFormer model for detailed image segmentation and the Stable Diffusion XL
(SDXL) diffusion model, implemented through ControlNet, for generating images
from textual descriptions. Validation results indicated a high degree of
performance by the prototype application, showcasing significant accuracy in
both object detection and text-to-image generation. This was evidenced by
superior Intersection over Union (IoU) and CLIP scores across iterative
evaluations for various categories of urban landscape features. Preliminary
testing included utilising UrbanGenAI as an educational tool enhancing the
learning experience in design pedagogy, and as a participatory instrument
facilitating community-driven urban planning. Early results suggested that
UrbanGenAI not only advances the technical frontiers of urban landscape
reconstruction but also provides significant pedagogical and participatory
planning benefits. The ongoing development of UrbanGenAI aims to further
validate its effectiveness across broader contexts and integrate additional
features such as real-time feedback mechanisms and 3D modelling capabilities.
Keywords: generative AI; panoptic image segmentation; diffusion models; urban
landscape design; design pedagogy; co-design
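The abstract reports Intersection over Union (IoU) as the segmentation metric but does not say how it was computed. The following is a minimal sketch of per-class IoU over two label grids, assuming each pixel carries a single class label. The function name `iou_per_class` and the toy labels are hypothetical illustrations, not taken from the paper.

```python
def iou_per_class(pred, truth):
    """Return {class_label: IoU} for two equally sized 2D grids of labels.

    IoU for a class is |pred == c AND truth == c| / |pred == c OR truth == c|.
    This is an illustrative helper, not the paper's evaluation code.
    """
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = {}
    union = {}
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                # The pixel belongs to both masks of class p.
                intersection[p] = intersection.get(p, 0) + 1
                union[p] = union.get(p, 0) + 1
            else:
                # The pixel belongs to the predicted mask of p
                # and to the ground-truth mask of t.
                union[p] = union.get(p, 0) + 1
                union[t] = union.get(t, 0) + 1

    return {label: intersection.get(label, 0) / count
            for label, count in union.items()}


# Toy 2x3 example with made-up urban classes.
pred = [["road", "road", "tree"],
        ["road", "building", "tree"]]
truth = [["road", "road", "tree"],
         ["road", "road", "tree"]]

scores = iou_per_class(pred, truth)
print(scores)
```

In this toy case one "road" pixel is mispredicted as "building", so "road" scores 3/4, "tree" scores 1, and "building" scores 0. Averaging the per-class values gives the mean IoU commonly reported for segmentation models.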
Related papers
- Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation [87.50120181861362]
VisionPrefer is a high-quality and fine-grained preference dataset that captures multiple preference aspects.
We train a reward model, VP-Score, over VisionPrefer to guide the training of text-to-image generative models; the preference prediction accuracy of VP-Score is comparable to that of human annotators.
arXiv Detail & Related papers (2024-04-23T14:53:15Z)
- YaART: Yet Another ART Rendering Technology [119.09155882164573]
This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences.
We analyze how these choices affect both the efficiency of the training process and the quality of the generated images.
We demonstrate that models trained on smaller datasets of higher-quality images can successfully compete with those trained on larger datasets.
arXiv Detail & Related papers (2024-04-08T16:51:19Z)
- State of the Art on Diffusion Models for Visual Computing [191.6168813012954]
This report introduces the basic mathematical concepts of diffusion models, implementation details and design choices of the popular Stable Diffusion model.
We also give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing.
We discuss available datasets, metrics, open challenges, and social implications.
arXiv Detail & Related papers (2023-10-11T05:32:29Z)
- RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to the use of models that process text input and generate high-fidelity images based on text descriptions.
Diffusion models are one prominent type of generative model, producing images through the systematic introduction of noise over repeated steps.
In the era of large models, scaling up model size and the integration with large language models have further improved the performance of TTI models.
arXiv Detail & Related papers (2023-09-02T03:27:20Z)
- Generative methods for Urban design and rapid solution space exploration [13.222198221605701]
This research introduces an implementation of a tensor-field-based generative urban modeling toolkit.
Our method encodes contextual constraints such as waterfront edges, terrain, view-axis, existing streets, landmarks, and non-geometric design inputs.
This allows users to generate many, diverse urban fabric configurations that resemble real-world cities with very few model inputs.
arXiv Detail & Related papers (2022-12-13T17:58:02Z)
- Explainability of Deep Learning models for Urban Space perception [9.422663267011913]
This study investigates how computer vision models can be used to extract relevant policy information about people's perception of the urban space.
We train two widely used computer vision architectures, a Convolutional Neural Network and a transformer, and apply GradCAM -- a well-known ex-post explainable AI technique -- to highlight the image regions important for the model's prediction.
arXiv Detail & Related papers (2022-08-29T12:44:48Z)
- A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z)
- Dynamically Grown Generative Adversarial Networks [111.43128389995341]
We propose a method to dynamically grow a GAN during training, optimizing the network architecture and its parameters together with automation.
The method embeds architecture search techniques as an interleaving step with gradient-based training to periodically seek the optimal architecture-growing strategy for the generator and discriminator.
arXiv Detail & Related papers (2021-06-16T01:25:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.