From Image Generation to Infrastructure Design: a Multi-agent Pipeline for Street Design Generation
- URL: http://arxiv.org/abs/2509.05469v1
- Date: Fri, 05 Sep 2025 19:49:36 GMT
- Title: From Image Generation to Infrastructure Design: a Multi-agent Pipeline for Street Design Generation
- Authors: Chenguang Wang, Xiang Yan, Yilong Dai, Ziyi Wang, Susu Xu
- Abstract summary: We introduce a multi-agent system that edits and redesigns bicycle facilities directly on real-world street-view imagery. The framework integrates lane localization, prompt optimization, design generation, and automated evaluation to synthesize realistic, contextually appropriate designs.
- Score: 9.255248190497515
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Realistic visual renderings of street-design scenarios are essential for public engagement in active transportation planning. Traditional approaches are labor-intensive, hindering collective deliberation and collaborative decision-making. While AI-assisted generative design shows transformative potential by enabling rapid creation of design scenarios, existing generative approaches typically require large amounts of domain-specific training data and struggle to enable precise spatial variations of design/configuration in complex street-view scenes. We introduce a multi-agent system that edits and redesigns bicycle facilities directly on real-world street-view imagery. The framework integrates lane localization, prompt optimization, design generation, and automated evaluation to synthesize realistic, contextually appropriate designs. Experiments across diverse urban scenarios demonstrate that the system can adapt to varying road geometries and environmental conditions, consistently yielding visually coherent and instruction-compliant results. This work establishes a foundation for applying multi-agent pipelines to transportation infrastructure planning and facility design.
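The abstract names four cooperating stages: lane localization, prompt optimization, design generation, and automated evaluation. A minimal sketch of how such a pipeline could be orchestrated is shown below; all agent functions, field names, and the retry logic are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the four-stage multi-agent pipeline described in the
# abstract, with a retry loop when the evaluator rejects a candidate design.
# Every agent here is a stand-in stub, not the paper's actual models.

from dataclasses import dataclass


@dataclass
class DesignTask:
    image_path: str
    instruction: str            # e.g. "add a protected bike lane on the right"
    lane_region: tuple = None   # pixel bbox produced by the localization agent
    prompt: str = ""
    render: str = ""
    approved: bool = False
    attempts: int = 0


def localize_lane(task: DesignTask) -> DesignTask:
    # Stand-in for a vision agent that finds the editable lane region.
    task.lane_region = (0, 120, 640, 360)
    return task


def optimize_prompt(task: DesignTask) -> DesignTask:
    # Stand-in for an LLM agent that rewrites the user instruction into a
    # region-aware editing prompt for the image generator.
    task.prompt = f"{task.instruction} within region {task.lane_region}"
    return task


def generate_design(task: DesignTask) -> DesignTask:
    # Stand-in for the image-editing agent (e.g. a diffusion-model call).
    task.attempts += 1
    task.render = f"render#{task.attempts} of '{task.prompt}'"
    return task


def evaluate_design(task: DesignTask) -> DesignTask:
    # Stand-in for the automated evaluator; here it accepts the 2nd attempt
    # purely to exercise the feedback loop.
    task.approved = task.attempts >= 2
    return task


def run_pipeline(task: DesignTask, max_attempts: int = 3) -> DesignTask:
    task = optimize_prompt(localize_lane(task))
    while not task.approved and task.attempts < max_attempts:
        task = evaluate_design(generate_design(task))
    return task


result = run_pipeline(DesignTask("street.jpg", "add a protected bike lane"))
print(result.approved, result.attempts)  # → True 2
```

The loop structure is the point: generation and evaluation alternate until the evaluator approves or a retry budget is exhausted, which is one common way to wire an agentic generate-and-check pipeline.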
Related papers
- A Unified Experimental Architecture for Informative Path Planning: from Simulation to Deployment with GuadalPlanner [69.43049144653882]
This paper introduces a unified architecture that decouples high-level decision-making from vehicle-specific control. The proposed architecture is realized through GuadalPlanner, which defines standardized interfaces between planning, sensing, and vehicle execution.
arXiv Detail & Related papers (2026-02-11T10:02:31Z) - Generative AI for Urban Design: A Stepwise Approach Integrating Human Expertise with Multimodal Diffusion Models [16.15278208238539]
This study proposes a stepwise generative urban design framework that integrates multimodal diffusion models with human expertise. Rather than generating outcomes in a single end-to-end process, the framework divides the process into three key stages aligned with established urban design. At each stage, rendering diffusion models generate preliminary designs based on textual prompts and image-based constraints.
arXiv Detail & Related papers (2025-05-30T06:33:48Z) - CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design [69.83433430133302]
CreatiDesign is a systematic solution for automated graphic design covering both model architecture and dataset construction. First, we design a unified multi-condition driven architecture that enables flexible and precise integration of heterogeneous design elements. Furthermore, to ensure that each condition precisely controls its designated image region, we propose a multimodal attention mask mechanism.
arXiv Detail & Related papers (2025-05-25T12:14:23Z) - Generative AI for Urban Planning: Synthesizing Satellite Imagery via Diffusion Models [9.385767746826286]
We adapt a state-of-the-art Stable Diffusion model, extended with ControlNet, to generate high-fidelity satellite imagery conditioned on land use descriptions, infrastructure, and natural environments. Using data from three major U.S. cities, we demonstrate that the proposed diffusion model generates realistic and diverse urban landscapes by varying land-use configurations, road networks, and water bodies. Our model achieves strong FID and KID scores and demonstrates robustness across diverse urban contexts.
arXiv Detail & Related papers (2025-05-13T04:55:38Z) - StyledStreets: Multi-style Street Simulator with Spatial and Temporal Consistency [7.860619819904401]
StyledStreets is a multi-style street simulator that achieves instruction-driven scene editing. A hybrid embedding scheme disentangles persistent scene geometry from transient style attributes. A unified parametric model prevents geometric drift through regularized updates.
arXiv Detail & Related papers (2025-03-27T02:52:29Z) - PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation. Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts. We develop an automated text-to-poster system that generates editable posters based on users' design intentions.
arXiv Detail & Related papers (2024-06-05T03:05:52Z) - Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z) - Generative methods for Urban design and rapid solution space exploration [13.222198221605701]
This research introduces an implementation of a tensor-field-based generative urban modeling toolkit.
Our method encodes contextual constraints such as waterfront edges, terrain, view-axis, existing streets, landmarks, and non-geometric design inputs.
This allows users to generate many, diverse urban fabric configurations that resemble real-world cities with very few model inputs.
arXiv Detail & Related papers (2022-12-13T17:58:02Z) - TransVG: End-to-End Visual Grounding with Transformers [102.11922622103613]
We present a transformer-based framework for visual grounding, namely TransVG, to address the task of grounding a language query to an image.
We show that the complex fusion modules can be replaced by a simple stack of transformer encoder layers with higher performance.
arXiv Detail & Related papers (2021-04-17T13:35:24Z) - Simultaneous Navigation and Construction Benchmarking Environments [73.0706832393065]
We need intelligent robots for mobile construction, the process of navigating in an environment and modifying its structure according to a geometric design.
In this task, a major robot vision and learning challenge is how to exactly achieve the design without GPS.
We benchmark the performance of a handcrafted policy with basic localization and planning, and state-of-the-art deep reinforcement learning methods.
arXiv Detail & Related papers (2021-03-31T00:05:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.