Fugu-MT paper translation (summary): Designing streetscapes from street-view imagery using diffusion models

Paper summary: Designing streetscapes from street-view imagery using diffusion models

arxiv url: http://arxiv.org/abs/2605.17527v1
Date: Sun, 17 May 2026 16:20:30 GMT
Status: Translation complete
System update date: 2026-05-19 23:51:08.375992
Title: Designing streetscapes from street-view imagery using diffusion models
Title (reference translation): Designing streetscapes from street-view images using diffusion models
Authors: Yuzhou Chen, Yuebing Liang, Lingqian Hu, Kailai Sun, Qingqi Song, Chang Zhao, Shenhao Wang
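Abstract summary: Street-view imagery (SVI) is widely used to quantify key indicators of the urban environment, such as greenery, sky, and road view indices. Existing studies focus on measuring current streetscapes and rarely support the generation of alternative and non-existing urban scenarios. This paper proposes a generative multimodal AI framework that synthesizes alternative streetscapes conditioned on target visual metrics.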
Reference score (site-computed attention score): 18.860196708556995
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Street-view imagery (SVI) is widely used to quantify key indicators of the urban environment, such as greenery, sky, or road view indices. However, existing studies largely focus on measuring current streetscapes and rarely support the generation of alternative and non-existing urban scenarios, which is a core task in geospatial disciplines such as urban planning and design. To address this gap, we propose a generative multimodal AI framework that synthesizes alternative streetscapes conditioned on targeted visual metrics, enabling direct visual exploration of urban scenarios. We first construct a multimodal dataset that aligns SVIs with textual descriptions, segmentation maps, road masks, and quantitative metrics of visual elements in Chicago and Orlando. Using this dataset, we demonstrate that diffusion models can produce realistic and semantically consistent streetscape imagery while responding to both textual and imagery controls. Our quantitative evaluations show that incorporating visual controls can improve semantic consistency, reducing the LPIPS index by approximately 6% while maintaining global visual realism. In addition, overall semantic consistency, as measured by the mIoU index, increases by 23.7% in Orlando and 46.4% in Chicago, with class-wise gains exceeding 100% for building view indices. Streetscape generation can be controlled in a fine-grained manner by both visual and textual prompts, and when textual and visual controls conflict, imagery controls consistently dominate, indicating a clear control hierarchy and the importance of further developing visual controls for urban scene generation. Overall, this work establishes an important benchmark for streetscape generation using SVIs and diffusion models, and illustrates how generative AI can serve as a practical, scalable, and controllable approach for urban scenario exploration.
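The greenery, sky, road, and building view indices mentioned in the abstract are often defined as the share of pixels in a semantic segmentation map that belong to a given class. The sketch below assumes that definition; the integer class IDs in `CLASSES` and the helper `view_indices` are hypothetical, and the paper's actual segmentation model and label set are not stated in this summary.

```python
from collections import Counter
from typing import Dict, Sequence

# Hypothetical label IDs for illustration only; a real segmentation model
# (for example one trained on Cityscapes or ADE20K) defines its own label set.
CLASSES: Dict[str, int] = {"greenery": 0, "sky": 1, "road": 2, "building": 3}


def view_indices(seg_map: Sequence[Sequence[int]],
                 classes: Dict[str, int]) -> Dict[str, float]:
    """Share of pixels assigned to each named class in a segmentation map."""
    counts = Counter(label for row in seg_map for label in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("segmentation map is empty")
    # Counter returns 0 for labels that never occur, so missing classes get 0.0.
    return {name: counts[label] / total for name, label in classes.items()}


# Toy 4x4 map: top row sky, bottom row road, mixed vegetation and buildings.
seg = [
    [1, 1, 1, 1],
    [0, 0, 3, 3],
    [0, 2, 2, 3],
    [2, 2, 2, 2],
]
print(view_indices(seg, CLASSES))
# {'greenery': 0.1875, 'sky': 0.25, 'road': 0.375, 'building': 0.1875}
```

Under this assumption, a targeted metric such as a higher greenery index could be checked by applying the same function to the segmentation of a generated image.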
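Abstract (reference translation): Street-view imagery (SVI) is widely used to quantify key indicators of the urban environment, such as greenery, sky, and road view indices. However, existing studies focus on measuring current streetscapes and rarely support the generation of alternative and non-existing urban scenarios, a core task in geospatial disciplines such as urban planning and design. To address this gap, we propose a generative multimodal AI framework that synthesizes alternative streetscapes based on target visual metrics and enables direct visual exploration of urban scenarios. First, we constructed a multimodal dataset that aligns SVIs with textual descriptions, segmentation maps, road masks, and quantitative measurements of visual elements in Chicago and Orlando. Using this dataset, we demonstrated that diffusion models generate realistic and semantically consistent streetscape images while responding to both text and images. We show that incorporating visual controls improves semantic consistency, reducing the LPIPS index by about 6% while maintaining global visual realism. Furthermore, overall semantic consistency, as measured by the mIoU index, increased by 23.7% in Orlando and 46.4% in Chicago. Streetscape generation allows fine-grained control through both visual and textual prompts; when textual and visual controls conflict, image controls consistently dominate, indicating a clear control hierarchy and the importance of further developing visual controls for urban scene generation. Overall, this work establishes an important benchmark for streetscape generation using SVIs and diffusion models and shows how generative AI can serve as a practical, scalable, and controllable approach to urban scenario exploration.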
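Semantic consistency is reported as mIoU (mean intersection over union). A minimal sketch of how mIoU between a control segmentation map and the segmentation of a generated image might be computed, together with a percentage-change helper. It assumes that the reported 23.7% and 46.4% figures are relative changes and that classes absent from both maps are skipped; the abstract states neither. The function names and toy maps are hypothetical and are not data from the paper.

```python
from typing import Dict, List, Sequence

Grid = Sequence[Sequence[int]]


def class_iou(pred: Grid, target: Grid, num_classes: int) -> Dict[int, float]:
    """Per-class IoU; classes absent from both maps are skipped."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("maps must have the same shape")
    inter: List[int] = [0] * num_classes
    union: List[int] = [0] * num_classes
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            if p == t:
                inter[p] += 1  # pixel lies in both the predicted and target class
                union[p] += 1
            else:
                union[p] += 1  # counted in the union of the predicted class
                union[t] += 1  # and in the union of the missed target class
    return {c: inter[c] / union[c] for c in range(num_classes) if union[c] > 0}


def mean_iou(pred: Grid, target: Grid, num_classes: int) -> float:
    """Average of the per-class IoUs returned by class_iou."""
    ious = class_iou(pred, target, num_classes)
    if not ious:
        raise ValueError("no classes present")
    return sum(ious.values()) / len(ious)


def relative_gain(baseline: float, improved: float) -> float:
    """Percentage change from baseline to improved (assumes relative reporting)."""
    return 100.0 * (improved - baseline) / baseline


target = [[0, 0], [1, 1]]  # control segmentation map (toy)
pred = [[0, 1], [1, 1]]    # segmentation of a generated image (toy)
print(round(mean_iou(pred, target, num_classes=2), 4))  # 0.5833
print(round(relative_gain(0.5, 0.6), 1))                # 20.0
```

Skipping classes that appear in neither map avoids dividing by zero; benchmark implementations differ on this point, so results are only comparable under the same convention.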

