Fugu-MT Paper Translation (Abstract): Sat2City v2: Native 3D City Asset Generation from a Single Satellite Image

Paper overview: Sat2City v2: Native 3D City Asset Generation from a Single Satellite Image

arxiv url: http://arxiv.org/abs/2606.24138v1
Date: Tue, 23 Jun 2026 04:46:04 GMT
Status: Translation complete
Last updated in system: 2026-06-24 22:16:48.765102
Title: Sat2City v2: Native 3D City Asset Generation from a Single Satellite Image
Authors: Tongyan Hua, Dongli Wu, Jinjing Zhu, Yinrui Ren, Zhongcheng Hong, Ying-Cong Chen, Hui Xiong, Wufan Zhao
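Abstract summary: Sat2City v2 is a framework that generates explicit 3D city assets, in the form of textured meshes, from a single satellite image. It achieves the best overall performance among the evaluated baselines. Overall, it advances satellite-to-city generation from rendering-oriented 3D proxies to explicit textured mesh assets.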
Reference score (independently computed attention metric): 39.47225879331284
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generating explicit 3D city assets from a single satellite image is important for digital twins, urban simulation, and geospatial intelligence. Unlike satellite-to-street-view synthesis, the task requires a reusable textured mesh with plausible geometry and controllable appearance rather than a 3D proxy optimized only for rendering a small set of images or videos. The ICCV Sat2City framework made a first step by conditioning cascaded sparse-voxel latent diffusion on satellite-derived height maps, but its appearance was random, its training data were synthetic, and its task-specific VAE did not scale well to noisy real-world reconstructions. We present Sat2City v2, a journal extension that adapts a pretrained native structured-latent 3D foundation model to weakly aligned satellite images and textured meshes. We build a real-world dataset with 16,241 satellite-mesh pairs across 24 regions in 9 cities. Instead of learning a 3D representation from noisy city meshes, Sat2City v2 encodes each mesh into a pretrained native 3D latent space, fine-tunes a satellite-conditioned geometry flow, and uses the decoded shape to anchor satellite-conditioned texturing. This retains Sat2City's geometry-to-appearance cascade while enabling appearance-controllable generation from the satellite input. Experiments on metric-scale DSM reconstruction and generative city-asset benchmarks for geometry and appearance show that Sat2City v2 achieves the best overall performance among evaluated baselines. Overall, Sat2City v2 advances satellite-to-city generation from rendering-oriented 3D proxies to explicit textured mesh assets, supported by, to the best of our knowledge, the first documented satellite-mesh paired dataset collected from matched geographic crops for this asset-level task. Project page: https://ai4city-hkust.github.io/Sat2City-v2/
