Fugu-MT Paper Translation (Abstract): Multimodal Urban Tree Detection from Satellite and Street-Level Imagery via Annotation-Efficient Deep Learning Strategies

Paper summary: Multimodal Urban Tree Detection from Satellite and Street-Level Imagery via Annotation-Efficient Deep Learning Strategies

arxiv url: http://arxiv.org/abs/2604.03505v1
Date: Fri, 03 Apr 2026 23:05:59 GMT
Status: Translation complete
Last updated in system: 2026-04-07 15:49:18.614489
Title: Multimodal Urban Tree Detection from Satellite and Street-Level Imagery via Annotation-Efficient Deep Learning Strategies
Title (reference translation): Multimodal urban tree detection from satellite and street-level imagery using annotation-efficient deep learning methods
Authors: In Seon Kim, Ali Moghimi
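Abstract summary: This study proposes a framework that integrates high-resolution satellite imagery with ground-level Google Street View to enable scalable, detailed urban tree detection. The framework first uses satellite imagery to localize tree candidates, then retrieves targeted ground-level views for detailed detection. To address the annotation bottleneck, domain adaptation is used to transfer knowledge from an existing annotated dataset to a new region of interest.
Reference score (independently computed attention score): 0.7734726150561086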
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Beyond the immediate biophysical benefits, urban trees play a foundational role in environmental sustainability and disaster mitigation. Precise mapping of urban trees is essential for environmental monitoring, post-disaster assessment, and strengthening policy. However, the transition from traditional, labor-intensive field surveys to scalable automated systems remains limited by high annotation costs and poor generalization across diverse urban scenarios. This study introduces a multimodal framework that integrates high-resolution satellite imagery with ground-level Google Street View to enable scalable and detailed urban tree detection under limited-annotation conditions. The framework first leverages satellite imagery to localize tree candidates and then retrieves targeted ground-level views for detailed detection, significantly reducing inefficient street-level sampling. To address the annotation bottleneck, domain adaptation is used to transfer knowledge from an existing annotated dataset to a new region of interest. To further minimize human effort, we evaluated three learning strategies: semi-supervised learning, active learning, and a hybrid approach combining both, using a transformer-based detection model. The hybrid strategy achieved the best performance with an F1-score of 0.90, representing a 12% improvement over the baseline model. In contrast, semi-supervised learning exhibited progressive performance degradation due to confirmation bias in pseudo-labeling, while active learning steadily improved results through targeted human intervention to label uncertain or incorrect predictions. Error analysis further showed that active and hybrid strategies reduced both false positives and false negatives. Our findings highlight the importance of a multimodal approach and guided annotation for scalable, annotation-efficient urban tree mapping to strengthen sustainable city planning.
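Abstract (reference translation): Beyond their immediate biophysical benefits, urban trees play a foundational role in environmental sustainability and disaster mitigation. Precise mapping of urban trees is essential for environmental monitoring, post-disaster assessment, and strengthening policy. However, the shift from traditional, labor-intensive field surveys to scalable automated systems is limited by high annotation costs and poor generalization across diverse urban scenarios. This study introduces a multimodal framework that integrates high-resolution satellite imagery with ground-level Google Street View to achieve scalable, detailed urban tree detection under limited-annotation conditions. The framework first uses satellite imagery to localize tree candidates, then retrieves targeted ground-level views for detailed detection, substantially reducing inefficient street-level sampling. To address the annotation bottleneck, domain adaptation is used to transfer knowledge from an existing annotated dataset to a new region of interest. To minimize human effort, three learning strategies (semi-supervised learning, active learning, and a hybrid approach) were evaluated using a transformer-based detection model. The hybrid strategy achieved the best performance with an F1-score of 0.90, a 12% improvement over the baseline model. In contrast, semi-supervised learning showed progressive performance degradation due to confirmation bias in pseudo-labeling, whereas active learning steadily improved results through targeted human intervention to label uncertain or incorrect predictions. Error analysis further showed that the active and hybrid strategies reduced both false positives and false negatives. The findings highlight the importance of a multimodal approach and guided annotation for scalable, annotation-efficient urban tree mapping in support of sustainable city planning.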
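
Illustrative sketch (not from the paper): the abstract describes a hybrid strategy that combines pseudo-labeling with targeted human annotation and reports results as an F1-score. The snippet below shows one plausible way a single hybrid round could route detections, assuming a confidence-threshold rule. The names `Detection`, `split_for_hybrid_round`, `pseudo_thresh`, and `query_budget`, and the threshold values, are all hypothetical; the authors' actual selection criteria are not stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    image_id: str   # hypothetical identifier of a street-level image
    score: float    # detector confidence in [0, 1]


def split_for_hybrid_round(detections, pseudo_thresh=0.9, query_budget=2):
    """Route detections for one hybrid round (assumed rule, not the paper's).

    Detections at or above `pseudo_thresh` become pseudo-labels
    (the semi-supervised part). Among the rest, the `query_budget`
    detections closest to 0.5 confidence are sent to a human annotator
    (the active-learning part).
    """
    pseudo = [d for d in detections if d.score >= pseudo_thresh]
    uncertain = sorted(
        (d for d in detections if d.score < pseudo_thresh),
        key=lambda d: abs(d.score - 0.5),
    )
    return pseudo, uncertain[:query_budget]


def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    dets = [
        Detection("a", 0.95),
        Detection("b", 0.55),
        Detection("c", 0.20),
        Detection("d", 0.48),
        Detection("e", 0.91),
    ]
    pseudo, to_annotate = split_for_hybrid_round(dets)
    print([d.image_id for d in pseudo])
    print([d.image_id for d in to_annotate])
    print(f1_score(tp=90, fp=10, fn=10))
```

Sending only low-confidence detections to annotators is one way to limit the confirmation bias that the abstract attributes to pure pseudo-labeling, since confidently wrong pseudo-labels are the ones a model tends to reinforce.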

Related papers