Fugu-MT Paper Translation (Summary): DamageArbiter: A CLIP-Enhanced Multimodal Arbitration Framework for Hurricane Damage Assessment from Street-View Imagery

Paper summary: DamageArbiter: A CLIP-Enhanced Multimodal Arbitration Framework for Hurricane Damage Assessment from Street-View Imagery

arxiv url: http://arxiv.org/abs/2603.14837v1
Date: Mon, 16 Mar 2026 05:32:25 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:36.068141
Title: DamageArbiter: A CLIP-Enhanced Multimodal Arbitration Framework for Hurricane Damage Assessment from Street-View Imagery
Title (reference translation): DamageArbiter: A CLIP-Enhanced Multimodal Arbitration Framework for Hurricane Damage Assessment Using Street-View Imagery
Authors: Yifan Yang, Lei Zou, Wenjing Gong, Kani Fu, Zongrong Li, Siqin Wang, Bing Zhou, Heng Cai, Hao Tian
Abstract summary: This study proposes DamageArbiter, a multimodal disagreement-driven arbitration framework powered by Contrastive Language-Image Pre-training (CLIP) models. DamageArbiter leverages the complementary strengths of unimodal and multimodal models, using a lightweight logistic regression meta-classifier to arbitrate cases of disagreement. DamageArbiter improved accuracy from 74.33% (ViT-B/32, image-only) to 82.79%, surpassing the 80% accuracy threshold and achieving an absolute improvement of 8.46% over the strongest baseline model.
Reference score (independently computed attention score): 12.916687638980008
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Analyzing street-view imagery with computer vision models for rapid, hyperlocal damage assessment is becoming popular and valuable in emergency response and recovery, but traditional models often act like black boxes, lacking interpretability and reliability. This study proposes a multimodal disagreement-driven arbitration framework powered by Contrastive Language-Image Pre-training (CLIP) models, DamageArbiter, to improve the accuracy, interpretability, and robustness of damage estimation from street-view imagery. DamageArbiter leverages the complementary strengths of unimodal and multimodal models, employing a lightweight logistic regression meta-classifier to arbitrate cases of disagreement. Using 2,556 post-disaster street-view images, paired with both manually generated and large language model (LLM)-generated text descriptions, we systematically compared the performance of unimodal models (including image-only and text-only models), multimodal CLIP-based models, and DamageArbiter. Notably, DamageArbiter improved the accuracy from 74.33% (ViT-B/32, image-only) to 82.79%, surpassing the 80% accuracy threshold and achieving an absolute improvement of 8.46% compared to the strongest baseline model. Beyond improving overall accuracy, DamageArbiter arbitrates discrepancies between unimodal and multimodal predictions and thereby mitigates the overconfidence errors common in visual models that rely solely on images, especially where visual cues of disaster damage are ambiguous or subject to interference, reducing overconfident but incorrect predictions. We further mapped and analyzed geo-referenced predictions and misclassifications to compare model performance across locations. Overall, this work advances street-view-based disaster assessment from coarse severity classification toward a more reliable and interpretable framework.
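Abstract (reference translation): Analyzing street-view imagery with computer vision models for rapid, hyperlocal damage assessment is becoming popular and valuable in emergency response and recovery, but traditional models often behave like black boxes and lack interpretability and reliability. This study proposes DamageArbiter, a multimodal disagreement-driven arbitration framework powered by CLIP models, to improve the accuracy, interpretability, and robustness of damage estimation from street-view imagery. DamageArbiter leverages the complementary strengths of unimodal and multimodal models, using a lightweight logistic regression meta-classifier to arbitrate cases of disagreement. Using 2,556 street-view images paired with manually generated and large language model (LLM)-generated text descriptions, we systematically compared the performance of unimodal models (image-only and text-only), multimodal CLIP-based models, and DamageArbiter. Notably, DamageArbiter improved accuracy from 74.33% (ViT-B/32, image-only) to 82.79%, surpassing the 80% accuracy threshold and achieving an absolute improvement of 8.46% over the strongest baseline model. Beyond the improvement in overall accuracy, compared with visual models that rely solely on images, DamageArbiter mitigates common overconfidence errors in visual models by arbitrating discrepancies between unimodal and multimodal predictions. We further mapped and analyzed geo-referenced predictions and misclassifications to compare model performance across locations. This work advances street-view-based disaster assessment from coarse severity classification toward a more reliable and interpretable framework.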
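
The arbitration step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the meta-features, the training procedure, or the class set, so everything below is an assumption made for illustration. The helper names (features, fit_arbiter, arbitrate), the three meta-features (each model's top confidence and their gap), and the toy probabilities are all hypothetical. The sketch accepts the prediction when an image-only model and a multimodal model agree, and otherwise lets a small logistic regression decide which of the two to trust.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def argmax(p):
    # Index of the most probable class.
    return max(range(len(p)), key=p.__getitem__)

def features(p_img, p_mm):
    # Hypothetical meta-features (an assumption, not taken from the paper):
    # top confidence of each model and the gap between them.
    c_img, c_mm = max(p_img), max(p_mm)
    return [c_img, c_mm, c_mm - c_img]

def score(x, w, b):
    # Linear score passed through the logistic function.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fit_arbiter(rows, labels, lr=0.5, epochs=2000):
    # Logistic regression trained with plain batch gradient descent.
    # Label 1 means "the multimodal model was right on this disagreement".
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            err = score(x, w, b) - y
            for j, xj in enumerate(x):
                gw[j] += err * xj
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def arbitrate(p_img, p_mm, w, b):
    # Agreement: accept the shared prediction without consulting the arbiter.
    a, m = argmax(p_img), argmax(p_mm)
    if a == m:
        return a
    # Disagreement: the meta-classifier picks which model to trust.
    return m if score(features(p_img, p_mm), w, b) >= 0.5 else a

# Toy disagreement cases (made-up numbers): image-only probabilities,
# multimodal probabilities, and 1 if the multimodal model was correct.
train = [
    ([0.9, 0.1, 0.0], [0.2, 0.7, 0.1], 1),
    ([0.8, 0.1, 0.1], [0.1, 0.1, 0.8], 1),
    ([0.6, 0.3, 0.1], [0.05, 0.9, 0.05], 1),
    ([0.9, 0.05, 0.05], [0.3, 0.4, 0.3], 0),
    ([0.7, 0.2, 0.1], [0.3, 0.45, 0.25], 0),
    ([0.6, 0.2, 0.2], [0.2, 0.3, 0.5], 0),
]
w, b = fit_arbiter([features(i, m) for i, m, _ in train],
                   [y for _, _, y in train])

# Arbitrate a new disagreement between the two models.
print(arbitrate([0.85, 0.10, 0.05], [0.15, 0.75, 0.10], w, b))
```

Restricting the meta-classifier to disagreement cases keeps it lightweight: it never overrides a prediction both models share, and it only has to learn when a confident image-only prediction should yield to the multimodal one. In practice the arbiter would be trained on held-out disagreements rather than on toy values like these.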

Related papers