Fugu-MT Paper Translation (Abstract): Cross-Modal Rationale Transfer for Explainable Humanitarian Classification on Social Media

Paper abstract: Cross-Modal Rationale Transfer for Explainable Humanitarian Classification on Social Media

arxiv url: http://arxiv.org/abs/2603.18611v1
Date: Thu, 19 Mar 2026 08:31:34 GMT
Status: Translation complete
Last updated in system: 2026-03-20 17:19:06.031104
Title: Cross-Modal Rationale Transfer for Explainable Humanitarian Classification on Social Media
Authors: Thi Huyen Nguyen, Koustav Rudra, Wolfgang Nejdl
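Abstract summary: This paper proposes an interpretable multimodal classification framework. The proposed method improves classification Macro-F1 by 2-35%. It adapts well to new, unseen datasets in zero-shot mode, achieving an accuracy of 80%.
Reference score (independently computed attention measure): 8.788077041327773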
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Advances in social media data dissemination enable the provision of real-time information during a crisis. The information falls into different classes, such as infrastructure damage, persons missing or stranded in the affected zone, etc. Existing methods have attempted to classify text and images into various humanitarian categories, but their decision-making process remains largely opaque, which hinders their deployment in real-life applications. Recent work has sought to improve transparency by extracting textual rationales from tweets to explain predicted classes. However, such explainable classification methods have mostly focused on text rather than on crisis-related images. In this paper, we propose an interpretable-by-design multimodal classification framework. Our method first learns the joint representation of text and image using a visual language transformer model and extracts text rationales. Next, it extracts the image rationales via a mapping from the text rationales. Our approach demonstrates how to learn rationales in one modality from another through cross-modal rationale transfer, which saves annotation effort. Finally, tweets are classified based on the extracted rationales. Experiments are conducted on the CrisisMMD benchmark dataset, and results show that our proposed method boosts the classification Macro-F1 by 2-35% while extracting accurate text tokens and image patches as rationales. Human evaluation also supports the claim that our proposed method is able to retrieve better image rationale patches (12%) that help to identify humanitarian classes. Our method adapts well to new, unseen datasets in zero-shot mode, achieving an accuracy of 80%.
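
The Macro-F1 figure cited in the abstract is the unweighted mean of per-class F1 scores, so rare classes count as much as frequent ones. The following is a minimal sketch of that metric only, not of the authors' evaluation code; the function name `macro_f1` and the class labels in the example are illustrative assumptions and are not taken from the paper or from CrisisMMD.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores.

    The class set is the union of labels seen in y_true and y_pred.
    A class with no true positives contributes an F1 of 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = set(y_true) | set(y_pred)
    f1_scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is never zero here
        # because every label in the union appears at least once.
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels for illustration only.
y_true = ["damage", "damage", "damage", "damage", "missing", "other"]
y_pred = ["damage", "damage", "damage", "damage", "damage", "other"]
score = macro_f1(y_true, y_pred)
```

In this toy example five of six predictions are correct, yet the completely missed minority class pulls the Macro-F1 down to 17/27 (about 0.63), which is why the metric is commonly preferred over plain accuracy for imbalanced class distributions.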
