Fugu-MT Paper Translation (Summary): LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification

Paper overview: LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification

arxiv url: http://arxiv.org/abs/2605.08448v1
Date: Fri, 08 May 2026 20:15:40 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:49.652784
Title: LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification
Authors: Jacob Ativo, Bharaneeshwar Balasubramaniyam, Anh Tran, Khushboo Gupta, Hongmin Li, Doina Caragea, Cornelia Caragea
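Abstract summary: This paper presents the first empirical evaluation of large language model (LLM) guided semi-supervised learning for classifying crisis-related tweets. The results show that LG-CoTrain outperforms conventional semi-supervised approaches in low-resource settings. Compact semi-supervised models can, in some cases, outperform very large LLMs operating in zero-shot settings.
Reference score (independently computed attention score): 43.732252043913284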
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Semi-supervised learning approaches have been investigated as a means to enhance the analysis of social media data in disaster management contexts. In this work, we present the first empirical evaluation of large language model (LLM) guided semi-supervised learning for crisis-related tweet classification. We compare two recent LLM-assisted semi-supervised methods, VerifyMatch and LLM-guided Co-Training (LG-CoTrain), against established semi-supervised baselines. Our results show that LG-CoTrain significantly outperforms classical semi-supervised approaches in low-resource settings with 5, 10, and 25 labeled examples per class, achieving the highest averaged Macro F1 across events. VerifyMatch achieves competitive performance while also demonstrating strong calibration properties. As the number of labeled examples increases, the performance gap narrows and Self-Training emerges as a strong baseline. We further observe that compact semi-supervised models can, in some cases, outperform very large LLMs operating in zero-shot settings. This finding highlights the potential of transferring knowledge from LLMs into smaller, more deployable models through LLM-guided semi-supervised learning, offering a practical pathway for real-world disaster response applications. Our project repository is available on GitHub.
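The abstract reports results as Macro F1 averaged across crisis events and praises VerifyMatch's calibration. The following is a minimal, dependency-free sketch of how such numbers are commonly computed. It is not the authors' evaluation code. Several choices are assumptions, since the abstract does not specify them: one (y_true, y_pred) pair per event, an unweighted mean over events, and expected calibration error (ECE) with 10 equal-width bins as the calibration measure. The function names, event names, and labels are hypothetical.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); nonzero denominator for any label in the union.
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(f1_scores) / len(f1_scores)


def averaged_macro_f1(per_event):
    """Assumed averaging: plain mean of per-event Macro F1 (each event weighted equally)."""
    by_event = {event: macro_f1(t, p) for event, (t, p) in per_event.items()}
    return sum(by_event.values()) / len(by_event), by_event


def expected_calibration_error(confidences, correct, n_bins=10):
    """Sample-weighted mean |accuracy - mean confidence| over equal-width confidence bins."""
    if not confidences:
        return 0.0
    bins = defaultdict(list)
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for items in bins.values():
        accuracy = sum(ok for _, ok in items) / len(items)
        mean_conf = sum(c for c, _ in items) / len(items)
        ece += len(items) / n * abs(accuracy - mean_conf)
    return ece


# Hypothetical toy predictions for two made-up events (not data from the paper).
per_event = {
    "event_a": (["rescue", "damage", "other", "damage"],
                ["rescue", "damage", "damage", "damage"]),
    "event_b": (["other", "other", "rescue"],
                ["other", "rescue", "rescue"]),
}
mean_f1, by_event = averaged_macro_f1(per_event)
print(by_event, round(mean_f1, 4))

# Hypothetical confidences for the predicted class, and whether each prediction was correct.
ece = expected_calibration_error([0.9, 0.8, 0.6, 0.3], [True, True, False, False])
print(round(ece, 4))
```

Macro F1 weights every class equally, so rare but important crisis categories count as much as frequent ones. In practice, scikit-learn's `f1_score(y_true, y_pred, average="macro")` would compute the per-event score; the hand-rolled version only keeps the sketch self-contained.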


Related papers