Fugu-MT Paper Translation (Abstract): Retrieving Floods without Floodlights: Topic Models as Binary Classifiers for Extreme Climate Events in German News

Paper overview: Retrieving Floods without Floodlights: Topic Models as Binary Classifiers for Extreme Climate Events in German News

arxiv url: http://arxiv.org/abs/2605.03450v1
Date: Tue, 05 May 2026 07:32:22 GMT
Status: Translation complete
Last updated in system: 2026-05-06 19:35:43.820112
Title: Retrieving Floods without Floodlights: Topic Models as Binary Classifiers for Extreme Climate Events in German News
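Title (reference translation): Retrieving Floods without Floodlights: Topic Models as Binary Classifiers for Extreme Climate Events in German News
Authors: Brielen Madureira, Mariana Madruga de Brito, Andreas Niekler
Abstract summary: We improve the retrieval of relevant news about seven types of extreme climate events in the German media using Topic Models. The proposed method relies on the posterior distributions estimated by Topic Models to select relevant documents. We show that results are hazard-dependent, which speaks against considering climate events as a single category in NLP tasks.
Reference score (independently computed attention score): 5.033722555649178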
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In studies of media coverage of extreme climate events, NLP methods have become indispensable for identifying relevant texts in large news databases. Still, enough annotated data to train accurate deep learning-based classifiers from scratch is often not available. Topic Models have the advantage of being both unsupervised and interpretable, but are typically used only for exploratory analysis or data characterisation. In this study, we investigate how to employ Topic Models as binary classifiers for refining the retrieval of relevant news about seven types of extreme climate events in the German media. Our method relies on the posterior distributions estimated by Topic Models to select relevant documents, without modifying their training procedure. Using an annotated sample to guide the evaluation, we show that the probabilities assigned to keywords used to query news databases can also be informative for selecting relevant topics and improve sample precision. We compare our results to a fine-tuned text embedding classifier and an open-weight LLM, discussing observed trade-offs, e.g. the LLM's lowest precision. Moreover, we show that results are hazard-dependent, which speaks against considering climate events as a single category in NLP tasks.
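Abstract (reference translation): In studies of media coverage of extreme climate events, NLP methods have become indispensable for identifying relevant texts in large news databases. Even so, enough annotated data to train accurate deep learning-based classifiers from scratch is often unavailable. Topic Models have the advantage of being both unsupervised and interpretable, but are typically used only for exploratory analysis or data characterisation. This study investigates how to use Topic Models as binary classifiers to refine the retrieval of relevant news about seven types of extreme climate events in the German media. The proposed method selects relevant documents based on the posterior distributions estimated by Topic Models, without modifying their training procedure. Using an annotated sample, we show that the probabilities assigned to the keywords used to query news databases can help select relevant topics and improve sample precision. We compare the results with a fine-tuned text embedding classifier and an open-weight LLM and discuss the observed trade-offs, e.g. the LLM having the lowest precision. Furthermore, the results are shown to be hazard-dependent, which speaks against considering climate events as a single category in NLP tasks.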
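
The abstract describes selecting relevant topics from the probabilities that a topic model assigns to the query keywords, then selecting documents from their posterior topic distributions. The following is a minimal Python sketch of one way such a rule could look, not the authors' implementation. The function names, both thresholds, the decision rule (summing probability mass and comparing it to a cutoff), and all toy values are assumptions made purely for illustration.

```python
from typing import Dict, List, Set


def select_relevant_topics(
    topic_word: List[Dict[str, float]],
    keywords: List[str],
    min_keyword_mass: float,
) -> Set[int]:
    """Return indices of topics whose summed probability over the
    query keywords reaches min_keyword_mass (hypothetical rule)."""
    relevant = set()
    for k, word_probs in enumerate(topic_word):
        mass = sum(word_probs.get(w, 0.0) for w in keywords)
        if mass >= min_keyword_mass:
            relevant.add(k)
    return relevant


def classify_documents(
    doc_topic: List[List[float]],
    relevant_topics: Set[int],
    min_topic_mass: float,
) -> List[bool]:
    """Label a document relevant when its posterior mass on the
    selected topics reaches min_topic_mass (hypothetical rule)."""
    return [
        sum(theta[k] for k in relevant_topics) >= min_topic_mass
        for theta in doc_topic
    ]


# Toy topic-word probabilities (only a few words per topic are listed).
topic_word = [
    {"hochwasser": 0.30, "fluss": 0.20, "regen": 0.10},
    {"fussball": 0.40, "tor": 0.25},
    {"flut": 0.25, "hochwasser": 0.15, "damm": 0.10},
]

# Toy document-topic posteriors, one row per document.
doc_topic = [
    [0.70, 0.10, 0.20],
    [0.05, 0.90, 0.05],
    [0.20, 0.55, 0.25],
]

# Keywords assumed to have been used to query the news database.
keywords = ["hochwasser", "flut"]

relevant_topics = select_relevant_topics(topic_word, keywords, min_keyword_mass=0.2)
labels = classify_documents(doc_topic, relevant_topics, min_topic_mass=0.5)
print(relevant_topics, labels)
```

In this toy setup the two flood-related topics are selected and only the first document is labelled relevant. Because the abstract reports hazard-dependent results, any such thresholds would presumably need to be tuned per hazard type against an annotated sample.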
