Fugu-MT Paper Translation (Abstract): ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection

Paper summary: ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection

arxiv url: http://arxiv.org/abs/2510.08630v1
Date: Wed, 08 Oct 2025 13:12:06 GMT
Status: Translation complete
System update date: 2025-10-14 00:38:47.220156
Title: ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection
Title (reference translation): ExPO-HM: Learning Explain-then-Detect for Hateful Meme Detection
Authors: Jingbiao Mei, Mingsheng Sun, Jinghong Chen, Pengda Qin, Yuhong Li, Da Chen, Bill Byrne
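Abstract summary: Hateful memes have emerged as a particularly challenging form of online abuse, motivating the development of automated detection systems. Most prior approaches rely on direct detection and produce only binary predictions. ExPO-HM combines SFT warmup, GRPO with curriculum learning, and Conditional Decision Entropy (CDE) as both a metric and a reward for reasoning quality.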
Reference score (independently computed attention score): 29.000615125118127
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Hateful memes have emerged as a particularly challenging form of online abuse, motivating the development of automated detection systems. Most prior approaches rely on direct detection, producing only binary predictions. Such models fail to provide the context and explanations that real-world moderation requires. Recent Explain-then-Detect approaches, using Chain-of-Thought prompting or LMM agents, perform worse than simple SFT baselines, and even advanced post-training methods such as GRPO fail to close the gap. Our analysis identifies two key issues of such systems: important policy-relevant cues such as targets and attack types are not hypothesized by the model as a likely explanation; and the binary reward signal is insufficient to guide reasoning. To address these challenges, we propose ExPO-HM (Explain-then-Detect Policy Optimization for Hateful Memes), inspired by the training and evaluation process of human annotators. ExPO-HM combines SFT warmup, GRPO with curriculum learning, and Conditional Decision Entropy (CDE) as both metric and reward for reasoning quality. Across three hateful meme benchmarks, ExPO-HM achieves state-of-the-art performance on binary detection, fine-grained classification, and reasoning quality, with up to 15% and 17% F1 improvement over the GRPO and DPO baselines, respectively. By moving hateful meme detection from simple binary alarms to explanation-driven detection, ExPO-HM provides accurate, interpretable, and actionable moderation support.
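Abstract (reference translation): Hateful memes have emerged as a particularly challenging form of online abuse, motivating the development of automated detection systems. Most prior approaches rely on direct detection and produce only binary predictions. Such models fail to provide the context and explanations needed for real-world moderation. Recent Explain-then-Detect approaches, which use Chain-of-Thought prompting or LMM agents, perform worse than simple SFT baselines, and even advanced post-training methods such as GRPO fail to close the gap. Our analysis reveals two problems: important policy-relevant cues such as targets and attack types are not hypothesized by the model, and the binary reward signal is insufficient to guide reasoning. To address these challenges, we propose ExPO-HM (Explain-then-Detect Policy Optimization for Hateful Memes), inspired by the training and evaluation process of human annotators. ExPO-HM combines SFT warmup, GRPO with curriculum learning, and Conditional Decision Entropy (CDE) as both a metric and a reward for reasoning quality. On three hateful meme benchmarks, ExPO-HM achieves state-of-the-art performance in binary detection, fine-grained classification, and reasoning quality, with up to 15% and 17% F1 improvement over the GRPO and DPO baselines, respectively. By moving from simple binary alarms to explanation-driven detection, ExPO-HM provides accurate, interpretable, and actionable moderation support.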
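
The abstract names Conditional Decision Entropy (CDE) but does not define it. As an illustration only, the sketch below assumes CDE measures the Shannon entropy of the hateful/benign decision once the model's explanation is given, so that a confident decision after an explanation scores low. The function names and the averaging over sampled explanations are hypothetical and are not taken from the paper.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a two-way decision with P(hateful) = p."""
    if p <= 0.0 or p >= 1.0:
        # A fully certain decision carries no uncertainty.
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def mean_decision_entropy(p_hateful_given_explanation):
    """Average decision entropy over several sampled explanations of one meme.

    Assumption: each input value is the model's probability of the "hateful"
    label after conditioning on one generated explanation.
    """
    probs = list(p_hateful_given_explanation)
    if not probs:
        raise ValueError("need at least one probability")
    return sum(binary_entropy(p) for p in probs) / len(probs)
```

Under this reading, an explanation that leaves the decision at 50/50 scores the maximum of 1 bit, while one that makes the decision certain scores 0, which is the kind of signal that could serve as both a metric and a reward.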

Related papers