Fugu-MT Paper Translation (Summary): M-QUEST -- Meme Question-Understanding Evaluation on Semantics and Toxicity

Paper summary: M-QUEST -- Meme Question-Understanding Evaluation on Semantics and Toxicity

arxiv url: http://arxiv.org/abs/2603.03315v1
Date: Mon, 09 Feb 2026 16:56:39 GMT
Status: Translation complete
System update date: 2026-03-09 01:20:08.143021
Title: M-QUEST -- Meme Question-Understanding Evaluation on Semantics and Toxicity
Authors: Stefano De Giorgis, Ting-Chih Chen, Filip Ilievski
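Abstract summary: This paper presents a semantic framework and a corresponding benchmark for automatic knowledge extraction from memes. The framework guides a semi-automatic process that generates a benchmark of commonsense question-answer pairs about meme toxicity assessment. The resulting benchmark, M-QUEST, consists of 609 question-answer pairs for 307 memes.
Reference score (site-computed attention score): 10.944605467795848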
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Internet memes are a powerful form of online communication, yet their nature and their reliance on commonsense knowledge make toxicity detection challenging. Identifying the key features for meme interpretation and understanding is a crucial task. Previous work has focused on some of the elements that contribute to meaning, such as the Textual dimension via OCR, the Visual dimension via object recognition, upper layers of meaning such as the Emotional dimension, Toxicity detection via proxy variables (such as hate speech detection), and sentiment analysis. Nevertheless, there is still no overall architecture that can formally identify the elements contributing to the meaning of a meme and be used in the sense-making process. In this work, we present a semantic framework and a corresponding benchmark for automatic knowledge extraction from memes. First, we identify the dimensions necessary to understand and interpret a meme: Textual material, Visual material, Scene, Background Knowledge, Emotion, Semiotic Projection, Analogical Mapping, Overall Intent, Target Community, and Toxicity Assessment. Second, the framework guides a semi-automatic process for generating a benchmark of commonsense question-answer pairs about meme toxicity assessment and its underlying reasons. The resulting benchmark, M-QUEST, consists of 609 question-answer pairs for 307 memes. Third, we evaluate eight open-source large language models on their ability to correctly solve M-QUEST. Our results show that the commonsense reasoning capabilities of current models for toxic meme interpretation vary by dimension and by architecture. Models with instruction tuning and reasoning capabilities significantly outperform the others, though pragmatic inference questions remain challenging. We release the code, the benchmark, and the prompts to support future research at the intersection of multimodal content safety and commonsense reasoning.
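The abstract names ten interpretive dimensions and reports that model performance varies by dimension. The following is a minimal sketch of how per-dimension results could be tallied. It rests on two assumptions that do not come from the paper: each question-answer pair is tagged with exactly one dimension, and answers are scored by case-insensitive exact match. The released benchmark's actual schema and metric may differ. `QAItem` and `accuracy_by_dimension` are hypothetical names used only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    """The ten interpretive dimensions listed in the M-QUEST abstract."""
    TEXTUAL_MATERIAL = "Textual material"
    VISUAL_MATERIAL = "Visual material"
    SCENE = "Scene"
    BACKGROUND_KNOWLEDGE = "Background Knowledge"
    EMOTION = "Emotion"
    SEMIOTIC_PROJECTION = "Semiotic Projection"
    ANALOGICAL_MAPPING = "Analogical Mapping"
    OVERALL_INTENT = "Overall Intent"
    TARGET_COMMUNITY = "Target Community"
    TOXICITY_ASSESSMENT = "Toxicity Assessment"


@dataclass(frozen=True)
class QAItem:
    """Hypothetical record for one question-answer pair (not the released schema)."""
    meme_id: str
    dimension: Dimension  # assumption: one dimension per question
    question: str
    gold_answer: str


def accuracy_by_dimension(items, predictions):
    """Return {Dimension: accuracy} using case-insensitive exact match (an assumption).

    `predictions` maps an item's index in `items` to the model's answer string;
    a missing prediction counts as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for i, item in enumerate(items):
        total[item.dimension] += 1
        pred = predictions.get(i, "")
        if pred.strip().lower() == item.gold_answer.strip().lower():
            correct[item.dimension] += 1
    # Only dimensions that actually occur in `items` appear in the result.
    return {dim: correct[dim] / total[dim] for dim in total}


if __name__ == "__main__":
    # Toy items invented for illustration; not taken from M-QUEST.
    items = [
        QAItem("m1", Dimension.TOXICITY_ASSESSMENT, "Is this meme toxic?", "yes"),
        QAItem("m1", Dimension.OVERALL_INTENT, "What is the intent?", "mockery"),
        QAItem("m2", Dimension.TOXICITY_ASSESSMENT, "Is this meme toxic?", "no"),
    ]
    preds = {0: "Yes", 1: "humor", 2: "no"}
    for dim, acc in accuracy_by_dimension(items, preds).items():
        print(f"{dim.value}: {acc:.2f}")
    # prints "Toxicity Assessment: 1.00" then "Overall Intent: 0.00"
```

A per-dimension breakdown like this, rather than a single overall score, is what makes it possible to observe that some question types (for example, pragmatic inference) remain harder than others.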

Related papers