Fugu-MT Paper Translation (Abstract): T-IMPACT: A Severity-Aware Benchmark for Contextual Image-Text Manipulation

Paper overview: T-IMPACT: A Severity-Aware Benchmark for Contextual Image-Text Manipulation

arxiv url: http://arxiv.org/abs/2606.22339v1
Date: Sun, 21 Jun 2026 05:07:20 GMT
Status: Translation complete
Last updated in system: 2026-06-25 18:53:58.33972
Title: T-IMPACT: A Severity-Aware Benchmark for Contextual Image-Text Manipulation
Authors: Gagandeep Singh, Aaditya Yadav, Priyanka Singh
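Abstract summary: We introduce T-IMPACT, a first-release severity-aware benchmark for manipulated news-style image-text pairs. T-IMPACT contains 98,786 examples spanning pristine, image-only, text-only, and joint manipulations. The pipeline extracts semantic anchors, grounds them spatially, and performs localized image edits and constrained caption rewrites.
Reference score (independently computed attention score): 9.049034101566642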
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Recent advances in vision-language models and generative editing systems have made it increasingly easy to produce persuasive multimodal misinformation by altering images, text, or both jointly. However, existing datasets focus mainly on authenticity, out-of-context mismatch, or manipulation type, and rarely capture how strongly an edit changes the likely interpretation of a post. We introduce T-IMPACT, a first-release severity-aware benchmark for manipulated news-style image-text pairs. T-IMPACT contains 98,786 examples spanning pristine, image-only, text-only, and joint manipulations, with a calibrated continuous severity signal, coarse low/medium/high labels, and supporting grounding metadata. Starting from a news image-text pair, the pipeline extracts semantic anchors, grounds them spatially, performs localized image edits and constrained caption rewrites, and calibrates contextual-impact scores using limited human ratings. In this release, the calibrated continuous score is the primary severity target, while the low/medium/high bands should be interpreted as coarse operating buckets rather than balanced classes. Experiments show that current models recover some authenticity signal, but severity prediction remains substantially harder and only weakly aligned with human judgment. T-IMPACT provides an initial benchmark for studying multimodal manipulation beyond binary real/fake classification toward graded contextual impact.
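The relationship between the continuous severity score and the coarse low/medium/high bands can be illustrated with a minimal sketch. The abstract does not state the score range or the band cut points, so the [0, 1] range, the thresholds `LOW_MAX` and `HIGH_MIN`, the function names, and the example scores below are all hypothetical and chosen only for illustration; this is not the authors' calibration procedure.

```python
from collections import Counter

# Hypothetical cut points: the abstract gives neither the score range nor thresholds.
LOW_MAX = 1 / 3
HIGH_MIN = 2 / 3


def severity_band(score: float) -> str:
    """Map a continuous severity score (assumed to lie in [0, 1]) to a coarse band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score outside the assumed range [0, 1]: {score}")
    if score < LOW_MAX:
        return "low"
    if score < HIGH_MIN:
        return "medium"
    return "high"


def band_counts(scores):
    """Count how many scores fall in each band; the counts need not be balanced."""
    return Counter(severity_band(s) for s in scores)


# Made-up scores for illustration only, not drawn from the benchmark.
example_scores = [0.05, 0.10, 0.20, 0.40, 0.90]
print(band_counts(example_scores))
```

Under these assumptions the bands are simply operating buckets derived from the continuous score, which is why a skewed distribution of scores yields unbalanced band counts.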

Related papers