Fugu-MT paper translation (abstract): TextFake: Benchmarking AI-Generated Image Detection on Text-Rich Images

Paper overview: TextFake: Benchmarking AI-Generated Image Detection on Text-Rich Images

arxiv url: http://arxiv.org/abs/2606.01050v1
Date: Sun, 31 May 2026 06:42:18 GMT
Status: Translation complete
Last updated in system: 2026-06-02 21:34:29.173568
Title: TextFake: Benchmarking AI-Generated Image Detection on Text-Rich Images
Authors: Yuning Zhang, Changtao Miao, Mingyu Liao, Tingyu Liu, Xinghao Wang, Tao Gong, Qi Chu, Nenghai Yu
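Abstract summary: TextFake is a 20,000-image benchmark for text-rich AIGI detection spanning 28 languages. Fake images are synthesized via a four-stage pipeline that annotates real images along three controlled dimensions. No method exceeds 80% accuracy, and some drop by over 60% relative to natural-image benchmarks.
Reference score (independently computed attention score): 45.701818427706684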
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent AI-generated image (AIGI) detectors perform well on natural-image benchmarks, but their behavior on text-rich forgeries, such as fabricated screenshots, documents, and news pages prevalent in misinformation, remains untested. We introduce TextFake, a 20,000-image benchmark for text-rich AIGI detection spanning 28 languages, 4 topic categories, and 2 scene modalities. Fake images are synthesized via a four-stage pipeline that annotates real images along three controlled dimensions and generates counterparts through distribution-aligned structured prompting, ruling out covariate shortcuts. Zero-shot evaluation of 14 specialized detectors and 3 frontier VLM APIs reveals a large systematic gap: no method exceeds 80% accuracy, with some dropping over 60% from natural-image benchmarks. Diagnostic evaluations identify three failure modes: the Text Density Curse, where dense glyphs overwhelm low-level detectors; Cloaking via Rendering Fidelity, where stronger text rendering suppresses generative artifacts; and Threshold Collapse, where routine perturbations drive detectors toward chance-level performance.
