Fugu-MT paper translation (abstract): PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks

Paper abstract: PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks

arxiv url: http://arxiv.org/abs/2511.00416v1
Date: Sat, 01 Nov 2025 05:59:46 GMT
Status: Translation complete
Last updated in system: 2025-11-05 16:37:26.764582
Title: PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks
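Title (reference translation): PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks
Authors: Yiwei Zha, Rui Min, Shanu Sushmita
Abstract summary: This work investigates why iteratively paraphrased text evades detection systems designed for AIGT identification. It introduces PADBen, the first benchmark to systematically evaluate detector robustness against paraphrase attack scenarios.
Reference score (independently computed attention measure): 2.540711742769252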
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While AI-generated text (AIGT) detectors achieve over 90% accuracy on direct LLM outputs, they fail catastrophically against iteratively-paraphrased content. We investigate why iteratively-paraphrased text -- itself AI-generated -- evades detection systems designed for AIGT identification. Through intrinsic mechanism analysis, we reveal that iterative paraphrasing creates an intermediate laundering region characterized by semantic displacement with preserved generation patterns, which brings up two attack categories: paraphrasing human-authored text (authorship obfuscation) and paraphrasing LLM-generated text (plagiarism evasion). To address these vulnerabilities, we introduce PADBen, the first benchmark systematically evaluating detector robustness against both paraphrase attack scenarios. PADBen comprises a five-type text taxonomy capturing the full trajectory from original content to deeply laundered text, and five progressive detection tasks across sentence-pair and single-sentence challenges. We evaluate 11 state-of-the-art detectors, revealing critical asymmetry: detectors successfully identify the plagiarism evasion problem but fail for the case of authorship obfuscation. Our findings demonstrate that current detection approaches cannot effectively handle the intermediate laundering region, necessitating fundamental advances in detection architectures beyond existing semantic and stylistic discrimination methods. For detailed code implementation, please see https://github.com/JonathanZha47/PadBen-Paraphrase-Attack-Benchmark.
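Abstract (reference translation): AI-generated text (AIGT) detectors achieve over 90% accuracy on direct LLM outputs, but they fail catastrophically on iteratively paraphrased content. We investigate why iteratively paraphrased text, which is itself AI-generated, evades detection systems designed for AIGT identification. Intrinsic mechanism analysis reveals that iterative paraphrasing creates an intermediate laundering region characterized by semantic displacement with preserved generation patterns, which gives rise to two attack categories: paraphrasing human-authored text (authorship obfuscation) and paraphrasing LLM-generated text (plagiarism evasion). To address these vulnerabilities, we introduce PADBen, the first benchmark to systematically evaluate detector robustness against both paraphrase attack scenarios. PADBen comprises a five-type text taxonomy that captures the full trajectory from original content to deeply laundered text, and five progressive detection tasks spanning sentence-pair and single-sentence challenges. We evaluate 11 state-of-the-art detectors and reveal a critical asymmetry: detectors successfully identify the plagiarism evasion problem but fail on authorship obfuscation. These results indicate that current detection approaches cannot effectively handle the intermediate laundering region, and that fundamental advances in detection architectures are needed beyond existing semantic and stylistic discrimination methods. For the detailed code implementation, see https://github.com/JonathanZha47/PadBen-Paraphrase-Attack-Benchmark.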
