Fugu-MT Paper Translation (Abstract): Beating the Style Detector: Three Hours of Agentic Research on the AI-Text Arms Race

Paper abstract: Beating the Style Detector: Three Hours of Agentic Research on the AI-Text Arms Race

arxiv url: http://arxiv.org/abs/2605.02620v1
Date: Mon, 04 May 2026 14:10:41 GMT
Status: Translation complete
Last updated in system: 2026-05-05 20:33:50.32286
Title: Beating the Style Detector: Three Hours of Agentic Research on the AI-Text Arms Race
Title (reference translation): Three Hours of Agentic Research on the AI-Text Arms Race
Authors: Andreas Maier, Moritz Zaiss, Siming Bayer
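Abstract summary: Reproducing an empirical NLP study used to take weeks. All code, 648 mimic drafts, trained detectors, diagnostics, and adversarial trajectories are released.
Reference score (independently calculated attention level): 3.9508043303559828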
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reproducing an empirical NLP study used to take weeks. Given the released data and a modern agentic-research harness, we redo every experiment of a recent ACL\,2026 study on personal-style post-editing of LLM drafts -- and add three new ones -- with the human investigator acting only as a reviewer-in-the-loop. We reproduce all seven preregistered hypotheses and recover the paper's headline correlation between perceived self-similarity and embedding-measured self-similarity to three decimal places ($r{=}{+}0.244$, $p{<}10^{-8}$, $n{=}648$). Under a leakage-free held-out protocol, GPT-5.5 and Claude\,Opus\,4.7 close $71$--$75\,\%$ of the style gap to the same-author ceiling on $324$ paired tasks, against $24\,\%$ for the human post-edit, and beat the human post-edit on $\sim$$80\,\%$ of tasks. We then frame the same data as an AI-text detection arms race. A leave-authors-out linear SVM on LUAR-MUD embeddings reaches AUC $0.93$--$1.00$ across approaches; six diagnostics show that GPT-5.5 detection is mostly a length confound while Opus detection is a genuine stylistic signature. Given $T{=}20$ feedback iterations against the frozen detector, an Opus agent flips two of five held-out test mimics to the human half-space and shrinks every margin by an order of magnitude. With moderate effort against a known detector, a frontier LLM can already efficiently lower its own AI-detection probability. All code, $648$ mimic drafts, trained detectors, diagnostics, and adversarial trajectories are released.
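Abstract (reference translation): Reproducing an empirical NLP study used to take weeks. Given the released data and a modern agentic-research harness, we redo every experiment of a recent ACL\,2026 study on personal-style post-editing of LLM drafts and add three new ones, with the human investigator acting only as a reviewer-in-the-loop. We reproduce all seven preregistered hypotheses and recover the paper's headline correlation between perceived self-similarity and embedding-measured self-similarity to three decimal places ($r{=}{+}0.244$, $p{<}10^{-8}$, $n{=}648$). Under a leakage-free held-out protocol, GPT-5.5 and Claude\,Opus\,4.7 close $71$--$75\,\%$ of the style gap to the same-author ceiling on $324$ paired tasks, against $24\,\%$ for the human post-edit, and beat the human post-edit on $\sim$$80\,\%$ of tasks. We then frame the same data as an AI-text detection arms race. A leave-authors-out linear SVM on LUAR-MUD embeddings reaches AUC $0.93$--$1.00$ across approaches. Given $T{=}20$ feedback iterations against the frozen detector, an Opus agent flips two of five held-out test mimics to the human half-space and shrinks every margin by an order of magnitude. With moderate effort against a known detector, a frontier LLM can already efficiently lower its own AI-detection probability. All code, $648$ mimic drafts, trained detectors, diagnostics, and adversarial trajectories are released.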
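Note on the "leave-authors-out" protocol: the abstract does not spell out how the split is built. The sketch below assumes the usual reading, namely that every text by a given author falls entirely in the training side or entirely in the test side of a fold, so the detector is never scored on an author it has seen. The function name `leave_authors_out_folds`, the round-robin assignment of authors to folds, and the toy author labels are illustrative assumptions, not taken from the paper or its released code.

```python
from collections import defaultdict


def leave_authors_out_folds(author_ids, n_folds):
    """Yield (train_idx, test_idx) pairs in which no author is on both sides.

    Hypothetical helper: authors are sorted and assigned to folds
    round-robin, which is an assumption made for this illustration.
    """
    # Group sample indices by the author who wrote them.
    by_author = defaultdict(list)
    for idx, author in enumerate(author_ids):
        by_author[author].append(idx)

    authors = sorted(by_author)
    if n_folds < 2 or n_folds > len(authors):
        raise ValueError("n_folds must be between 2 and the number of authors")

    for fold in range(n_folds):
        # Every n_folds-th author, starting at `fold`, is held out.
        held_out = set(authors[fold::n_folds])
        test_idx = [i for a in authors if a in held_out for i in by_author[a]]
        train_idx = [i for a in authors if a not in held_out for i in by_author[a]]
        yield train_idx, test_idx


# Toy data: six texts written by four authors (labels are made up).
toy_authors = ["a", "a", "b", "c", "c", "d"]
toy_folds = list(leave_authors_out_folds(toy_authors, n_folds=2))
```

In each fold a classifier (the abstract names a linear SVM on LUAR-MUD embeddings) would be fitted on `train_idx` and scored on `test_idx`; only the splitting step is sketched here.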

Related papers