Fugu-MT Paper Translation (Summary): Alignment Imprint: Zero-Shot AI-Generated Text Detection via Provable Preference Discrepancy

Paper summary: Alignment Imprint: Zero-Shot AI-Generated Text Detection via Provable Preference Discrepancy

arxiv url: http://arxiv.org/abs/2604.16923v1
Date: Sat, 18 Apr 2026 09:12:24 GMT
Status: Translation complete
Last updated in system: 2026-04-21 21:52:52.239498
Title: Alignment Imprint: Zero-Shot AI-Generated Text Detection via Provable Preference Discrepancy
Authors: Junxi Wu, Kailin Huang, Dongjian Hu, Bin Chen, Hao Wu, Shu-Tao Xia, Changliang Zou,
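Abstract summary: Modern Large Language Models (LLMs) undergo alignment, which leaves a measurable distributional imprint. To mitigate instability in high-entropy regions, the authors introduce Log-likelihood Alignment Preference Discrepancy (LAPD), a standardized, information-weighted statistic based on this alignment imprint.
Reference score (independently computed attention score): 51.887915969023965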
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Detecting AI-generated text is an important but challenging problem. Existing likelihood-based detection methods are often sensitive to content complexity and may exhibit unstable performance. In this paper, our key insight is that modern Large Language Models (LLMs) undergo alignment (including fine-tuning and preference tuning), leaving a measurable distributional imprint. We theoretically derive this imprint by abstracting the alignment process as a sequence of constrained optimization steps, showing that the log-likelihood ratio can naturally decompose into implicit instructional biases and preference rewards. We refer to this quantity as the Alignment Imprint. Furthermore, to mitigate the instability in high-entropy regions, we introduce Log-likelihood Alignment Preference Discrepancy (LAPD), a standardized information-weighted statistic based on alignment imprint. We provide a statistical guarantee that alignment-based statistics dominate Fast-DetectGPT in performance. We also theoretically show that LAPD strictly improves the unweighted alignment scores when the aligned and base models are close in distribution. Extensive experiments show that LAPD achieves an improvement of 45.82% relative to the strongest existing baselines, yielding large and consistent gains across all settings.
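The abstract combines two ingredients:

- a per-token log-likelihood ratio between an aligned model and its base model (the Alignment Imprint);
- a standardized, information-weighted version of that ratio (LAPD).

For a single KL-regularized reward-maximization step, the standard closed-form optimum gives log pi_aligned(y|x) - log pi_base(y|x) = r(x, y)/beta - log Z(x). This is one way to see why such a ratio carries a reward signal. Chaining several such steps sums their terms.

The Python below is a minimal sketch under stated assumptions, not the authors' LAPD. It centers each observed token's imprint by its analytic mean under the aligned model. It then standardizes by the summed variance, in the spirit of Fast-DetectGPT's analytic criterion.

Several pieces are hypothetical illustrations:

- the entropy-based weight w = 1 / (1 + H);
- the function names `standardized_imprint_score` and `entropy`;
- the toy probability tables.

The abstract does not state the paper's actual weighting.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def standardized_imprint_score(
    aligned_probs: Sequence[Sequence[float]],
    base_probs: Sequence[Sequence[float]],
    tokens: Sequence[int],
) -> float:
    """Hypothetical LAPD-style score: a weighted, standardized log-likelihood ratio.

    aligned_probs[t][v] and base_probs[t][v] are next-token probabilities at step t
    (conditioned on the observed prefix) under the aligned and base models.
    All probabilities must be strictly positive, as softmax outputs are.
    tokens[t] is the observed token id at step t.
    """
    numerator = 0.0
    variance = 0.0
    for p_a, p_b, x in zip(aligned_probs, base_probs, tokens):
        # Imprint for every vocabulary item: log p_aligned(v) - log p_base(v).
        d = [math.log(pa) - math.log(pb) for pa, pb in zip(p_a, p_b)]
        # Analytic mean and variance of the imprint when a token is drawn from the aligned model.
        mu = sum(pa * dv for pa, dv in zip(p_a, d))
        var = sum(pa * dv * dv for pa, dv in zip(p_a, d)) - mu * mu
        # ASSUMPTION: down-weight high-entropy positions with w = 1 / (1 + H).
        # This is a placeholder, not the weighting used in the paper.
        w = 1.0 / (1.0 + entropy(p_a))
        numerator += w * (d[x] - mu)
        # Clamp tiny negative variances caused by floating-point rounding.
        variance += w * w * max(var, 0.0)
    if variance == 0.0:
        # Identical models (or degenerate distributions) leave no imprint to measure.
        return 0.0
    return numerator / math.sqrt(variance)


if __name__ == "__main__":
    # Toy two-token vocabulary; the numbers are illustrative only.
    aligned = [[0.9, 0.1], [0.8, 0.2]]
    base = [[0.5, 0.5], [0.5, 0.5]]
    print(standardized_imprint_score(aligned, base, [0, 0]))
    print(standardized_imprint_score(aligned, base, [1, 1]))
```

A detector built on this sketch would threshold the score. Text whose tokens the aligned model favors over the base model more than a typical aligned sample would scores higher. Treating higher scores as "more likely machine-generated" is an assumption of this illustration, not a result quoted from the paper.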

Related Papers