Fugu-MT Paper Translation (Summary): On the Salience of Low-Probability Tokens for AI-Generated Text Detection: A Multiscale Uncertainty Perspective

Paper Overview: On the Salience of Low-Probability Tokens for AI-Generated Text Detection: A Multiscale Uncertainty Perspective

arxiv url: http://arxiv.org/abs/2606.02158v1
Date: Mon, 01 Jun 2026 12:21:41 GMT
Status: Translation complete
Last updated in system: 2026-06-02 21:34:31.990532
Title: On the Salience of Low-Probability Tokens for AI-Generated Text Detection: A Multiscale Uncertainty Perspective
Authors: Yikai Guo, Bin Wang, Xilai Fan, Wenjun Ke, Haoran Luo,
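Abstract summary: Uncertainty is a multiscale uncertainty estimator that focuses on informative low-probability tokens. It mitigates boilerplate dominance by averaging the log-probabilities of low-probability tokens, and it reduces brittleness by capturing the distributional shape of this low-probability region via Rényi entropy.
Reference score (independently computed attention score): 19.794084116453146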
License: http://creativecommons.org/licenses/by/4.0/
Abstract: AI-generated text increasingly blends with human writing, raising practical risks such as misinformation, academic misuse, and corpus contamination. While statistical detectors are appealing for their efficiency and generalization, they suffer from two key limitations. (i) Boilerplate dominance: boilerplate tokens shared across human and LLM writing can overwhelm discriminative signals. (ii) Brittle point estimates: relying on a single probability score yields unstable decisions under adversarial manipulation. To address these issues, we propose Uncertainty, a multiscale uncertainty estimator that focuses on informative low-probability tokens, which more clearly expose distributional discrepancies. Locally, it alleviates boilerplate dominance by averaging the log-probabilities of low-probability tokens; globally, it reduces brittleness by capturing the distributional shape of this low-probability region via Rényi entropy. We further extend the detector to Uncertainty++ via conditional independent sampling, yielding a more stable uncertainty estimate. Experiments across seven datasets and sixteen LLMs demonstrate high effectiveness, generalization, and robustness. Our code is available at https://github.com/guoyikai2000/Uncertainty-AIGT.
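
The abstract describes two statistics computed over a text's low-probability tokens. The first is a local mean of their log-probabilities. The second is a global Rényi entropy, H_α(p) = (1/(1−α)) · log Σ p_i^α, over that region. Below is a minimal Python sketch of both. It assumes that per-token log-probabilities from some scoring language model have already been computed. The selection rule (keep the lowest `fraction` of tokens), the renormalization of the selected probabilities before taking the Rényi entropy, and all function names are hypothetical illustrations, not the authors' implementation. The linked repository is the authoritative source.

```python
import math
from typing import Sequence


def select_low_prob(token_logprobs: Sequence[float], fraction: float = 0.3) -> list[float]:
    """Keep the `fraction` of tokens with the lowest log-probability (at least one).

    Hypothetical selection rule; the paper's actual criterion may differ.
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    # Nearest-integer count avoids float fuzz such as ceil(0.7 * 10) == 8.
    k = max(1, round(fraction * len(token_logprobs)))
    return sorted(token_logprobs)[:k]


def local_score(low_logprobs: Sequence[float]) -> float:
    """Local view: mean log-probability of the selected low-probability tokens."""
    return sum(low_logprobs) / len(low_logprobs)


def renyi_entropy(low_logprobs: Sequence[float], alpha: float = 2.0) -> float:
    """Global view (assumed form): order-alpha Renyi entropy of the selected
    tokens' probabilities after renormalizing them to sum to 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    # Renormalize in log space (log-sum-exp) for numerical stability.
    m = max(low_logprobs)
    log_z = m + math.log(sum(math.exp(lp - m) for lp in low_logprobs))
    probs = [math.exp(lp - log_z) for lp in low_logprobs]
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)


if __name__ == "__main__":
    # Toy per-token log-probabilities (hypothetical values).
    logprobs = [-0.1, -0.2, -5.0, -0.05, -3.0, -0.3, -7.5, -0.15, -2.0, -0.4]
    low = select_low_prob(logprobs, fraction=0.3)  # → [-7.5, -5.0, -3.0]
    print(round(local_score(low), 4))  # prints -5.1667
    print(renyi_entropy(low, alpha=2.0))
```

Focusing on the lowest-probability tokens keeps high-probability boilerplate from dominating the average. The entropy term summarizes how that tail is spread rather than relying on a single point estimate. This sketch does not cover how the two statistics are combined into a detection decision, nor the Uncertainty++ extension via conditional independent sampling.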
