Fugu-MT Paper Translation (Summary): Hacking Generative Perplexity: Why Unconditional Text Evaluation Needs Distributional Metrics

Paper summary: Hacking Generative Perplexity: Why Unconditional Text Evaluation Needs Distributional Metrics

arxiv url: http://arxiv.org/abs/2606.08417v1
Date: Sun, 07 Jun 2026 02:35:56 GMT
Status: Translation complete
Last updated (in system): 2026-06-09 14:42:06.097046
Title: Hacking Generative Perplexity: Why Unconditional Text Evaluation Needs Distributional Metrics
Authors: Antonio Franca, Alexander Tong
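Abstract summary: Diffusion and continuous flow-based language models have emerged as the leading non-autoregressive alternatives for language modeling. Progress in both paradigms is overwhelmingly tracked by generative perplexity (gen-PPL). We argue that this metric is unsound. By construction, gen-PPL measures only predictability under the AR scorer, not grammaticality or semantic coherence.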
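A minimal sketch of why gen-PPL rewards predictability alone, in Python. Per the abstract, gen-PPL is built from the per-token negative log-likelihood of samples under a frozen AR scorer; this sketch assumes the common convention of exponentiating the mean NLL pooled over all tokens, and assumes the entropy guardrail is the Shannon entropy of each sample's token histogram, averaged over samples (real implementations may differ). The scorer is a hypothetical 4-word toy model (`toy_log_prob`, `CYCLE`) standing in for something like gpt2-large; every name here is illustrative, not the paper's code.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical scorer interface: log p(token | prefix) under a frozen AR model.
# A real evaluation would query e.g. gpt2-large; this sketch accepts any callable.
LogProbFn = Callable[[Sequence[str], str], float]

# Hypothetical toy vocabulary whose cyclic order is the scorer's "favourite" continuation.
CYCLE: List[str] = ["the", "cat", "sat", "down"]


def toy_log_prob(prefix: Sequence[str], token: str) -> float:
    """Toy frozen scorer: 0.97 mass on the next word of CYCLE, 0.01 on each other word."""
    if not prefix:
        return math.log(1.0 / len(CYCLE))  # uniform over the 4-word vocabulary at t = 0
    expected = CYCLE[(CYCLE.index(prefix[-1]) + 1) % len(CYCLE)]
    return math.log(0.97 if token == expected else 0.01)


def gen_ppl(samples: Sequence[Sequence[str]], log_prob: LogProbFn) -> float:
    """exp of the mean per-token negative log-likelihood, pooled over all sample tokens."""
    total_nll, n_tokens = 0.0, 0
    for tokens in samples:
        for t, tok in enumerate(tokens):
            total_nll -= log_prob(tokens[:t], tok)
            n_tokens += 1
    return math.exp(total_nll / n_tokens)


def mean_unigram_entropy(samples: Sequence[Sequence[str]]) -> float:
    """Assumed guardrail: Shannon entropy (nats) of each sample's token histogram, averaged."""
    per_sample = []
    for tokens in samples:
        n = len(tokens)
        counts = Counter(tokens).values()
        per_sample.append(-sum((c / n) * math.log(c / n) for c in counts))
    return sum(per_sample) / len(per_sample)


# A zero-parameter "sampler" that just repeats the scorer's favourite cycle.
looped = [CYCLE * 4]
# The same four words with the same entropy, in an order the scorer does not expect.
shuffled = [["cat", "the", "down", "sat", "the", "down", "cat", "sat"]]

for name, samples in [("looped", looped), ("shuffled", shuffled)]:
    ppl = gen_ppl(samples, toy_log_prob)
    ent = mean_unigram_entropy(samples)
    print(f"{name:8s} gen-PPL={ppl:.2f}  entropy={ent:.3f} nats")
```

Under these assumptions the looped sample reaches a gen-PPL of about 1.12 while its entropy is ln 4 ≈ 1.386 nats, the maximum for a 4-word vocabulary, so the guardrail does not flag it. The shuffled sample has identical entropy and scores far worse, purely because the scorer cannot predict it; neither sample is meaningful text.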
Reference score (in-house estimate of attention): 49.443264461057645
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion and continuous flow-based language models have emerged as the leading non-autoregressive alternatives to language modeling. Progress in both paradigms is overwhelmingly tracked by generative perplexity (gen-PPL): the per-token negative log-likelihood of samples under a frozen autoregressive (AR) scorer such as gpt2-large, typically paired with an empirical-entropy guardrail to rule out low-entropy collapse. We argue that this metric is unsound. By construction, gen-PPL measures only predictability under the scoring AR, not grammaticality or semantic coherence -- and the set of predictable but still low-quality sequences is combinatorially large. To make this concrete, we construct a suite of zero-parameter, deliberately naive samplers that achieve state-of-the-art gen-PPL on LM1B and OpenWebText at non-degenerate entropy, surpassing recently published diffusion and continuous-flow models while producing text that is incoherent by construction. We recommend evaluation suites that directly quantify the distributional divergence between generated and reference text, and use such a suite to re-benchmark recent non-autoregressive models, recovering a more faithful picture of the current state of the art.
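The abstract recommends metrics that directly quantify the distributional divergence between generated and reference text but does not name one here. As a minimal sketch, this swaps in a simple stand-in, the Jensen-Shannon divergence between pooled bigram distributions; it is not the authors' evaluation suite, and the miniature corpora are hypothetical. A predictable loop that would score a low gen-PPL under a matching scorer still lands far from the reference distribution.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def ngram_distribution(samples: Sequence[Sequence[str]], n: int = 2) -> Dict[NGram, float]:
    """Pooled empirical n-gram distribution over a corpus (assumes at least one n-gram)."""
    counts: Counter = Counter()
    for tokens in samples:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def js_divergence(p: Dict[NGram, float], q: Dict[NGram, float]) -> float:
    """Jensen-Shannon divergence in nats: 0 for identical distributions, at most ln 2."""
    support = set(p) | set(q)
    m = {g: 0.5 * (p.get(g, 0.0) + q.get(g, 0.0)) for g in support}

    def kl_to_m(a: Dict[NGram, float]) -> float:
        # Only n-grams present in `a` contribute; m[g] > 0 wherever a[g] > 0.
        return sum(pa * math.log(pa / m[g]) for g, pa in a.items())

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


# Hypothetical miniature corpora, for illustration only.
reference: List[List[str]] = [
    "the cat sat on the mat".split(),
    "a dog ran in the park".split(),
]
looped = ["the cat sat down".split() * 3]  # predictable loop, coherent nowhere
model_like = [
    "the dog sat on the mat".split(),
    "a cat ran in the park".split(),
]

ref_dist = ngram_distribution(reference)
for name, corpus in [("looped", looped), ("model_like", model_like)]:
    jsd = js_divergence(ngram_distribution(corpus), ref_dist)
    print(f"{name:10s} bigram JSD vs reference = {jsd:.3f} nats")
```

Here `model_like` shares 6 of its 10 bigrams with the reference and gets a JSD of 0.4 ln 2 ≈ 0.277 nats, while the loop scores higher. Bigram JSD is only a toy proxy; at realistic scale one would compare distributions of richer features, but the design choice is the same: score the generated set against reference text rather than each sample against a scorer.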
