Fugu-MT Paper Translation (Abstract): Decoding-Free Sampling Strategies for LLM Marginalization

Paper overview: Decoding-Free Sampling Strategies for LLM Marginalization

arxiv url: http://arxiv.org/abs/2510.20208v1
Date: Thu, 23 Oct 2025 04:50:14 GMT
Status: Translation complete
System update date: 2025-10-25 03:08:17.337504
Title: Decoding-Free Sampling Strategies for LLM Marginalization
Title (reference translation): Decoding-Free Sampling Methods for LLM Marginalization
Authors: David Pohl, Marco Cognetta, Junyoung Lee, Naoaki Okazaki
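Abstract summary: Modern language models operate on subword-tokenized text in order to make a trade-off between model size, inference speed, and vocabulary coverage. We investigate decoding-free sampling strategies, which rely entirely on extremely cheap sampling strategies that are model and tokenizer agnostic.
Reference score (independently computed attention score): 15.214953630908477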
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Modern language models operate on subword-tokenized text in order to make a trade-off between model size, inference speed, and vocabulary coverage. A side effect of this is that, during inference, models are evaluated by measuring the probability of only the specific tokenization produced as the output, despite there being many possible ways to represent the same text with a subword vocabulary. Recent studies have argued instead for evaluating LLMs by marginalization - the probability mass of all tokenizations of a given text. Marginalization is difficult due to the number of possible tokenizations of a text, so often approximate marginalization is done via sampling. However, a downside of sampling is that an expensive generation step must be performed by the LLM for each sample, which limits the number of samples that can be acquired given a runtime budget, and therefore also the accuracy of the approximation. Since computing the probability of a sequence given the tokenization is relatively cheap compared to actually generating it, we investigate sampling strategies that are decoding-free - they require no generation from the LLM, instead relying entirely on extremely cheap sampling strategies that are model and tokenizer agnostic. We investigate the approximation quality and speed of decoding-free sampling strategies for a number of open models to find that they provide sufficiently accurate marginal estimates at a small fraction of the runtime cost and demonstrate its use on a set of downstream inference tasks.
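Abstract (reference translation): Modern language models operate on subword-tokenized text in order to trade off model size, inference speed, and vocabulary coverage. A side effect is that, during inference, a model is evaluated by measuring the probability of only the specific tokenization produced as output, even though a subword vocabulary offers many ways to represent the same text. Recent studies have argued instead for evaluating LLMs by marginalization, the probability mass of all tokenizations of a given text. Because a text has so many possible tokenizations, marginalization is difficult, so it is often approximated by sampling. A downside of sampling, however, is that the LLM must perform an expensive generation step for each sample, which limits the number of samples obtainable within a runtime budget and therefore the accuracy of the approximation. Since computing the probability of a sequence given its tokenization is relatively cheap compared with actually generating it, we investigate decoding-free sampling strategies. We study the approximation quality and speed of these strategies on a number of open models, find that they yield sufficiently accurate marginal estimates at a small fraction of the runtime cost, and demonstrate their use on downstream inference tasks.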
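The idea in the abstract can be illustrated with a minimal sketch. The marginal probability of a text is the sum of the model's probabilities over every tokenization of that text, and it can be estimated by drawing tokenizations from a cheap proposal that never calls the model, then scoring each one. Everything below is an assumption made for illustration and is not taken from the paper: the toy vocabulary `TOKEN_PROB`, the unigram `score` function standing in for an LLM scoring pass, the uniform left-to-right proposal, and the use of importance weighting as the estimator.

```python
import math
import random

# Hypothetical toy vocabulary with unigram probabilities. Every single
# character of the example text is a token, so the proposal below can
# always finish a segmentation.
TOKEN_PROB = {"a": 0.1, "b": 0.1, "c": 0.1, "ab": 0.2, "bc": 0.2, "abc": 0.3}


def score(tokens):
    """Stand-in for one LLM scoring pass: probability of a token sequence."""
    return math.prod(TOKEN_PROB[t] for t in tokens)


def all_tokenizations(text):
    """Enumerate every segmentation of `text` into vocabulary tokens."""
    if not text:
        return [[]]
    results = []
    for tok in TOKEN_PROB:
        if text.startswith(tok):
            rest = all_tokenizations(text[len(tok):])
            results += [[tok] + r for r in rest]
    return results


def exact_marginal(text):
    """Sum the score over all tokenizations (feasible only for tiny inputs)."""
    return sum(score(t) for t in all_tokenizations(text))


def sample_tokenization(text, rng):
    """Draw a segmentation left to right without calling the model.

    Returns the tokens and the proposal probability q of that segmentation.
    """
    tokens, q = [], 1.0
    while text:
        choices = [tok for tok in TOKEN_PROB if text.startswith(tok)]
        tok = rng.choice(choices)
        q /= len(choices)
        tokens.append(tok)
        text = text[len(tok):]
    return tokens, q


def estimate_marginal(text, n_samples, seed=0):
    """Importance-sampling estimate: the average of score / q over samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        tokens, q = sample_tokenization(text, rng)
        total += score(tokens) / q
    return total / n_samples
```

Calling `exact_marginal("abcab")` and `estimate_marginal("abcab", 20000)` allows the two values to be compared on a case small enough to enumerate. With a real LLM, `score` would be replaced by a forward pass over the sampled token sequence, which is the step the abstract describes as cheap relative to generation.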


Related papers