Fugu-MT Paper Translation (Abstract): How Focused Are LLMs? A Quantitative Study via Repetitive Deterministic Prediction Tasks

Paper summary: How Focused Are LLMs? A Quantitative Study via Repetitive Deterministic Prediction Tasks

arxiv url: http://arxiv.org/abs/2511.00763v1
Date: Sun, 02 Nov 2025 01:42:08 GMT
Status: Translation complete
Last updated in system: 2025-11-05 16:37:26.924127
Title: How Focused Are LLMs? A Quantitative Study via Repetitive Deterministic Prediction Tasks
Authors: Wanda Hou, Leon Zhou, Hong-Ye Hu, Yi-Zhuang You, Xiao-Liang Qi
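Abstract summary: We investigate the performance of large language models on repetitive deterministic prediction tasks. Experiments reveal a sharp double exponential drop beyond a characteristic length scale. This indicates that the models fail to execute each operation independently.
Reference score (site-computed interest level): 0.9338697277815541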
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We investigate the performance of large language models on repetitive deterministic prediction tasks and study how the sequence accuracy rate scales with output length. Each such task involves repeating the same operation n times. Examples include letter replacement in strings following a given rule, integer addition, and multiplication of string operators in many-body quantum mechanics. If the model performs the task through a simple repetition algorithm, the success rate should decay exponentially with sequence length. In contrast, our experiments on leading large language models reveal a sharp double exponential drop beyond a characteristic length scale, forming an accuracy cliff that marks the transition from reliable to unstable generation. This indicates that the models fail to execute each operation independently. To explain this phenomenon, we propose a statistical-physics-inspired model that captures the competition between external conditioning from the prompt and internal interference among generated tokens. The model quantitatively reproduces the observed crossover and provides an interpretable link between attention-induced interference and sequence-level failure. Fitting the model to empirical results across multiple models and tasks yields effective parameters that characterize the intrinsic error rate and error accumulation factor for each model-task pair, offering a principled framework for understanding the limits of deterministic accuracy in large language models.
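The contrast the abstract draws between exponential decay and a double exponential drop can be illustrated with a small numerical sketch. This is not the model proposed in the paper. It is a toy under two labeled assumptions: in the baseline, each of the n operations fails independently with a fixed probability p, and in the alternative, the per-step failure probability grows geometrically with position. The function names and the parameters `p0` and `growth` are hypothetical and chosen only for illustration.

```python
def independent_accuracy(n: int, p: float) -> float:
    """Sequence accuracy if each of n operations fails independently
    with the same probability p. This decays exponentially: (1 - p)**n."""
    return (1.0 - p) ** n


def accumulating_accuracy(n: int, p0: float, growth: float) -> float:
    """Toy assumption (not the paper's model): step k (0-indexed) fails
    with probability min(1, p0 * growth**k), so errors compound with length.
    For growth > 1 the log-accuracy falls roughly like -p0 * growth**n,
    which gives a double exponential drop in n."""
    acc = 1.0
    for k in range(n):
        acc *= 1.0 - min(1.0, p0 * growth ** k)
    return acc


if __name__ == "__main__":
    p0, growth = 1e-4, 1.2  # illustrative values, not fitted to any data
    for n in (10, 20, 30, 40, 50):
        print(n, independent_accuracy(n, p0), accumulating_accuracy(n, p0, growth))
```

With these illustrative values, the independent baseline stays close to 1 over the whole range, while the accumulating variant stays near 1 for short sequences and then falls off sharply, which is the qualitative shape of the accuracy cliff described above.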

Related papers