Fugu-MT Paper Translation (Summary): LLMs are Bayesian, in Expectation, not in Realization

Paper summary: LLMs are Bayesian, in Expectation, not in Realization

arxiv url: http://arxiv.org/abs/2507.11768v1
Date: Tue, 15 Jul 2025 22:20:11 GMT
Status: Translation complete
Last updated in system: 2025-07-17 19:00:11.168381
Title: LLMs are Bayesian, in Expectation, not in Realization
Title (reference translation): LLMs are Bayesian in expectation, not in realization
Authors: Leon Chlon, Sarah Rashidi, Zein Khamis, MarcAntonio M. Awada,
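Abstract summary: Large language models adapt to new tasks without parameter updates. Recent empirical findings reveal a fundamental contradiction: transformers systematically violate the martingale property. This violation challenges the theoretical foundations underlying uncertainty quantification in critical applications.
Reference score (site-computed attention metric): 0.0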
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models demonstrate remarkable in-context learning capabilities, adapting to new tasks without parameter updates. While this phenomenon has been successfully modeled as implicit Bayesian inference, recent empirical findings reveal a fundamental contradiction: transformers systematically violate the martingale property, a cornerstone requirement of Bayesian updating on exchangeable data. This violation challenges the theoretical foundations underlying uncertainty quantification in critical applications. Our theoretical analysis establishes four key results: (1) positional encodings induce martingale violations of order $\Theta(\log n / n)$; (2) transformers achieve information-theoretic optimality with excess risk $O(n^{-1/2})$ in expectation over orderings; (3) the implicit posterior representation converges to the true Bayesian posterior in the space of sufficient statistics; and (4) we derive the optimal chain-of-thought length as $k^* = \Theta(\sqrt{n}\log(1/\varepsilon))$ with explicit constants, providing a principled approach to reduce inference costs while maintaining performance. Empirical validation on GPT-3 confirms predictions (1)-(3), with transformers reaching 99\% of theoretical entropy limits within 20 examples. Our framework provides practical methods for extracting calibrated uncertainty estimates from position-aware architectures and optimizing computational efficiency in deployment.
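Abstract (reference translation): Large language models show remarkable in-context learning ability, adapting to new tasks without parameter updates. Although this phenomenon has been modeled as implicit Bayesian inference, recent empirical findings reveal a fundamental contradiction: transformers systematically violate the martingale property, a cornerstone requirement of Bayesian updating on exchangeable data. This violation challenges the theoretical foundations underlying uncertainty quantification in critical applications. Our theoretical analysis establishes four key results: (1) positional encodings induce martingale violations of order $\Theta(\log n / n)$; (2) transformers achieve information-theoretic optimality with excess risk $O(n^{-1/2})$ in expectation over orderings; (3) the implicit posterior representation converges to the true Bayesian posterior in the space of sufficient statistics; and (4) the optimal chain-of-thought length is $k^* = \Theta(\sqrt{n}\log(1/\varepsilon))$ with explicit constants, giving a principled way to reduce inference costs while maintaining performance. Empirical validation on GPT-3 confirms predictions (1)-(3), with transformers reaching 99% of theoretical entropy limits within 20 examples. The framework provides practical methods for extracting calibrated uncertainty estimates from position-aware architectures and for optimizing computational efficiency in deployment.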
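The martingale property mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's method or code: it uses a Beta-Bernoulli model as a stand-in for exact Bayesian updating, and `recency_weighted` is a hypothetical position-aware predictor invented here purely for illustration. It checks that the exact posterior predictive satisfies $E[p_{n+1} \mid x_{1:n}] = p_n$, that the position-aware predictor does not, and that averaging the latter over orderings removes its dependence on example order, which is one simple reading of "in expectation over orderings".

```python
import itertools
import math
import random


def bayes_predictive(xs, a=1.0, b=1.0):
    """Posterior predictive P(x_next = 1 | xs) under a Beta(a, b) prior."""
    return (a + sum(xs)) / (a + b + len(xs))


def martingale_gap(predict, xs):
    """E[p_{n+1} | xs] - p_n, with the next point drawn from p_n itself."""
    p_n = predict(xs)
    expected_next = p_n * predict(xs + [1]) + (1 - p_n) * predict(xs + [0])
    return expected_next - p_n


def recency_weighted(xs, decay=0.8):
    """Hypothetical position-aware predictor: later examples weigh more."""
    n = len(xs)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    # One pseudo-count per outcome, mirroring a Beta(1, 1) prior.
    weighted_ones = sum(w * x for w, x in zip(weights, xs))
    return (1.0 + weighted_ones) / (2.0 + sum(weights))


def permutation_average(predict, xs, num_samples=None, seed=0):
    """Average a predictor over orderings (all of them if num_samples is None)."""
    if num_samples is None:
        perms = list(itertools.permutations(xs))
    else:
        rng = random.Random(seed)
        perms = [rng.sample(xs, len(xs)) for _ in range(num_samples)]
    return sum(predict(list(p)) for p in perms) / len(perms)


data = [1, 1, 0, 1, 0]
reordered = [0, 0, 1, 1, 1]  # same examples, different order

# Exact Bayesian updating: gap is zero up to floating-point rounding.
print("Bayes martingale gap:      ", martingale_gap(bayes_predictive, data))
# Position-aware predictor: gap is generally nonzero.
print("Position-aware gap:        ", martingale_gap(recency_weighted, data))
# A single realization depends on the ordering ...
print("Single orderings:          ", recency_weighted(data), recency_weighted(reordered))
# ... while the average over orderings does not.
print("Averaged over permutations:",
      permutation_average(recency_weighted, data),
      permutation_average(recency_weighted, reordered))
```

Enumerating all orderings is only feasible for a handful of examples; for longer contexts, passing `num_samples` switches to random orderings, and the result is then a Monte Carlo estimate rather than an exact average.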

Related papers

Pointwise confidence estimation in the non-linear $\ell^2$-regularized least squares [12.352761060862072]
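We study high-probability, non-asymptotic confidence estimation in the $\ell^2$-regularized non-linear least-squares setting with fixed design. The estimates are pointwise, meaning that they hold for the prediction at any fixed test input $x$.
Paper reference translation (metadata) (2025-06-08T11:23:49Z)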
Born a Transformer -- Always a Transformer? [57.37263095476691]
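We study a family of $\textit{retrieval}$ and $\textit{copying}$ tasks inspired by Liu et al. We observe an $\textit{induction-versus-anti-induction}$ asymmetry, in which pretrained models are better at retrieving tokens to the right of a query token (induction) than to the left (anti-induction). Mechanistic analysis reveals that this asymmetry is connected to differences in the strength of induction and anti-induction circuits within pretrained transformers.
Paper reference translation (metadata) (2025-05-27T21:36:50Z)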
Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking [50.465604300990904]
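Grokking refers to the abrupt improvement in test accuracy after extended overfitting. In this work, we investigate the grokking mechanism underlying the Transformer on the task of prime number operations.
Paper reference translation (metadata) (2025-04-04T04:42:38Z)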
Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier: Autoregressive and Imitation Learning under Misspecification [50.717692060500696]
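Next-token prediction with the logarithmic loss is a cornerstone of autoregressive sequence modeling. Next-token prediction can be made robust so as to achieve $C=\tilde{O}(H)$, representing moderate error amplification. The summary also refers to a factor $C=e^{(\log H)^{1-\Omega(1)}}$.
Paper reference translation (metadata) (2025-02-18T02:52:00Z)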
Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification [7.869708570399577]
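We consider a bi-objective prediction task of predicting both the conditional expectation $\mathbb{E}[Y|X]$ and the conditional variance $\mathrm{Var}(Y|X)$. Theoretically, we show that the trained transformer reaches near Bayes-optimum, suggesting that it uses information about the training distribution.
Paper reference translation (metadata) (2024-05-24T00:08:55Z)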
PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates [17.777466668123886]
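PROMISE ($\textbf{Pr}$econditioned Stochastic $\textbf{O}$ptimization $\textbf{M}$ethods by $\textbf{I}$ncorporating $\textbf{S}$calable Curvature $\textbf{E}$stimates) is a suite of sketching-based preconditioned stochastic gradient algorithms. PROMISE includes preconditioned versions of SVRG, SAGA, and Katyusha.
Paper reference translation (metadata) (2023-09-05T07:49:10Z)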
Transformers as Support Vector Machines [54.642793677472724]
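We establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem. We characterize the implicit bias of one-layer transformers optimized with gradient descent. We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
Paper reference translation (metadata) (2023-08-31T17:57:50Z)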
Improved Convergence of Score-Based Diffusion Models via Prediction-Correction [15.772322871598085]
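Score-based generative models (SGMs) are powerful tools for sampling from complex data distributions. This paper addresses a problem with such models by considering a version of the popular predictor-corrector scheme. We first estimate the final distribution via inexact Langevin dynamics and then reverse the process.
Paper reference translation (metadata) (2023-05-23T15:29:09Z)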
Confident Adaptive Language Modeling [95.45272377648773]
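CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep. We demonstrate the efficacy of our framework in reducing compute, with a potential speedup of up to $\times 3$, while provably maintaining high performance.
Paper reference translation (metadata) (2022-07-14T17:00:19Z)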
Stability and Risk Bounds of Iterative Hard Thresholding [41.082982732100696]
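We introduce a novel sparse generalization theory for IHT under the notion of algorithmic stability. We show that IHT with sparsity level $k$ enjoys a convergence rate of $\tilde{\mathcal{O}}(n^{-1/2}\sqrt{\log(n)\log(p)})$ in sparse excess risk. Preliminary numerical evidence is provided to confirm the theoretical predictions.
Paper reference translation (metadata) (2022-03-17T16:12:56Z)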
Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
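Implicit neural networks show improved accuracy and a significant reduction in memory consumption. However, they can suffer from ill-posedness and convergence instability. This paper provides a new framework for designing well-posed and robust implicit neural networks.
Paper reference translation (metadata) (2021-06-06T18:05:02Z)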

The related-papers list is generated automatically from the titles and abstracts of papers on this site.

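The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.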