Fugu-MT Paper Translation (Abstract): Next-Token Prediction and Regret Minimization

Paper overview: Next-Token Prediction and Regret Minimization

arxiv url: http://arxiv.org/abs/2603.28499v1
Date: Mon, 30 Mar 2026 14:34:41 GMT
Status: Translation complete
Last updated in system: 2026-03-31 23:18:45.44406
Title: Next-Token Prediction and Regret Minimization
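Title (reference translation): Next-Token Prediction and Regret Minimization
Authors: Mehryar Mohri, Clayton Sanford, Jon Schneider, Kiran Vodrahalli, Yifan Wu
Abstract summary: We consider the question of how to employ next-token prediction algorithms in adversarial online decision-making environments. For unbounded context windows, we show that although not every distribution $\mathcal{D}$ is a low-regret distribution, every distribution $\mathcal{D}$ is exponentially close to one low-regret distribution.
Reference score (independently computed attention measure): 39.73178505655866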
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We consider the question of how to employ next-token prediction algorithms in adversarial online decision-making environments. Specifically, if we train a next-token prediction model on a distribution $\mathcal{D}$ over sequences of opponent actions, when is it the case that the induced online decision-making algorithm (by approximately best responding to the model's predictions) has low adversarial regret (i.e., when is $\mathcal{D}$ a \emph{low-regret distribution})? For unbounded context windows (where the prediction made by the model can depend on all the actions taken by the adversary thus far), we show that although not every distribution $\mathcal{D}$ is a low-regret distribution, every distribution $\mathcal{D}$ is exponentially close (in TV distance) to one low-regret distribution, and hence sublinear regret can always be achieved at negligible cost to the accuracy of the original next-token prediction model. In contrast to this, for bounded context windows (where the prediction made by the model can depend only on the past $w$ actions taken by the adversary, as may be the case in modern transformer architectures), we show that there are some distributions $\mathcal{D}$ of opponent play that are $\Theta(1)$-far from any low-regret distribution $\mathcal{D'}$ (even when $w = \Omega(T)$ and such distributions exist). Finally, we complement these results by showing that the unbounded context robustification procedure can be implemented by layers of a standard transformer architecture, and provide empirical evidence that transformer models can be efficiently trained to represent these new low-regret distributions.
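Abstract (reference translation): We consider the question of how to employ next-token prediction algorithms in adversarial online decision-making environments. Specifically, if a next-token prediction model is trained on a distribution $\mathcal{D}$ over sequences of opponent actions, when does the induced online decision-making algorithm (which approximately best responds to the model's predictions) have low adversarial regret (that is, when is $\mathcal{D}$ a \emph{low-regret distribution})? For unbounded context windows, we show that although not every distribution $\mathcal{D}$ is a low-regret distribution, every distribution $\mathcal{D}$ is exponentially close (in TV distance) to one low-regret distribution, and hence sublinear regret can always be achieved at negligible cost to the accuracy of the original next-token prediction model. In contrast, for bounded context windows (where the model's prediction can depend only on the past $w$ actions, as may be the case in modern transformer architectures), we show that there are some distributions $\mathcal{D}$ of opponent play that are $\Theta(1)$-far from any low-regret distribution $\mathcal{D'}$ (even when $w = \Omega(T)$ and such distributions exist). Finally, we complement these results by showing that the unbounded context robustification procedure can be implemented by layers of a standard transformer architecture, and provide empirical evidence that transformer models can be efficiently trained to represent these new low-regret distributions.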
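The notion of an induced decision-making algorithm and its adversarial regret can be made concrete with a toy example. The following is a minimal sketch under stated assumptions, not the paper's construction: the game (a two-action matching game), the predictor (`empirical_predictor`, a simple frequency count standing in for a trained next-token model), and all function names are hypothetical, and the best response here is exact rather than approximate. Regret is taken in the standard sense: the utility of the best fixed action in hindsight minus the utility actually obtained.

```python
from collections import Counter

def utility(action, opponent_action):
    # Hypothetical matching game: the learner gains 1 when it matches the opponent.
    return 1.0 if action == opponent_action else 0.0

def empirical_predictor(history, actions=(0, 1)):
    # Stand-in for a next-token model: predicts the opponent's next action
    # from the empirical frequencies of all past actions (unbounded context).
    if not history:
        return {a: 1.0 / len(actions) for a in actions}
    counts = Counter(history)
    return {a: counts[a] / len(history) for a in actions}

def best_response(prediction, actions=(0, 1)):
    # Pick the action with the highest expected utility under the prediction.
    # Ties are broken in favor of the first action listed.
    return max(
        actions,
        key=lambda a: sum(p * utility(a, b) for b, p in prediction.items()),
    )

def adversarial_regret(opponent_sequence, predictor=empirical_predictor, actions=(0, 1)):
    # Play the induced algorithm against a fixed opponent sequence and compare
    # its total utility with that of the best fixed action in hindsight.
    realized = 0.0
    history = []
    for b in opponent_sequence:
        a = best_response(predictor(history, actions), actions)
        realized += utility(a, b)
        history.append(b)
    best_fixed = max(sum(utility(a, b) for b in opponent_sequence) for a in actions)
    return best_fixed - realized

if __name__ == "__main__":
    print(adversarial_regret([0] * 5))      # a constant opponent
    print(adversarial_regret([1, 0] * 3))   # an alternating opponent
```

In this toy setting, an opponent that alternates between the two actions makes the greedy best response wrong in every round, so regret grows linearly in the sequence length. This illustrates, in a simplified way, why best responding to an accurate-looking predictor does not by itself guarantee low adversarial regret.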


Related papers