Fugu-MT Paper Translation (Summary): Provable test-time adaptivity and distributional robustness of in-context learning

Paper summary: Provable test-time adaptivity and distributional robustness of in-context learning

arxiv url: http://arxiv.org/abs/2510.23254v1
Date: Mon, 27 Oct 2025 12:16:49 GMT
Status: Translation complete
Last updated in system: 2025-10-28 15:28:15.54214
Title: Provable test-time adaptivity and distributional robustness of in-context learning
Title (reference translation): Provable test-time adaptivity and distributional robustness in in-context learning
Authors: Tianyi Ma, Tengyao Wang, Richard J. Samworth
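Abstract summary: For Transformers pretrained on tasks drawn from a mixture distribution $\pi=\sum_{\alpha\in\mathcal{A}} \lambda_{\alpha} \pi_{\alpha}$, we prove that a large Transformer pretrained on sufficient data achieves the optimal rate of convergence corresponding to the difficulty level $\beta$.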
Reference score (in-house attention score): 7.8103599113080255
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study in-context learning problems where a Transformer is pretrained on tasks drawn from a mixture distribution $\pi=\sum_{\alpha\in\mathcal{A}} \lambda_{\alpha} \pi_{\alpha}$, called the pretraining prior, in which each mixture component $\pi_{\alpha}$ is a distribution on tasks of a specific difficulty level indexed by $\alpha$. Our goal is to understand the performance of the pretrained Transformer when evaluated on a different test distribution $\mu$, consisting of tasks of fixed difficulty $\beta\in\mathcal{A}$, and with potential distribution shift relative to $\pi_\beta$, subject to the chi-squared divergence $\chi^2(\mu,\pi_{\beta})$ being at most $\kappa$. In particular, we consider nonparametric regression problems with random smoothness, and multi-index models with random smoothness as well as random effective dimension. We prove that a large Transformer pretrained on sufficient data achieves the optimal rate of convergence corresponding to the difficulty level $\beta$, uniformly over test distributions $\mu$ in the chi-squared divergence ball. Thus, the pretrained Transformer is able to achieve faster rates of convergence on easier tasks and is robust to distribution shift at test time. Finally, we prove that even if an estimator had access to the test distribution $\mu$, the convergence rate of its expected risk over $\mu$ could not be faster than that of our pretrained Transformers, thereby providing a more appropriate optimality guarantee than minimax lower bounds.
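To make the setting concrete, the sketch below shows two ingredients of the abstract in a finite, discrete toy case: drawing a difficulty level $\alpha$ with probability $\lambda_\alpha$ (the first stage of sampling a task from the mixture prior $\pi$), and checking whether a test distribution $\mu$ lies in the ball $\chi^2(\mu,\pi_\beta)\le\kappa$. It assumes the common convention $\chi^2(\mu,\pi)=\int (d\mu/d\pi-1)^2\,d\pi$, which for probability vectors is $\sum_x (\mu(x)-\pi(x))^2/\pi(x)$; the function names and the discrete support are illustrative assumptions, not the paper's construction, where tasks are functions rather than points of a finite set.

```python
import numpy as np


def chi_squared_divergence(mu, pi) -> float:
    """chi^2(mu, pi) = sum_x (mu(x) - pi(x))^2 / pi(x) for discrete distributions.

    Assumes mu and pi are probability vectors on the same finite support.
    Returns inf if mu puts mass where pi has none (mu not absolutely continuous).
    """
    mu = np.asarray(mu, dtype=float)
    pi = np.asarray(pi, dtype=float)
    if np.any((pi == 0) & (mu > 0)):
        return float("inf")
    mask = pi > 0
    return float(np.sum((mu[mask] - pi[mask]) ** 2 / pi[mask]))


def in_chi_squared_ball(mu, pi_beta, kappa: float) -> bool:
    """True if the test distribution mu satisfies chi^2(mu, pi_beta) <= kappa."""
    return chi_squared_divergence(mu, pi_beta) <= kappa


def sample_difficulty(lambdas: dict, rng: np.random.Generator):
    """Draw a difficulty level alpha with probability lambda_alpha (hypothetical helper)."""
    levels = list(lambdas)
    probs = np.array([lambdas[a] for a in levels], dtype=float)
    return levels[rng.choice(len(levels), p=probs)]


# Toy example: uniform pi_beta on 4 points, mildly shifted test distribution mu.
pi_beta = [0.25, 0.25, 0.25, 0.25]
mu = [0.4, 0.2, 0.2, 0.2]
print(chi_squared_divergence(mu, pi_beta))  # approximately 0.12
print(in_chi_squared_ball(mu, pi_beta, kappa=0.5))

rng = np.random.default_rng(0)
print(sample_difficulty({"easy": 0.3, "hard": 0.7}, rng))
```

In this toy case the divergence works out by hand to $0.15^2/0.25 + 3\cdot 0.05^2/0.25 = 0.12$, so $\mu$ lies in the ball for any $\kappa \ge 0.12$.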
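Abstract (reference translation): Consider a mixture distribution $\pi=\sum_{\alpha\in\mathcal{A}} \lambda_{\alpha} \pi_{\alpha}$, in which each mixture component $\pi_{\alpha}$ is a distribution over tasks of a specific difficulty level indexed by $\alpha$. Our goal is to understand the performance of the pretrained Transformer when it is evaluated on a different test distribution $\mu$, consisting of tasks of fixed difficulty $\beta\in\mathcal{A}$, with a possible distribution shift relative to $\pi_\beta$ such that the chi-squared divergence $\chi^2(\mu,\pi_{\beta})$ is at most $\kappa$. In particular, we consider nonparametric regression problems with random smoothness, and multi-index models with random smoothness and random effective dimension. We prove that a large Transformer pretrained on sufficient data achieves the optimal rate of convergence corresponding to the difficulty level $\beta$, uniformly over test distributions $\mu$ in the chi-squared divergence ball. Thus, the pretrained Transformer achieves faster rates of convergence on easier tasks and is robust to distribution shift at test time. Finally, we prove that even if an estimator had access to the test distribution $\mu$, the convergence rate of its expected risk over $\mu$ could not be faster than that of the pretrained Transformers, which provides a more appropriate optimality guarantee than minimax lower bounds.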
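The abstract's claim that easier tasks admit faster rates can be illustrated with the textbook minimax rate $n^{-2\beta/(2\beta+d)}$ for squared-error regression of a $\beta$-smooth function in $d$ dimensions. This is a hedged illustration only: the exact rates in the paper are not stated here, and using this formula, with the effective dimension of a multi-index model plugged in as $d$, is an assumption. The helper name `classical_rate` is hypothetical.

```python
def classical_rate(n: int, beta: float, d: int) -> float:
    """Textbook rate n^{-2 beta / (2 beta + d)} for beta-smooth regression in d dims.

    Illustrative assumption: larger beta (smoother) or smaller d (lower effective
    dimension) means an easier task and hence a faster decay in n.
    """
    if n < 1 or beta <= 0 or d < 1:
        raise ValueError("need n >= 1, beta > 0, d >= 1")
    return n ** (-2.0 * beta / (2.0 * beta + d))


# Compare several difficulty levels at a fixed number of in-context examples n.
n = 10_000
for beta, d in [(0.5, 4), (1.0, 4), (2.0, 4), (2.0, 1)]:
    print(f"beta={beta}, d={d}: rate ~ {classical_rate(n, beta, d):.2e}")
```

Under this assumption, a method that adapts to the difficulty level would track the smallest applicable value in each row, rather than the rate of the hardest level in $\mathcal{A}$.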


List of related papers