Fugu-MT Paper Translation (Abstract): Continuum Transformers Perform In-Context Learning by Operator Gradient Descent

Paper abstract: Continuum Transformers Perform In-Context Learning by Operator Gradient Descent

arxiv url: http://arxiv.org/abs/2505.17838v1
Date: Fri, 23 May 2025 12:52:54 GMT
Status: Translation complete
System update date: 2025-05-26 18:08:34.079867
Title: Continuum Transformers Perform In-Context Learning by Operator Gradient Descent
Title (reference translation): Continuum Transformers That Achieve In-Context Learning by Operator Gradient Descent
Authors: Abhiti Mishra, Yash Patel, Ambuj Tewari
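Abstract summary: We show that continuum transformers can perform in-context operator learning by performing gradient descent in an operator RKHS. We provide empirical validation of this optimality result and demonstrate that the parameters under which this gradient descent is performed are recovered through continuum transformer training.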
Reference score (independently computed attention metric): 18.928543069018865
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformers robustly exhibit the ability to perform in-context learning, whereby their predictive accuracy on a task can increase not by parameter updates but merely with the placement of training samples in their context windows. Recent works have shown that transformers achieve this by implementing gradient descent in their forward passes. Such results, however, are restricted to standard transformer architectures, which handle finite-dimensional inputs. In the space of PDE surrogate modeling, a generalization of transformers to handle infinite-dimensional function inputs, known as "continuum transformers," has been proposed and similarly observed to exhibit in-context learning. Despite impressive empirical performance, such in-context learning has yet to be theoretically characterized. We herein demonstrate that continuum transformers perform in-context operator learning by performing gradient descent in an operator RKHS. We demonstrate this using novel proof strategies that leverage a generalized representer theorem for Hilbert spaces and gradient flows over the space of functionals of a Hilbert space. We additionally show the operator learned in context is the Bayes Optimal Predictor in the infinite depth limit of the transformer. We then provide empirical validations of this optimality result and demonstrate that the parameters under which such gradient descent is performed are recovered through the continuum transformer training.
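Abstract (reference translation): Transformers exhibit the ability to perform in-context learning, in which predictive accuracy on a task increases not through parameter updates but simply by placing training samples in the context window. Recent work has shown that transformers achieve this by implementing gradient descent in their forward pass. However, such results are limited to standard transformer architectures that handle finite-dimensional inputs. In the field of PDE surrogate modeling, a generalization of transformers that handles infinite-dimensional function inputs, known as "continuum transformers," has been proposed and has likewise been observed to exhibit in-context learning. Despite impressive empirical performance, this in-context learning has not yet been characterized theoretically. In this paper, we show that continuum transformers perform in-context operator learning by performing gradient descent in an operator RKHS. We demonstrate this using novel proof strategies that use a generalized representer theorem for Hilbert spaces and gradient flows over the space of functionals of a Hilbert space. We further show that the operator learned in context is the Bayes optimal predictor in the infinite-depth limit of the transformer. We then provide empirical validation of this optimality result and show that the parameters under which such gradient descent is performed are recovered through continuum transformer training.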
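
Illustrative sketch (not from the paper): the abstract's central idea, gradient descent in an RKHS driven only by context samples, can be illustrated in a much simpler scalar-valued setting. The Python snippet below is a finite-dimensional analog under stated assumptions, not the authors' operator-valued construction and not a continuum transformer. The RBF kernel, its lengthscale, the step size, the step count, and the names rbf_kernel and rkhs_gradient_descent are all hypothetical choices made for illustration. Each loop iteration plays the role that one layer plays in the abstract's description, and with enough iterations the iterate approaches the kernel interpolant of the context pairs, which is the posterior mean of a noise-free Gaussian process with the same kernel.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5):
    """Gaussian (RBF) kernel matrix between 1-D point sets a and b.
    The kernel and lengthscale are illustrative assumptions."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * lengthscale**2))

def rkhs_gradient_descent(x_ctx, y_ctx, steps=200, lr=0.5):
    """Functional gradient descent on squared loss over context pairs.

    The iterate is f_t = sum_i alpha_i k(x_i, .). Each step applies
        f_{t+1} = f_t + lr * sum_i (y_i - f_t(x_i)) k(x_i, .),
    which in coefficient form is alpha <- alpha + lr * (y - K alpha).
    """
    K = rbf_kernel(x_ctx, x_ctx)
    alpha = np.zeros_like(y_ctx)
    for _ in range(steps):
        residual = y_ctx - K @ alpha   # y_i - f_t(x_i) at every context point
        alpha = alpha + lr * residual  # one functional gradient step
    return alpha

# Context samples (hypothetical toy task: y = sin(x)).
x_ctx = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_ctx = np.sin(x_ctx)

alpha_gd = rkhs_gradient_descent(x_ctx, y_ctx)
# Closed-form kernel interpolant coefficients, for comparison.
alpha_exact = np.linalg.solve(rbf_kernel(x_ctx, x_ctx), y_ctx)

# Prediction at a query point uses only the context samples; no model
# parameters are updated, mirroring the in-context learning setting.
x_query = np.array([1.5])
pred_gd = rbf_kernel(x_query, x_ctx) @ alpha_gd
pred_exact = rbf_kernel(x_query, x_ctx) @ alpha_exact
```

The step size is chosen below 2 divided by the largest eigenvalue of the kernel matrix, which is the usual condition for this iteration to converge; the well-separated context points keep the matrix well conditioned so that 200 steps suffice here.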

Related papers