Fugu-MT Paper Translation (Abstract): Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent

Paper abstract: Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent

arxiv url: http://arxiv.org/abs/2605.06609v1
Date: Thu, 07 May 2026 17:27:55 GMT
Status: Translation complete
Last updated in system: 2026-05-08 22:27:12.041465
Title: Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent
Title (reference translation): Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent
Authors: Chenyang Zhang, Yuan Cao
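Abstract summary: Transformers with softmax attention perform in-context learning on linear classification data. The authors first construct a class of multi-layer transformers that can perform in-context logistic regression. They then show that the constructed transformer can be obtained by (i) training a single self-attention layer supervised by one-step gradient descent, and (ii) recurrently applying the trained layer to obtain a looped model.
Reference score (independently computed attention score): 9.440916748352722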
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Transformers have demonstrated remarkable in-context learning (ICL) capabilities. The strong ICL performance of transformers is commonly believed to arise from their ability to implicitly execute certain algorithms on the context, thereby enhancing prediction and generation. In this work, we investigate how transformers with softmax attention perform in-context learning on linear classification data. We first construct a class of multi-layer transformers that can perform in-context logistic regression, with each layer exactly performing one step of normalized gradient descent on an in-context loss. Then, we show that our constructed transformer can be obtained through (i) training a single self-attention layer supervised by one-step gradient descent, and (ii) recurrently applying the trained layer to obtain a looped model. Training convergence guarantees of the self-attention layer and out-of-distribution generalization guarantees of the looped model are provided. Our results advance the theoretical understanding of ICL mechanism by showcasing how softmax transformers can effectively act as in-context learners.
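Abstract (reference translation): Transformers have demonstrated remarkable in-context learning (ICL) capabilities. Their strong ICL performance is commonly believed to arise from an ability to implicitly execute certain algorithms on the context, thereby improving prediction and generation. In this work, we investigate how transformers with softmax attention perform in-context learning on linear classification data. We first construct a class of multi-layer transformers that can perform in-context logistic regression, with each layer performing exactly one step of normalized gradient descent. We then show that the constructed transformer can be obtained by (i) training a single self-attention layer supervised by one-step gradient descent, and (ii) recurrently applying the trained layer to obtain a looped model. Training convergence guarantees for the self-attention layer and out-of-distribution generalization guarantees for the looped model are provided. These results advance the theoretical understanding of the ICL mechanism by showing that softmax transformers can effectively act as in-context learners.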
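For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of normalized gradient descent on a logistic loss over a set of in-context examples. It is not the paper's transformer construction. The Euclidean normalization, the step size `eta`, the zero initialization, and the helper names (`make_data`, `normalized_gd`, `predict`) are all assumptions made for illustration; the abstract does not specify them. Each loop iteration loosely plays the role of one layer, or one application of the looped layer.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_loss_grad(w, xs, ys):
    """Gradient of the mean logistic loss (1/n) * sum_i log(1 + exp(-y_i <w, x_i>))."""
    n, d = len(xs), len(w)
    grad = [0.0] * d
    for x, y in zip(xs, ys):
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # d/dw log(1 + exp(-margin)) = -y * x * sigmoid(-margin)
        coeff = -y * sigmoid(-margin)
        for j in range(d):
            grad[j] += coeff * x[j] / n
    return grad


def normalized_gd(xs, ys, steps=20, eta=0.5):
    """Run `steps` updates w <- w - eta * g / ||g|| (assumed Euclidean norm)."""
    w = [0.0] * len(xs[0])
    for _ in range(steps):
        g = logistic_loss_grad(w, xs, ys)
        norm = math.sqrt(sum(gj * gj for gj in g))
        if norm == 0.0:
            break  # zero gradient: nothing to normalize
        w = [wj - eta * gj / norm for wj, gj in zip(w, g)]
    return w


def predict(w, x):
    """Predict a label in {-1, +1} for a query point."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def make_data(n=40, seed=0):
    """Hypothetical toy context: 2-D Gaussian inputs labeled by sign(x0 - x1)."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]
    ys = [1 if x[0] - x[1] >= 0 else -1 for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_data()
    w = normalized_gd(xs, ys)
    acc = sum(1 for x, y in zip(xs, ys) if predict(w, x) == y) / len(xs)
    print("weights:", w, "context accuracy:", acc)
```

Because every update has length exactly `eta`, the iterate's norm after `T` steps is at most `eta * T`, regardless of how small the raw gradient becomes on well-separated data.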

Related papers