Fugu-MT Paper Translation (Summary): Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent

Paper summary: Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent

arxiv url: http://arxiv.org/abs/2410.11268v1
Date: Tue, 15 Oct 2024 04:44:23 GMT
Status: Translation complete
System update date: 2024-11-28 17:07:35.62112
Title: Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Title (reference translation): Bypassing the Exponential Dependency: Looped Transformers for In-context Learning via Multi-step Gradient Descent
Authors: Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
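Abstract summary: We show that linear looped Transformers can efficiently implement multi-step gradient descent for in-context learning. Our results show that when the input data has a constant condition number, e.g., $n = O(d)$, the error of the linear looped Transformer is small.
Reference score (independently computed attention score): 26.764893400499354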
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In-context learning has been recognized as a key factor in the success of Large Language Models (LLMs). It refers to the model's ability to learn patterns on the fly from provided in-context examples in the prompt during inference. Previous studies have demonstrated that the Transformer architecture used in LLMs can implement a single-step gradient descent update by processing in-context examples in a single forward pass. Recent work has further shown that, during in-context learning, a looped Transformer can implement multi-step gradient descent updates in forward passes. However, their theoretical results require an exponential number of in-context examples, $n = \exp(\Omega(T))$, where $T$ is the number of loops or passes, to achieve a reasonably low error. In this paper, we study linear looped Transformers in-context learning on linear vector generation tasks. We show that linear looped Transformers can implement multi-step gradient descent efficiently for in-context learning. Our results demonstrate that as long as the input data has a constant condition number, e.g., $n = O(d)$, the linear looped Transformers can achieve a small error by multi-step gradient descent during in-context learning. Furthermore, our preliminary experiments validate our theoretical analysis. Our findings reveal that the Transformer architecture possesses a stronger in-context learning capability than previously understood, offering new insights into the mechanisms behind LLMs and potentially guiding the better design of efficient inference algorithms for LLMs.
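Abstract (reference translation): In-context learning is recognized as a key factor in the success of Large Language Models (LLMs). It refers to a model's ability to learn patterns on the fly from in-context examples provided in the prompt during inference. Previous studies have shown that the Transformer architecture used in LLMs can implement a single-step gradient descent update by processing in-context examples in a single forward pass. Recent work has shown that a looped Transformer can implement multi-step gradient descent updates in forward passes during in-context learning. However, those theoretical results require an exponential number of in-context examples ($n = \exp(\Omega(T))$). In this paper, we study in-context learning by linear looped Transformers on linear vector generation tasks. We show that linear looped Transformers can efficiently implement multi-step gradient descent for in-context learning. Our results show that as long as the input data has a constant condition number, e.g., $n = O(d)$, linear looped Transformers can achieve a small error through multi-step gradient descent during in-context learning. Furthermore, our preliminary experiments validate our theoretical analysis. These results suggest that the Transformer architecture has a stronger in-context learning capability than previously understood, and they may offer new insights into the mechanisms behind LLMs and guide the design of efficient inference algorithms for LLMs.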
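
As a rough illustration of the multi-step gradient descent that the abstract says a looped Transformer can implement, the NumPy sketch below runs $T$ plain gradient steps on an in-context least-squares problem with $n = O(d)$ examples. It is not the paper's Transformer construction. The scalar-output regression task, the Gaussian inputs, the constant 4 in `n = 4 * d`, the step size $1/\lambda_{\max}$, and every function and variable name are assumptions chosen only for illustration.

```python
import numpy as np


def multi_step_gd(X, y, steps, lr):
    """Run `steps` gradient-descent updates on the least-squares loss
    L(w) = ||X w - y||^2 / (2n), starting from w = 0."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n  # gradient of L at the current w
        w = w - lr * grad
    return w


rng = np.random.default_rng(0)
d = 8
n = 4 * d  # n = O(d) in-context examples (the factor 4 is an assumption)
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star  # noiseless linear task (an assumption)

# Eigenvalues of the empirical covariance, in ascending order.
eigs = np.linalg.eigvalsh(X.T @ X / n)
lr = 1.0 / eigs[-1]          # step size 1 / lambda_max
kappa = eigs[-1] / eigs[0]   # condition number of the input data

loop_counts = (1, 5, 25, 125)
errors = [np.linalg.norm(multi_step_gd(X, y, T, lr) - w_star) for T in loop_counts]

for T, err in zip(loop_counts, errors):
    bound = (1.0 - 1.0 / kappa) ** T * np.linalg.norm(w_star)
    print(f"T={T:4d}  error={err:.3e}  bound={bound:.3e}")
```

With this step size, each step shrinks the error by a factor of at most $1 - 1/\kappa$, so a constant condition number $\kappa$ gives geometric decay in the number of steps $T$ without requiring more in-context examples as $T$ grows.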

Related papers