Fugu-MT Paper Translation (Abstract): Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

Paper summary: Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

arxiv url: http://arxiv.org/abs/2402.12875v4
Date: Sat, 21 Sep 2024 06:48:45 GMT
Status: Translation complete
Last updated in system: 2024-11-09 04:32:42.401480
Title: Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
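Title (reference translation): Chain of thought empowers transformers to solve inherently serial problems
Authors: Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma
Abstract summary: Chain of thought (CoT) is a highly effective method for improving the accuracy of large language models (LLMs) on arithmetic and symbolic reasoning tasks. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness.
Reference score (attention score computed by this site): 57.58801785642868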
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Instructing the model to generate a sequence of intermediate steps, a.k.a., a chain of thought (CoT), is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetics and symbolic reasoning tasks. However, the mechanism behind CoT remains unclear. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness. Conceptually, CoT empowers the model with the ability to perform inherently serial computation, which is otherwise lacking in transformers, especially when depth is low. Given input length $n$, previous works have shown that constant-depth transformers with finite precision $\mathsf{poly}(n)$ embedding size can only solve problems in $\mathsf{TC}^0$ without CoT. We first show an even tighter expressiveness upper bound for constant-depth transformers with constant-bit precision, which can only solve problems in $\mathsf{AC}^0$, a proper subset of $ \mathsf{TC}^0$. However, with $T$ steps of CoT, constant-depth transformers using constant-bit precision and $O(\log n)$ embedding size can solve any problem solvable by boolean circuits of size $T$. Empirically, enabling CoT dramatically improves the accuracy for tasks that are hard for parallel computation, including the composition of permutation groups, iterated squaring, and circuit value problems, especially for low-depth transformers.
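Abstract (reference translation): Instructing a model to generate intermediate steps, i.e., a chain of thought (CoT), is a highly effective method for improving the accuracy of large language models (LLMs) on arithmetic and symbolic reasoning tasks. However, the mechanism behind CoT remains unclear. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness. Conceptually, CoT gives the model the ability to perform inherently serial computation. Given input length $n$, previous work has shown that, without CoT, constant-depth transformers with finite precision and $\mathsf{poly}(n)$ embedding size can only solve problems in $\mathsf{TC}^0$. We first show a tighter expressiveness upper bound for constant-depth transformers with constant-bit precision: they can only solve problems in $\mathsf{AC}^0$, a proper subset of $\mathsf{TC}^0$. With $T$ steps of CoT, however, constant-depth transformers using constant-bit precision and $O(\log n)$ embedding size can solve any problem solvable by Boolean circuits of size $T$. Empirically, enabling CoT dramatically improves accuracy on tasks that are hard for parallel computation, including composition of permutation groups, iterated squaring, and circuit value problems, especially for low-depth transformers.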
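
The serial character of two of the tasks named in the abstract can be sketched in a few lines of Python. This is only an illustration and not the paper's construction: the gate encoding and the function names `eval_circuit_stepwise` and `iterated_squaring` are assumptions made for this sketch. Each loop iteration appends one value that depends on earlier values, loosely analogous to one CoT step per circuit gate.

```python
from typing import List, Tuple

# Hypothetical gate encoding (an assumption for illustration, not the paper's):
# ("AND" | "OR", i, j) or ("NOT", i, i), where i and j index earlier wires.
Gate = Tuple[str, int, int]


def eval_circuit_stepwise(inputs: List[bool], gates: List[Gate]) -> List[bool]:
    """Evaluate a Boolean circuit one gate per step and return all wire values."""
    wires = list(inputs)  # the input wires play the role of the prompt
    for op, i, j in gates:
        if op == "AND":
            value = wires[i] and wires[j]
        elif op == "OR":
            value = wires[i] or wires[j]
        elif op == "NOT":
            value = not wires[i]
        else:
            raise ValueError(f"unknown gate: {op}")
        wires.append(value)  # one appended value per step
    return wires


def iterated_squaring(x: int, modulus: int, steps: int) -> List[int]:
    """Return x, x^2, x^4, ... (mod modulus); each entry needs the previous one."""
    chain = [x % modulus]
    for _ in range(steps):
        chain.append(chain[-1] * chain[-1] % modulus)
    return chain


if __name__ == "__main__":
    # XOR built from AND/OR/NOT: (a OR b) AND NOT (a AND b)
    xor_gates = [("OR", 0, 1), ("AND", 0, 1), ("NOT", 3, 3), ("AND", 2, 4)]
    print(eval_circuit_stepwise([True, False], xor_gates)[-1])
    print(iterated_squaring(3, 1000, 4))
```

A circuit with $T$ gates takes $T$ loop iterations here, which mirrors the abstract's statement that $T$ steps of CoT suffice for circuits of size $T$. The sketch says nothing about how a transformer would implement each step.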

Related papers

Provable Failure of Language Models in Learning Majority Boolean Logic via Gradient Descent [15.291830857281015]
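We investigate whether Transformers can truly learn simple majority functions when trained with gradient-based methods. Our analysis proves that even after $\mathrm{poly}(d)$ gradient queries, the generalization error of the Transformer model remains substantially large.
Paper reference translation (metadata) (2025-04-07T03:08:12Z)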
Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers [5.4649464326326]
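Chain-of-thought reasoning and scratchpads have emerged as critical tools for enhancing the computational capabilities of transformers. In this work, we initiate the study of systematic lower bounds on the number of CoT steps across different algorithmic problems.
Paper reference translation (metadata) (2025-02-04T15:14:01Z)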
Circuit Complexity Bounds for RoPE-based Transformer Architecture [25.2590541420499]
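Empirical evidence suggests that $\mathsf{RoPE}$-based Transformer architectures demonstrate greater generalization capabilities. We show that unless $\mathsf{TC}^0 = \mathsf{NC}^1$, a $\mathsf{RoPE}$-based Transformer with $\mathrm{poly}(n)$-precision, $O(1)$ layers, and hidden dimension $d \leq O(n)$ cannot solve the arithmetic formula problem.
Paper reference translation (metadata) (2024-11-12T07:24:41Z)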
On the Role of Depth and Looping for In-Context Learning with Task Diversity [69.4145579827826]
We study in-context learning for linear regression with diverse tasks. We show that multilayer Transformers are not robust to distributional shifts even as small as $O(e^{-L})$ in Wasserstein distance.
Paper reference translation (metadata) (2024-10-29T03:27:56Z)
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time [17.086679273053853]
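In this work, we show that gradients can be computed in almost linear time using a novel fast approximation method. By improving the efficiency of gradient computation, we hope this work will facilitate more effective training and deployment of long-context language models.
Paper reference translation (metadata) (2024-08-23T17:16:43Z)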
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective [39.47116013338394]
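Chain-of-Thought prompting (CoT) can dramatically improve the performance of large language models (LLMs). We show that CoT can handle a general class of decision-making problems known as Dynamic Programming.
Paper reference translation (metadata) (2023-05-24T17:59:21Z)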
Transformers Learn Shortcuts to Automata [52.015990420075944]
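Low-depth transformers can represent the computations of any finite-state automaton. We show that a transformer with $O(\log T)$ layers can exactly replicate the automaton on input sequences of length $T$. We further investigate the brittleness of these solutions and propose potential mitigations.
Paper reference translation (metadata) (2022-10-19T17:45:48Z)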
The Parallelism Tradeoff: Limitations of Log-Precision Transformers [29.716269397142973]
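We show that transformers whose arithmetic precision is logarithmic in the number of input tokens can be simulated by constant-depth logspace-uniform threshold circuits. This provides insight into the power of transformers using known results from complexity theory.
Paper reference translation (metadata) (2022-07-02T03:49:34Z)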
What Dense Graph Do You Need for Self-Attention? [73.82686008622596]
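We model token interactions on a hypercube and propose Hypercube Transformer, a sparse transformer that achieves results comparable to or better than the vanilla transformer. Experiments on tasks requiring various sequence lengths support the validity of our graph function.
Paper reference translation (metadata) (2022-05-27T14:36:55Z)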
Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding [63.539333383965726]
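We propose a novel way to accelerate attention computation for transformers with relative positional encoding (RPE). Based on the observation that relative positional encoding forms a Toeplitz matrix, we show mathematically that kernelized attention with RPE can be computed efficiently using the Fast Fourier Transform (FFT).
Paper reference translation (metadata) (2021-06-23T17:51:26Z)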
$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers [71.31712741938837]
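We show that sparse transformers with only $O(n)$ connections per attention layer can approximate the same function class as the dense model with $n^2$ connections. We also compare different sparsity patterns and levels on standard NLP tasks.
Paper reference translation (metadata) (2020-06-08T18:30:12Z)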

The related papers list is generated automatically from the titles and abstracts of papers on this site.

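The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.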