Fugu-MT 論文翻訳(概要): Exact Expressive Power of Transformers with Padding

論文の概要: Exact Expressive Power of Transformers with Padding

arxiv url: http://arxiv.org/abs/2505.18948v1
Date: Sun, 25 May 2025 02:52:15 GMT
ステータス: 翻訳完了
システム内更新日: 2025-05-27 16:58:42.773073
Title: Exact Expressive Power of Transformers with Padding
Title（参考訳）: ポーディングを有する変圧器の特殊表現力
Authors: William Merrill, Ashish Sabharwal,
Abstract要約: 長さ$n$の入力に対して$O(logd n)$ループするパッド付き変換器は、適度に並列化可能な問題のクラス$mathsfTCd$を正確に認識する。この結果から, パディングとループのさらなる探索が, 思考の連鎖に対する並列化可能な代替手段として動機づけられた。
参考スコア（独自算出の注目度）: 29.839710738657203
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Chain of thought is a natural inference-time method for increasing the computational power of transformer-based large language models (LLMs), but comes at the cost of sequential decoding. Are there more efficient alternatives to expand a transformer's expressive power without adding parameters? We consider transformers with padding tokens as a form of parallelizable test-time compute. We show that averaging-hard-attention, masked-pre-norm transformers with polynomial padding converge to precisely the class $\mathsf{TC}^0$ of extremely parallelizable problems. While the $\mathsf{TC}^0$ upper bound was known, proving a matching lower bound had been elusive. Further, our novel analysis reveals the precise expanded power of padded transformers when coupled with another form of inference-time compute, namely dynamically increasing depth via looping. Our core technical contribution is to show how padding helps bring the notions of complete problems and reductions, which have been a cornerstone of classical complexity theory, to the formal study of transformers. Armed with this new tool, we prove that padded transformers with $O(\log^d n)$ looping on inputs of length $n$ recognize exactly the class $\mathsf{TC}^d$ of moderately parallelizable problems. Thus, padding and looping together systematically expand transformers' expressive power: with polylogarithmic looping, padded transformers converge to the class $\mathsf{NC}$, the best that could be expected without losing parallelism (unless $\mathsf{NC} = \mathsf{P}$). Our results thus motivate further exploration of padding and looping as parallelizable alternatives to chain of thought.
Abstract（参考訳）: 思考の連鎖は、トランスフォーマーベースの大規模言語モデル(LLM)の計算能力を高める自然な推論時間法であるが、逐次デコーディングのコストがかかる。パラメータを追加せずにトランスの表現力を拡大するより効率的な代替手段はあるのだろうか? 我々は、パラレル化可能なテスト時間計算の形式として、パディングトークンを持つトランスフォーマーを考察する。多項式パディングを持つ平均的ハードアテンション、マスク付きプレノーム変換器は、非常に並列化可能な問題のクラス $\mathsf{TC}^0$ に正確に収束することを示す。 $\mathsf{TC}^0$上界は知られていたが、一致する下界の証明は発散していた。さらに,新しい解析により,ループ化により動的に深度を増大させる他の方式の推論時間計算と組み合わせることで,パッド型変圧器の精度が向上することを明らかにした。私たちの中核的な技術的貢献は、パディングが、古典的な複雑性理論の基礎となった完全な問題と縮小の概念をトランスフォーマーの形式研究にどうもたらすかを示すことである。この新しいツールを駆使して、$O(\log^d n)$ループのパッド付き変換器が$n$の入力を正確にクラス $\mathsf{TC}^d$ に並列化可能な問題を認識することを証明した。したがって、パディングとループは変換器の表現力を体系的に拡張する: 多対数ループでは、パッド変換器はクラス $\mathsf{NC}$ に収束する($\mathsf{NC} = \mathsf{P}$ を除いて)。この結果から, パディングとループのさらなる探索が, 思考の連鎖に対する並列化可能な代替手段として動機づけられた。

関連論文リスト

Constant Bit-size Transformers Are Turing Complete [8.38684825915246]
任意の長さの入力で動くチューリングマシンは、定ビットサイズの変圧器でシミュレートできることを示す。提案手法は,チューリング完全計算モデルであるPostマシンのシミュレーションに依存する。
論文参考訳（メタデータ） (2025-05-22T02:45:38Z)
Circuit Complexity Bounds for RoPE-based Transformer Architecture [25.2590541420499]
経験的証拠は、$mathsfRoPE$ベースのTransformerアーキテクチャは、従来のTransformerモデルよりも優れた一般化能力を示していることを示している。我々は、$mathsfTC0 = mathsfNC1$, a $mathsfRoPE$-based Transformer with $mathrmpoly(n)$-precision, $O(1)$ layer, hidden dimension $d leq O(n)$が算術式評価問題を解くことができないことを示す。
論文参考訳（メタデータ） (2024-11-12T07:24:41Z)
On the Role of Depth and Looping for In-Context Learning with Task Diversity [69.4145579827826]
多様なタスクを伴う線形回帰のための文脈内学習について検討する。 We show that multilayer Transformer is not robust to even distributional shifts as $O(e-L)$ in Wasserstein distance。
論文参考訳（メタデータ） (2024-10-29T03:27:56Z)
Can Transformers Learn $n$-gram Language Models? [77.35809823602307]
2種類のランダムな$n$-gram LMを学習するトランスフォーマーの能力について検討する。例えば、$n$-gram LMに対する古典的な推定手法として、add-$lambda$ smoothing outperform transformerがある。
論文参考訳（メタデータ） (2024-10-03T21:21:02Z)
Counting Like Transformers: Compiling Temporal Counting Logic Into Softmax Transformers [8.908747084128397]
時間カウントロジックの $textsfK_textt$[#] と RASP の $textsfC-RASP$ を紹介します。それらが互いに等価であることを示し、それらが結合されていない入力サイズを持つ将来のマスキング型ソフトアテンショントランスの形式的表現性に最もよく知られた下界であることを示す。
論文参考訳（メタデータ） (2024-04-05T20:36:30Z)
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems [57.58801785642868]
思考の連鎖(CoT)は、算術や記号的推論タスクにおいて、大きな言語モデル(LLM)の精度を向上させるための非常に効果的な方法である。この研究は、表現性のレンズを通してデコーダのみの変換器に対するCoTのパワーを理論的に理解する。
論文参考訳（メタデータ） (2024-02-20T10:11:03Z)
How do Transformers perform In-Context Autoregressive Learning? [76.18489638049545]
簡単な次のトークン予測タスクでTransformerモデルをトレーニングする。トレーニングされたTransformerが、まず$W$ in-contextを学習し、次に予測マッピングを適用することで、次のトークンを予測する方法を示す。
論文参考訳（メタデータ） (2024-02-08T16:24:44Z)
Transformers Learn Shortcuts to Automata [52.015990420075944]
低深度変換器は任意の有限状態オートマトンを計算できる。我々は,$O(log T)$レイヤを持つ変換器が,長さ$T$の入力シーケンス上で,オートマトンを正確に再現可能であることを示す。さらに、これらの解の脆性について検討し、潜在的な緩和を提案する。
論文参考訳（メタデータ） (2022-10-19T17:45:48Z)
The Parallelism Tradeoff: Limitations of Log-Precision Transformers [29.716269397142973]
入力トークン数における算術精度が対数的である変換器は、定数深さの対数空間一様しきい値回路でシミュレートできることを示す。これは、複雑性理論の既知の結果を用いた変圧器のパワーに関する洞察を与える。
論文参考訳（メタデータ） (2022-07-02T03:49:34Z)
$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers [71.31712741938837]
注意層ごとに$O(n)$接続しか持たないスパース変換器は、$n2$接続を持つ高密度モデルと同じ関数クラスを近似できることを示す。また、標準NLPタスクにおいて、異なるパターン・レベルの違いを比較検討する。
論文参考訳（メタデータ） (2020-06-08T18:30:12Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。