Fugu-MT Paper Translation (Overview): The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought

Paper overview: The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought

arxiv url: http://arxiv.org/abs/2605.18079v1
Date: Mon, 18 May 2026 08:57:53 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:49.209488
Title: The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought
Authors: Moritz Brösamle, Stephan Eckstein
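Abstract summary: Existing expressivity results for transformers typically rely on hardmax attention, high precision, and other architectural modifications that disconnect them from the models used in practice. We bridge this gap by analyzing transformer decoders with softmax attention and rounding of activations and attention weights, while allowing depth and width to grow logarithmically with the context length. As an intermediate step, we construct hardmax transformers with ternary activations and well-separated attention scores that simulate Turing machines using Chain-of-Thought (CoT).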
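The idea of replacing hardmax with softmax rests on a simple observation. Suppose the largest attention score exceeds every other score by a wide enough margin. Then the softmax weights, once rounded to a fixed precision, are exactly one-hot, which is the same output hardmax would give.

The sketch below checks this numerically. It is an illustration under our own assumptions, not the authors' construction:

- Weights are rounded to the nearest multiple of 2^-p. This is a hypothetical fixed-point scheme.
- The maximizing position is assumed to be unique.
- The margin in `sufficient_gap` comes from the elementary estimate (n - 1)·e^(-Δ) < 2^-(p+1), not from the paper.

All function names are hypothetical.

```python
import math


def round_to_bits(x: float, p: int) -> float:
    """Round x to the nearest multiple of 2**-p (hypothetical fixed-point rounding)."""
    scale = 2 ** p
    return round(x * scale) / scale


def low_precision_softmax(scores: list[float], p: int) -> list[float]:
    """Softmax over attention scores, with each weight rounded to p fractional bits."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [round_to_bits(e / z, p) for e in exps]


def hardmax(scores: list[float]) -> list[float]:
    """One-hot weights on the maximizing position (assumes a unique maximum)."""
    i = max(range(len(scores)), key=lambda j: scores[j])
    return [1.0 if j == i else 0.0 for j in range(len(scores))]


def sufficient_gap(n: int, p: int) -> float:
    """Margin Delta with (n - 1) * exp(-Delta) == 2**-(p + 1), for n >= 2.

    If every non-maximal score is MORE than this margin below the maximum,
    each non-maximal weight is < 2**-(p + 1) and rounds to 0, and the
    maximal weight exceeds 1 - 2**-(p + 1) and rounds to 1.
    """
    return math.log(n - 1) + (p + 1) * math.log(2)


if __name__ == "__main__":
    n, p = 8, 8
    well_separated = [0.0] * n
    well_separated[3] = sufficient_gap(n, p) + 0.1
    print(low_precision_softmax(well_separated, p) == hardmax(well_separated))  # True

    close = [0.0] * n
    close[3] = 2.0
    print(low_precision_softmax(close, p) == hardmax(close))  # False
```

With n = 8 positions and p = 8 fractional bits, the sufficient margin is about 8.18. A margin slightly above it gives an exact one-hot output, while a margin of 2.0 leaves nonzero weight on every position. In this sketch the required margin grows only like ln n. That is one intuition for why such a conversion need not require extreme parameter magnitudes.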
Reference score (in-house attention metric): 1.9193579706947885
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing expressivity results for transformers typically rely on hardmax attention, high precision, and other architectural modifications that disconnect them from the models used in practice. We bridge this gap by analyzing standard transformer decoders with softmax attention and rounding of activations and attention weights, while allowing depth and width to grow logarithmically with the context length. As an intermediate step, we construct hardmax transformers with ternary activations and well-separated attention scores that simulate Turing machines using Chain-of-Thought (CoT). This lets us convert the constructions to equivalent softmax transformers without the unrealistic parameter magnitudes or activation precision that prior approaches would require. Using the same technique, we analyze a recently proposed summarized CoT paradigm and show that it simulates Turing machines more efficiently, with model size scaling logarithmically in a space bound rather than a time bound. We empirically test predictions made by our results on a Sudoku reasoning task and find better alignment with learnability than for prior high-precision results. Our code is available at https://github.com/moritzbroe/transformer-expressivity.
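One way to read the summarized-CoT result is given below. This is our own hedged intuition, not the paper's proof.

If depth and width must grow logarithmically with the context length n, the deciding factor is how long the context can become. We assume two things:

- With standard CoT, the context length is at most polynomial in the number of simulated steps T.
- With summarized CoT, the context length is at most polynomial in the space bound S.

Under these assumptions, the two bounds compare as follows.

```latex
\begin{aligned}
\text{standard CoT:}   &\quad n \le \mathrm{poly}(T) \;\Longrightarrow\; \text{depth, width} = O(\log n) = O(\log T),\\
\text{summarized CoT:} &\quad n \le \mathrm{poly}(S) \;\Longrightarrow\; \text{depth, width} = O(\log n) = O(\log S).
\end{aligned}
```

A machine confined to S tape cells can still run for exponentially many steps in S. For example, a binary counter on S cells takes about 2^S steps. So log T can be of order S, while log S grows only logarithmically.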

Related papers