Fugu-MT 論文翻訳(概要): Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don't

論文の概要: Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don't

arxiv url: http://arxiv.org/abs/2605.30523v1
Date: Thu, 28 May 2026 19:58:43 GMT
ステータス: 翻訳完了
システム内更新日: 2026-06-01 20:56:50.206481
Title: Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don't
Title（参考訳）: Padded Transformer Expressivityを再考:どのアーキテクチャが重要か、そうでないか
Authors: Anej Svete, William Merrill, Ryan Cotterell, Ashish Sabharwal,
Abstract要約: パッド付き$textL$ constantprecision transformerは、表現性に対して驚くほど堅牢である。 glogd-looped constant-precision transformer reach $textFO-uniform TCd$, growing-precision transformer reach $textFO-uniform TCd$
参考スコア（独自算出の注目度）: 78.36679389968772
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent work describes what transformers can and cannot compute through connections to boolean circuits, but existing results lack exact characterizations and are sensitive to modeling choices. Padded transformers -- to whose input filler symbols such as ``...'' are appended -- emerge as a useful gadget for establishing equivalences to circuit classes by providing polynomial space for adaptive parallel computation. However, only a limited set of padded transformer idealizations has been studied, leaving open how robustly these equivalences hold under changes to attention type, model width, and uniformity. We find that, under practical assumptions, padded transformers are surprisingly robust to all of these, and identify numeric precision and model depth as the main factors affecting expressivity. Concretely, we prove that polynomially padded $\text{L-uniform}$ constant-precision transformers are equivalent to $\text{L-uniform AC}^0$, while growing-precision ones achieve $\text{L-uniform TC}^0$ regardless of width. Furthermore, looping enables sequential processing analogous to circuits: $\log^d N$-looped constant-precision transformers reach $\text{FO-uniform AC}^d$, and growing-precision ones reach $\text{FO-uniform TC}^d$. Interestingly, growing width or precision beyond logarithmic does not increase expressivity, and all our results hold for both softmax and average hard attention transformers.
Abstract（参考訳）: 最近の研究では、トランスフォーマーがブール回路への接続を通じて計算できることとできないことを記述しているが、既存の結果には正確な特徴がなく、モデリングの選択に敏感である。適応並列計算のための多項式空間を提供することで、回路クラスに等価性を確立するのに有用な道具として、 '`...' のような入力フィラー記号を付加したパッシッドトランスフォーマーが登場した。しかし、これらの等価性が注意型、モデル幅、均一性の変化の下でどれだけ頑健であるかをオープンに保ちながら、パッドドトランスフォーマーの理想化の限られたセットのみが研究されている。実践的な仮定では、パッド型トランスフォーマーはこれらのすべてに対して驚くほど堅牢であり、数値的精度とモデル深さを表現力に影響を与える主な要因とみなす。具体的には,多項式パッド付き$\text{L-uniform}=定数精度変換器が$\text{L-uniform AC}^0$と等価であるのに対して,成長精度変換器は幅に関係なく$\text{L-uniform TC}^0$が得られることを示す。さらに、ループ化により回路に類似したシーケンシャルな処理が可能となる: $\log^d N$-looped constant-precision transformers reach $\text{FO-uniform AC}^d$, growing-precision transformers reach $\text{FO-uniform TC}^d$。興味深いことに、対数的を超えた幅や精度の増大は表現性を高めず、我々の結果はソフトマックスと平均的ハードアテンショントランスの両方に当てはまる。

論文の概要: Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don't

関連論文リスト