Fugu-MT Paper Translation (Summary): Parity, Sensitivity, and Transformers

Paper summary: Parity, Sensitivity, and Transformers

arxiv url: http://arxiv.org/abs/2602.05896v1
Date: Thu, 05 Feb 2026 17:14:33 GMT
Status: translation complete
Last updated in system: 2026-02-06 18:49:09.074856
Title: Parity, Sensitivity, and Transformers
Authors: Alexander Kozachinskiy, Tomasz Steifer, Przemysław Wałęga
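Abstract summary: We give a new construction of a transformer for PARITY that uses softmax and a length-independent, polynomially bounded positional encoding, needs no layernorm, and works both with and without causal masking. We also show that PARITY cannot be solved with only one layer and one head.
Reference score (site-computed attention measure): 47.03592484094856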
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The transformer architecture is almost a decade old. Despite that, we still have a limited understanding of what this architecture can or cannot compute. For instance, can a 1-layer transformer solve PARITY -- or more generally -- which kinds of transformers can do it? Known constructions for PARITY have at least 2 layers and employ impractical features: either a length-dependent positional encoding, or hardmax, or layernorm without the regularization parameter, or they are not implementable with causal masking. We give a new construction of a transformer for PARITY with softmax, length-independent and polynomially bounded positional encoding, no layernorm, working both with and without causal masking. We also give the first lower bound for transformers solving PARITY -- by showing that it cannot be done with only one layer and one head.
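
For readers unfamiliar with the two terms in the title, here is a minimal Python sketch. It is not taken from the paper. It assumes the standard definitions: PARITY outputs 1 exactly when the input bit string contains an odd number of 1s, and the sensitivity of a Boolean function at an input is the number of positions where flipping that single bit changes the output. The function names are illustrative only.

```python
from itertools import product


def parity(bits):
    """Return 1 if the number of 1s in `bits` is odd, else 0."""
    return sum(bits) % 2


def sensitivity_at(f, bits):
    """Count the positions whose single-bit flip changes f's output."""
    base = f(bits)
    count = 0
    for i in range(len(bits)):
        flipped = list(bits)
        flipped[i] ^= 1  # flip only bit i
        if f(flipped) != base:
            count += 1
    return count


def max_sensitivity(f, n):
    """Maximum of sensitivity_at over all 2**n inputs of length n (brute force)."""
    return max(sensitivity_at(f, x) for x in product((0, 1), repeat=n))
```

Flipping any single bit changes the count of 1s by one, so under these definitions PARITY has sensitivity n at every input of length n, the largest value possible.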


Related papers