Fugu-MT Paper Translation (Abstract): A Geometric Analysis of Sign-Magnitude Asymmetry in a ReLU + RMSNorm Block under Ternary Quantization

Paper abstract: A Geometric Analysis of Sign-Magnitude Asymmetry in a ReLU + RMSNorm Block under Ternary Quantization

arxiv url: http://arxiv.org/abs/2605.18933v1
Date: Mon, 18 May 2026 15:36:33 GMT
Status: Translation complete
Last updated in system: 2026-05-20 15:03:08.889691
Title: A Geometric Analysis of Sign-Magnitude Asymmetry in a ReLU + RMSNorm Block under Ternary Quantization
Title (reference translation): A Geometric Analysis of Sign-Magnitude Asymmetry in a ReLU + RMSNorm Block under Ternary Quantization
Authors: Lei Dong,
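Abstract summary: Pre-norm Transformers with RMSNorm tolerate ternary {-1,0,+1} weight quantization. A geometric explanation is given via a sign-magnitude decomposition of weight perturbations.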
Reference score (independently computed attention metric): 4.778602479004
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Pre-norm Transformers with RMSNorm tolerate ternary {-1,0,+1} weight quantization with surprisingly small loss (Ma et al., 2024). We give a geometric explanation via sign-magnitude decomposition of weight perturbations. In a two-layer ReLU + RMSNorm model with i.i.d. Gaussian weights, sign-flips produce $π/(π-2) \approx 2.75$ times more transverse output energy than sign-preserving magnitude perturbations of equal Frobenius norm, as the flip rate $p \to 0$ (Theorem 3). The mechanism: ReLU creates a hidden-space directional asymmetry between the two perturbation types, which RMSNorm's transverse-projection Fréchet derivative selectively exposes. Sign-quantization error is itself a sign-preserving perturbation with angular alignment $\cos^2 \to 2/π$ (Theorem 4); its post-ReLU radial fraction ($0.365$) matches the pre-ReLU value $1-2/π$ within $0.4\%$, so ReLU is approximately transparent to ternary error. Multi-layer compounding of the $2.75\times$ factor is not experimentally supported; the gap to real-model sign sensitivity arises from outlier features violating delocalization. For an input dimension with amplitude $α$, a single sign-flip produces post-ReLU energy amplified by $R \approx nα^2$ relative to a delocalized entry. On TinyLlama-1.1B, at linear response ($p \leq 0.5\%$), count-matched NLL leverage stabilizes at $\sim 10\times \approx n\mathbb{E}[α^2]$, matching the per-entry theory; the all-column NLL ratio of $5.0\times$ falls within $R_{\mathrm{col}} \leq 19$ ($67\times$ PPL gap reflects metric nonlinearity). Measured outlier $α$ at layer 12 (median $0.024$, max $0.26$) confirms heavy-tailed concentration. The Bussgang constant $2/π$, RMSNorm geometry, and ReLU half-space structure together explain sign-magnitude asymmetry in pre-norm models, with $R \propto nα^2$ accounting for real-model deviations.
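Abstract (reference translation): Pre-norm Transformers with RMSNorm tolerate ternary {-1,0,+1} weight quantization with surprisingly small loss (Ma et al., 2024). This work gives a geometric explanation via a sign-magnitude decomposition of weight perturbations. In a two-layer ReLU + RMSNorm model with Gaussian weights, sign-flips generate $π/(π-2) \approx 2.75$ times more transverse output energy than sign-preserving magnitude perturbations of equal Frobenius norm, in the limit of flip rate $p \to 0$ (Theorem 3). The mechanism: ReLU creates a directional asymmetry in hidden space between the two perturbation types, and RMSNorm's transverse-projection Fréchet derivative selectively exposes it. Sign-quantization error is itself a sign-preserving perturbation with angular alignment $\cos^2 \to 2/π$ (Theorem 4), and its post-ReLU radial fraction is $0.365$. Multi-layer compounding of the $2.75\times$ factor is not supported experimentally; the gap to real-model sign sensitivity arises from outlier features that violate delocalization. For an input dimension with amplitude $α$, a single sign-flip produces post-ReLU energy amplified by $R \approx nα^2$ relative to a delocalized entry. On TinyLlama-1.1B, at linear response ($p \leq 0.5\%$), count-matched NLL leverage stabilizes at $\sim 10\times \approx n\mathbb{E}[α^2]$, matching the per-entry theory. Measured outlier $α$ at layer 12 (median $0.024$, max $0.26$) confirms heavy-tailed concentration. The Bussgang constant $2/π$, RMSNorm geometry, and ReLU half-space structure together explain sign-magnitude asymmetry, with $R \propto nα^2$ accounting for real-model deviations.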
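
The constants quoted in the abstract can be sanity-checked numerically. The sketch below is not the paper's experiment: it assumes a single i.i.d. standard Gaussian weight vector and a plain sign quantizer (no zero level), and the function name `sign_alignment` is hypothetical. It estimates the $\cos^2$ alignment between a weight vector and its sign pattern, then evaluates $2/π$, $1-2/π$, and $π/(π-2)$ directly. As a matter of arithmetic, $π/(π-2)$ is the reciprocal of $1-2/π$; whether Theorem 3 is derived that way is not stated in the abstract.

```python
import math
import random


def sign_alignment(n, seed=0):
    """Return cos^2 of the angle between a Gaussian vector w and sign(w).

    Assumption for illustration: w has i.i.d. N(0, 1) entries.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    dot = sum(abs(x) for x in w)        # <w, sign(w)> = sum of |w_i|
    norm_w_sq = sum(x * x for x in w)   # ||w||^2
    # ||sign(w)||^2 = n, since a Gaussian sample is nonzero with probability 1
    return dot * dot / (norm_w_sq * n)


cos2 = sign_alignment(200_000)
bussgang = 2.0 / math.pi                # ~0.6366
residual = 1.0 - bussgang               # ~0.3634
ratio = math.pi / (math.pi - 2.0)       # ~2.7519

print(f"empirical cos^2 = {cos2:.4f}")
print(f"2/pi            = {bussgang:.4f}")
print(f"1 - 2/pi        = {residual:.4f}")
print(f"pi/(pi - 2)     = {ratio:.4f}")
```

With an optimally scaled sign quantizer the residual is orthogonal to the sign pattern, so the relative squared error equals $1 - \cos^2$, which is why `residual` is the natural comparison point for the pre-ReLU value $1-2/π$ mentioned in the abstract.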

Related papers