Fugu-MT Paper Translation (Abstract): Transformer Injectivity & Geometric Robustness - Analytic Margins and Bi-Lipschitz Uniformity of Sequence-Level Hidden States

Paper overview: Transformer Injectivity & Geometric Robustness - Analytic Margins and Bi-Lipschitz Uniformity of Sequence-Level Hidden States

arxiv url: http://arxiv.org/abs/2511.14808v1
Date: Mon, 17 Nov 2025 19:39:15 GMT
Status: Translation complete
Last updated in system: 2025-11-20 15:51:28.473582
Title: Transformer Injectivity & Geometric Robustness - Analytic Margins and Bi-Lipschitz Uniformity of Sequence-Level Hidden States
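Title (reference translation): Transformer Injectivity and Geometric Robustness - Analytic Margins and Bi-Lipschitz Uniformity of Sequence-Level Hidden States
Authors: Mikael von Strauss
Abstract summary: The map from discrete prompts to last-token hidden states is shown to be generically injective on finite prompt sets. The study examines behavior across layers, sequence lengths, model scales, and 8-bit and 4-bit activation quantization.
Reference score (independently computed attention score): 0.0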
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Under real-analytic assumptions on decoder-only Transformers, recent work shows that the map from discrete prompts to last-token hidden states is generically injective on finite prompt sets. We refine this picture: for each layer $\ell$ we define a collision discriminant $Δ^\ell \subset Θ$ and injective stratum $U^\ell = Θ\setminus Δ^\ell$, and prove a dichotomy -- either the model is nowhere injective on the set, or $U^\ell$ is open and dense and every $F^\ell_θ$ is injective. Under mild non-singularity assumptions on the optimizer and an absolutely continuous initialization, generic injectivity persists along smooth training trajectories over any fixed horizon. We also treat symmetry groups $G$, showing that discriminants and injective strata descend to the quotient $Θ/G$, so injectivity is naturally a property of functional equivalence classes. We complement these results with an empirical study of layerwise geometric diagnostics. We define a separation margin and a co-Lipschitz (lower Lipschitz) constant between prompt space and last-token representation space, estimated via nearest-neighbor statistics on large prompt sets. Applying these diagnostics to pretrained LLaMA-3 and Qwen models, we study behavior across layers, sequence lengths, model scales, and 8- and 4-bit activation quantization. On our sampled prompts we see no collisions in full precision or at 8 bits, while 4-bit quantization induces a small number of collisions and markedly shrinks co-Lipschitz estimates. For a small GPT-2 trained from scratch, normalized metrics remain stable over training. Overall, the results suggest that Transformer representations are generically and persistently injective in the continuous-parameter idealization, while their practical invertibility can be probed using simple geometric diagnostics.
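Abstract (reference translation): Under real-analytic assumptions on decoder-only Transformers, recent work shows that the map from discrete prompts to last-token hidden states is generically injective on finite prompt sets. For each layer $\ell$, we define a collision discriminant $Δ^\ell \subset Θ$ and an injective stratum $U^\ell = Θ\setminus Δ^\ell$, and prove a dichotomy. Under mild non-singularity assumptions on the optimizer and an absolutely continuous initialization, generic injectivity persists along smooth training trajectories over any fixed horizon. We also treat symmetry groups $G$, showing that discriminants and injective strata descend to the quotient $Θ/G$, so injectivity is naturally a property of functional equivalence classes. We complement these results with an empirical study of layerwise geometric diagnostics. We define a separation margin and a co-Lipschitz (lower Lipschitz) constant between prompt space and last-token representation space, estimated via nearest-neighbor statistics on large prompt sets. Applying these diagnostics to pretrained LLaMA-3 and Qwen models, we study behavior across layers, sequence lengths, model scales, and 8-bit and 4-bit activation quantization. On the sampled prompts, no collisions are seen in full precision or at 8 bits, while 4-bit quantization induces a small number of collisions and markedly shrinks co-Lipschitz estimates. For a small GPT-2 trained from scratch, normalized metrics remain stable over training. The results suggest that Transformer representations are generically and persistently injective in the continuous-parameter idealization, and that their practical invertibility can be probed using simple geometric diagnostics.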
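Illustrative sketch (not from the paper): the abstract names a separation margin and a co-Lipschitz constant but does not give their formulas here, so the following is only one plausible reading. It assumes the margin is the smallest Euclidean distance between last-token hidden states of distinct prompts, that the prompt metric is Hamming distance over equal-length token sequences, and that the co-Lipschitz estimate is the smallest ratio of representation distance to prompt distance. The function name `geometric_diagnostics` and the brute-force all-pairs search are hypothetical choices; the paper's estimator uses nearest-neighbor statistics on large prompt sets.

```python
import numpy as np


def geometric_diagnostics(tokens, hidden):
    """Estimate (margin, co_lipschitz, collisions) on a finite prompt set.

    tokens: (n, T) integer array, one equal-length prompt per row.
    hidden: (n, d) float array, the last-token hidden state of each prompt.
    All definitions below are assumptions made for illustration.
    """
    tokens = np.asarray(tokens)
    hidden = np.asarray(hidden, dtype=float)

    # Pairwise Euclidean distances between hidden states, shape (n, n).
    diff = hidden[:, None, :] - hidden[None, :, :]
    rep_dist = np.linalg.norm(diff, axis=-1)

    # Pairwise Hamming distances between prompts (assumed prompt metric).
    tok_dist = (tokens[:, None, :] != tokens[None, :, :]).sum(axis=-1)

    # Only pairs of distinct prompts are relevant to injectivity.
    distinct = tok_dist > 0

    # Separation margin: smallest representation distance over distinct prompts.
    margin = rep_dist[distinct].min()

    # Collisions: distinct prompts with identical hidden states.
    # Each unordered pair appears twice in the symmetric matrix.
    collisions = int((rep_dist[distinct] == 0).sum() // 2)

    # Co-Lipschitz estimate: worst-case ratio of output to input distance.
    co_lipschitz = (rep_dist[distinct] / tok_dist[distinct]).min()

    return margin, co_lipschitz, collisions
```

Under these assumptions, a margin of zero (equivalently, a nonzero collision count) means the map fails to be injective on the sampled prompts, which is the kind of event the abstract reports under 4-bit quantization. The all-pairs search costs O(n^2) memory and is only practical for small prompt sets.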


Related papers