Fugu-MT Paper Translation (Summary): Beyond Neural Collapse: Task-Intrinsic Geometry Governs Neural Representations in Modular Arithmetic

Paper summary: Beyond Neural Collapse: Task-Intrinsic Geometry Governs Neural Representations in Modular Arithmetic

arxiv url: http://arxiv.org/abs/2606.08985v1
Date: Mon, 08 Jun 2026 03:30:56 GMT
Status: Translation complete
Last updated in system: 2026-06-09 14:42:06.682209
Title: Beyond Neural Collapse: Task-Intrinsic Geometry Governs Neural Representations in Modular Arithmetic
Authors: Hu Tan, Kuo Gai, Shihua Zhang,
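Abstract summary: A simplex ETF gains only an $O(1)$ advantage in cross-entropy, whereas the cyclic rank-2 solution enjoys a $\Theta(K)$ advantage under Schatten or weight-decay surrogates. These results show that grokking on modular arithmetic is governed not by maximal separation alone but by a task-structured trade-off between separation, symmetry, and complexity.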
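The summary contrasts a simplex ETF with a cyclic rank-2 solution. Below is a minimal NumPy sketch of that contrast, assuming unit-norm class vectors and using matrix rank and the nuclear (Schatten-1) norm as stand-ins for the paper's complexity surrogates; the paper's exact surrogate and normalization may differ. Both frames have the same Frobenius norm, so the difference only appears under rank-sensitive penalties such as the nuclear norm (which is what weight decay on a two-factor product W = UV induces). The helpers `simplex_etf`, `cyclic_frame`, and `spectrum_stats` are hypothetical names for illustration.

```python
import numpy as np


def simplex_etf(K: int) -> np.ndarray:
    """Columns are K unit vectors with pairwise inner product -1/(K-1):
    a simplex ETF embedded in R^K that spans a (K-1)-dimensional subspace."""
    centered = np.eye(K) - np.ones((K, K)) / K
    return np.sqrt(K / (K - 1)) * centered


def cyclic_frame(K: int) -> np.ndarray:
    """2 x K matrix whose columns are K equally spaced unit vectors on a circle."""
    angles = 2 * np.pi * np.arange(K) / K
    return np.stack([np.cos(angles), np.sin(angles)])


def spectrum_stats(M: np.ndarray, tol: float = 1e-9) -> dict:
    """Rank, squared Frobenius norm, and nuclear norm from the singular values."""
    s = np.linalg.svd(M, compute_uv=False)
    return {
        "rank": int(np.sum(s > tol)),
        "frobenius_sq": float(np.sum(s**2)),  # plain weight-decay-style penalty
        "nuclear": float(np.sum(s)),          # Schatten-1 norm
    }


for K in (7, 31, 113):
    etf = spectrum_stats(simplex_etf(K))
    cyc = spectrum_stats(cyclic_frame(K))
    nuclear_ratio_sq = (etf["nuclear"] / cyc["nuclear"]) ** 2
    print(f"K={K}: rank {etf['rank']} vs {cyc['rank']}, "
          f"||.||_F^2 {etf['frobenius_sq']:.3f} vs {cyc['frobenius_sq']:.3f}, "
          f"(nuclear ratio)^2 = {nuclear_ratio_sq:.3f}")
```

For these frames the ETF's singular values are $\sqrt{K/(K-1)}$ (with multiplicity $K-1$) and the circle's are $\sqrt{K/2}$ (twice), so both the rank ratio and the squared nuclear-norm ratio equal $(K-1)/2$, linear in $K$. Read heuristically: if the ETF's cross-entropy edge is $O(1)$ while the cyclic frame saves $\Theta(K)$ under a penalty of strength $\lambda$, the cyclic frame wins once $\lambda \cdot \Theta(K)$ exceeds the $O(1)$ gap, i.e. for $\lambda$ above a threshold of order $1/K$, matching the scaling of $\lambda_{\mathrm{crit}}$ stated in the abstract.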
Reference score (in-house attention score): 18.72807692009739
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While neural collapse (NC) predicts that a $K$-class-balanced classifier should organize terminal representations as a $(K-1)$-dimensional simplex equiangular tight frame (ETF), modular addition consistently enters a different regime: networks compress to a two-dimensional cyclic geometry in which both classifier weights and token embeddings lie on circles. We refine the explanation of this phenomenon in three directions. First, we formalize a layerwise non-uniform training mechanism: downstream classifier weights are driven by dense cross-entropy gradients into a rank-2 equiangular configuration before upstream embeddings fully reorganize, and once this classifier plane forms, backpropagated feature gradients constrain embedding motion to the same plane while weight decay suppresses orthogonal components. Second, after this subspace locking, the induced in-plane dynamics admit an entropy-regularized transport interpretation on $S^1$; combined with modular-addition labels, this reduces embedding formation to phase alignment, whose minimizers are single-frequency characters of $\mathbb{Z}/P\mathbb{Z}$ and hence equal-angle points on a circle. Third, we quantify why this solution prevails over NC: a simplex ETF gains only an $O(1)$ advantage in cross-entropy, whereas the cyclic rank-2 solution enjoys a $\Theta(K)$ advantage under Schatten or weight-decay surrogates, yielding a critical threshold $\lambda_{\mathrm{crit}} = \Theta(1/K)$. Our results explain both why classifier weights move first and why embeddings subsequently align with them, showing that grokking on modular arithmetic is governed not by maximal separation alone but by a task-structured trade-off between separation, symmetry, and complexity.
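The abstract's second step says the phase-alignment minimizers are single-frequency characters of $\mathbb{Z}/P\mathbb{Z}$, i.e. equal-angle points on a circle. The NumPy sketch below checks two consequences of that statement under illustrative assumptions; it is not the authors' code or training procedure. The character embedding $\chi(a) = e^{2\pi i f a / P}$ with frequency $f$ coprime to a prime $P$ places the $P$ tokens at equally spaced angles, and a readout that scores class $c$ by $\mathrm{Re}\,(\chi(a)\chi(b)\overline{\chi(c)}) = \cos(2\pi f (a+b-c)/P)$ recovers $a+b \bmod P$ exactly. The function names and the choice $P = 13$, $f = 5$ are hypothetical, and the number of classes $K$ is taken to equal $P$.

```python
import numpy as np


def character_embedding(P: int, freq: int) -> np.ndarray:
    """chi_freq(a) = exp(2*pi*i*freq*a/P) for a in Z/PZ, as unit complex numbers."""
    a = np.arange(P)
    return np.exp(2j * np.pi * freq * a / P)


def predict_sum(P: int, freq: int) -> np.ndarray:
    """Phase-alignment readout: logit(c | a, b) = Re(chi(a) chi(b) conj(chi(c)))
    = cos(2*pi*freq*(a + b - c)/P). Returns the argmax class for every (a, b)."""
    chi = character_embedding(P, freq)
    combined = np.outer(chi, chi)  # combined[a, b] = chi(a) * chi(b) = chi(a + b)
    logits = np.real(combined[:, :, None] * np.conj(chi)[None, None, :])
    return logits.argmax(axis=-1)


P, freq = 13, 5  # hypothetical choices; any freq coprime to the prime P works
chi = character_embedding(P, freq)

# The P embeddings sit at equally spaced angles on the unit circle.
angles = np.sort(np.mod(np.angle(chi), 2 * np.pi))
assert np.allclose(np.abs(chi), 1.0)
assert np.allclose(np.diff(angles), 2 * np.pi / P)

# The readout peaks exactly at c = (a + b) mod P for every input pair.
a, b = np.meshgrid(np.arange(P), np.arange(P), indexing="ij")
assert np.array_equal(predict_sum(P, freq), (a + b) % P)
print("single-frequency character readout solves addition mod", P)
```

Because $\chi(a)\chi(b) = \chi(a+b)$, modular addition becomes rotation on $S^1$, so a two-dimensional (rank-2) plane suffices for any $P$, in contrast to the $K-1$ dimensions a simplex ETF occupies.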

Related papers