Fugu-MT Paper Translation (Abstract): Rank Collapse, Fixed Points, and the Renormalization Group Structure of MLP Residual Networks

Paper overview: Rank Collapse, Fixed Points, and the Renormalization Group Structure of MLP Residual Networks

arxiv url: http://arxiv.org/abs/2606.10324v1
Date: Tue, 09 Jun 2026 02:19:26 GMT
Status: Translation complete
Last updated in system: 2026-06-10 15:40:58.266664
Title: Rank Collapse, Fixed Points, and the Renormalization Group Structure of MLP Residual Networks
Authors: Parviz Haggi-Mani, Irina Rish
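Abstract summary: We study the simplest architecture for which the analogy between neural network forward passes and renormalization group (RG) flows is tractable. The effective rank of the residual stream decreases monotonically with depth after training. The network preserves exactly the degrees of freedom relevant to the prediction task.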
Reference score (independently computed attention metric): 16.36212521143563
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The analogy between deep neural network forward passes and renormalization group (RG) flows has been repeatedly noted in the literature, but existing treatments remain qualitative: depth is described as a coarse-graining scale, attention is likened to a partition function, and representations are said to flow toward fixed points. No existing work has defined a measurable RG order parameter, tested it under controlled variation of the input distribution, or made quantitative predictions that are empirically verified. We study the simplest architecture for which the analogy is tractable: a pure MLP residual stack trained on masked token prediction over synthetic Markov chain sequences with known spectral properties. We report three findings. (i) The effective rank of the residual stream decreases monotonically with depth after training, consistent with progressive integration of irrelevant degrees of freedom. (ii) This rank collapse is selective: it occurs for chains with short correlation length approximately 1 but is absent for chains with long correlation length approximately 7, measured at the position level to control for mean-pooling artifacts. The network preserves exactly the degrees of freedom relevant to the prediction task, the content of the RG relevance criterion. (iii) Inter-layer kernel drift is concentrated at one or two specific transitions, with the remainder of the network near a fixed point, consistent with a discrete fixed-point plateau. Together these findings constitute the first quantitative, position-level evidence that MLP residual networks implement a selective coarse-graining procedure governed by the spectral structure of the input distribution.
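The two quantities the abstract relies on, effective rank and correlation length, can be illustrated with a minimal sketch. The abstract does not state how either is computed, so the definitions below are assumptions: effective rank is taken as the exponential of the Shannon entropy of the normalized singular values, and correlation length as -1 / ln|lambda_2|, where lambda_2 is the second-largest eigenvalue modulus of the chain's transition matrix. The function names and the symmetric two-state chain are hypothetical and chosen only for illustration.

```python
import numpy as np


def effective_rank(activations: np.ndarray) -> float:
    """Entropy-based effective rank of a (samples x features) matrix.

    Assumed definition (not confirmed by the abstract): exp of the Shannon
    entropy of the singular values normalized to sum to one.
    """
    # Center the features so a shared offset is not counted as a direction.
    centered = activations - activations.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    total = singular_values.sum()
    if total == 0.0:
        return 0.0
    p = singular_values / total
    p = p[p > 0]  # drop exact zeros so log is defined
    return float(np.exp(-(p * np.log(p)).sum()))


def correlation_length(transition: np.ndarray) -> float:
    """Correlation length xi = -1 / ln|lambda_2| of a row-stochastic matrix.

    Assumed definition: lambda_2 is the second-largest eigenvalue modulus.
    """
    moduli = np.sort(np.abs(np.linalg.eigvals(transition)))[::-1]
    lambda_2 = moduli[1]
    if lambda_2 <= 0.0:
        return 0.0  # correlations vanish after one step
    if lambda_2 >= 1.0:
        return float("inf")  # chain never decorrelates
    return float(-1.0 / np.log(lambda_2))


def two_state_chain(xi: float) -> np.ndarray:
    """Hypothetical symmetric two-state chain with correlation length xi.

    Its eigenvalues are 1 and 2 * stay - 1, so stay is chosen to make
    the second eigenvalue equal exp(-1 / xi).
    """
    lambda_2 = np.exp(-1.0 / xi)
    stay = (1.0 + lambda_2) / 2.0
    return np.array([[stay, 1.0 - stay], [1.0 - stay, stay]])


if __name__ == "__main__":
    short_chain = two_state_chain(1.0)  # short correlation length
    long_chain = two_state_chain(7.0)   # long correlation length
    print(correlation_length(short_chain), correlation_length(long_chain))

    rank_one = np.outer(np.arange(10.0), np.ones(4))   # one direction only
    isotropic = np.vstack([np.eye(4), -np.eye(4)])     # four equal directions
    print(effective_rank(rank_one), effective_rank(isotropic))
```

Under these assumed definitions, applying `effective_rank` to the per-position activations at each layer would give a depth profile of the kind the abstract describes; whether it matches the authors' measurement depends on details the abstract does not give.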

Related papers