Fugu-MT Paper Translation (Abstract): Dead Directions: Geometric Singular Learning

Paper abstract: Dead Directions: Geometric Singular Learning

arxiv url: http://arxiv.org/abs/2606.05957v1
Date: Thu, 04 Jun 2026 09:54:08 GMT
Status: Translation complete
Last updated in system: 2026-06-05 22:39:44.707227
Title: Dead Directions: Geometric Singular Learning
Authors: Tejas Pradeep Shirodkar
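Abstract summary: Singular learning theory and information geometry have studied the same parameter spaces in mostly separate vocabularies. We bridge them through one primitive, the dead direction: a unit vector along which the Fisher metric degenerates. A selection rule on smooth fibres translates the rate at which the directional Fisher curvature decays into Watanabe's single-direction contribution to the real log canonical threshold. A multi-layer K-FAC factorisation writes each Fisher block as a product of activation- and gradient-side rates.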
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Singular learning theory and information geometry have studied the same parameter spaces in mostly separate vocabularies: the former computes Bayesian invariants in resolved coordinates, the latter works in original coordinates under a non-degeneracy assumption that overparameterised models routinely violate. We bridge them through one primitive, the dead direction: a unit vector along which the Fisher metric degenerates, equivalently a tangent to the analytic singular set with a definite KL order, set by how fast the KL divergence vanishes. The two readings name the same vector; our central move shows its KL order is recoverable as the decay rate of the directional Fisher curvature approaching the singularity, in original parameter coordinates and without a Hironaka resolution. A selection rule on smooth fibres translates this rate into Watanabe's single-direction contribution to the real log canonical threshold, and we extend the recovery to multi-component crossings, multiplicity $m$, the singular fluctuation $ν$ (universal in the KL order for 1D directions), prior-RLCT shifts, and tempered posteriors. We then lift this rate to a deep network: a multi-layer K-FAC factorisation writes each Fisher block as a product of activation- and gradient-side rates with a duality between them, instantiated at modern-network primitives (residual streams, layer normalisation, attention). A quotient theorem carries the rate to the gauge quotient $Θ/G$ under gradient flow on a $G$-invariant metric; SGD qualifies, standard Adam does not, and we construct a $G$-equivariant Adam-family preconditioner (DDCAdam) that does. The bridge yields a parameter-coordinate handle on singular geometry, closed-form per-architecture predictions, and a trajectory-rate readout of Watanabe's triple $(λ, m, ν)$ from one checkpoint's forward and backward passes, without posterior sampling.
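The link between a Fisher decay rate and the KL order can be illustrated on a one-parameter toy model. The sketch below is not the paper's procedure; it assumes a hypothetical regression model y = w^k x + eps with standard normal x and eps, for which the KL divergence to the true model (w = 0) is w^(2k)/2 and the Fisher information along w is proportional to w^(2k-2). In this toy case the KL order equals the log-log slope of the Fisher information plus 2, and the real log canonical threshold of K(w) proportional to w^(2k) is the standard value 1/(2k). The function names and the sampling setup are illustrative assumptions.

```python
import numpy as np

def directional_fisher(w, k, x, eps):
    """Empirical Fisher information along w for the toy model y = w**k * x + eps."""
    # Score of log N(y | w**k * x, 1) with respect to w, with y drawn from the
    # model itself, so that y - w**k * x = eps:
    #   score = eps * d(w**k * x)/dw = eps * k * w**(k-1) * x
    score = eps * k * w ** (k - 1) * x
    return np.mean(score ** 2)

def estimate_orders(k, n_samples=50_000, seed=0):
    """Return (Fisher decay rate, KL order, RLCT) for the toy model of degree k."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    eps = rng.standard_normal(n_samples)
    # Approach the singularity w = 0 along a logarithmic grid.
    ws = np.logspace(-3, -1, 20)
    fisher = np.array([directional_fisher(w, k, x, eps) for w in ws])
    # Decay rate = slope of log Fisher against log distance to the singularity.
    rate = np.polyfit(np.log(ws), np.log(fisher), 1)[0]
    kl_order = rate + 2.0   # toy-model relation: KL ~ w**(rate + 2)
    rlct = 1.0 / kl_order   # RLCT of K(w) ~ w**(2k) in one dimension
    return rate, kl_order, rlct

if __name__ == "__main__":
    for k in (1, 2, 3):
        rate, kl_order, rlct = estimate_orders(k)
        print(f"k={k}: rate={rate:.3f}, KL order={kl_order:.3f}, RLCT={rlct:.3f}")
```

The same noise samples are reused at every w, so the sampling error multiplies the Fisher estimate by a common constant and does not affect the fitted slope. For k = 1 the model is regular and the sketch gives the familiar one-dimensional value of 1/2.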
