Fugu-MT Paper Translation (Summary): The Geometry of Projection Heads: Conditioning, Invariance, and Collapse

Paper summary: The Geometry of Projection Heads: Conditioning, Invariance, and Collapse

arxiv url: http://arxiv.org/abs/2605.17180v1
Date: Sat, 16 May 2026 22:32:05 GMT
Status: Translation complete
Last updated (in system): 2026-05-19 17:57:47.725079
Title: The Geometry of Projection Heads: Conditioning, Invariance, and Collapse
Authors: Faris Chaudhry
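Abstract summary: We develop a geometric theory of projection heads in self-supervised learning. We show that linear heads perform implicit subspace whitening, while nonlinear heads adapt local metrics to satisfy the specific topological constraints of the loss.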
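The claim that linear heads perform implicit subspace whitening can be made concrete with a small NumPy sketch. The function names `subspace_whitening_head` and `pullback_metric_linear`, the synthetic data, and the choice k = 4 are hypothetical illustrations, not taken from the paper. The sketch builds, explicitly, the linear map W = Λ_k^{-1/2} U_k^T that whitens the top-k principal subspace of some representations, then inspects the metric it induces on the representation space, G = J^T J = W^T W. The paper's claim is that trained linear heads arrive at this behaviour implicitly; the sketch does not reproduce that training dynamic.

```python
import numpy as np

def subspace_whitening_head(Z, k):
    """Hypothetical helper: return a (k, d) linear head W that whitens
    the top-k principal subspace of Z (rows are samples, d columns)."""
    Zc = Z - Z.mean(axis=0)
    cov = Zc.T @ Zc / (Z.shape[0] - 1)          # sample covariance (ddof=1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]         # indices of the top-k directions
    lam, U = eigvals[idx], eigvecs[:, idx]
    return (U / np.sqrt(lam)).T                 # W = Lambda_k^{-1/2} U_k^T

def pullback_metric_linear(W):
    """Metric induced by the head h(z) = W z: G = J^T J = W^T W.
    For a linear head the Jacobian is W everywhere, so G is constant."""
    return W.T @ W

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8)) * np.linspace(3.0, 0.1, 8)  # anisotropic mixing (synthetic)
Z = rng.normal(size=(2000, 8)) @ A.T                    # stand-in "backbone" features
W = subspace_whitening_head(Z, k=4)
H = (Z - Z.mean(axis=0)) @ W.T                          # head outputs, shape (2000, 4)
print(np.round(np.cov(H, rowvar=False), 6))             # identity on the 4-dim subspace

G = pullback_metric_linear(W)
n_nonzero = int(np.sum(np.linalg.eigvalsh(G) > 1e-10))
print(n_nonzero)                                        # rank of the induced metric
```

Because a linear head has a constant Jacobian, its induced metric is the same at every point. Here it has rank k, so the d - k directions outside the whitened subspace get zero length, a simple instance of the kind of metric degeneracy the abstract discusses.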
Reference score (attention score computed in-house): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We develop a geometric theory of projection heads in self-supervised learning by modeling the head as a trainable Riemannian metric on the backbone representation manifold. We show that linear heads perform implicit subspace whitening, while nonlinear heads adapt local metrics to satisfy the specific topological constraints of the loss, with head depth empirically dictating this capacity. Analyzing dimensional collapse, we prove that smooth nonlinear heads natively induce negative eigenvalues in the Hessian at collapsed equilibria, making them unstable. We empirically validate this by continuously tracking the optimization geometry during training, which reveals that smooth activations like Swish can generate explicit negative curvature to escape collapse, whereas linear and ReLU heads under continuous-time gradient flow cannot, relying instead on discrete-time optimization dynamics and BatchNorm. Finally, we geometrically characterize how metric degeneracy governs the information-invariance trade-off, explaining why the head must be discarded. Evaluated across contrastive and decorrelation-based objectives on foundation models, our results demonstrate that the projection head acts as a universal geometric buffer, decoupling the semantic backbone from the rigid, destructive constraints of the pretraining objective.
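The abstract contrasts smooth activations such as Swish with ReLU heads. A minimal illustration of the difference at the activation level only (this is not the paper's Hessian analysis of collapsed equilibria): Swish, x · sigmoid(βx), has a nonzero second derivative around the origin (exactly β/2 at x = 0), whereas ReLU's second derivative is zero everywhere except at the kink, where it is undefined. The finite-difference helper `second_derivative`, the step `h`, and the sample points are our own illustrative choices.

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish / SiLU: x * sigmoid(beta * x), written as x / (1 + exp(-beta x))."""
    return x / (1.0 + np.exp(-beta * x))

def relu(x):
    """ReLU: piecewise linear, so its second derivative is 0 away from x = 0."""
    return np.maximum(x, 0.0)

def second_derivative(f, x, h=1e-3):
    """Hypothetical helper: central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

# Sample points near, but not at, the origin (kept farther than h from ReLU's kink).
xs = np.array([-0.5, -0.1, 0.1, 0.5])
print(second_derivative(swish, xs))   # clearly nonzero: Swish is curved near 0
print(second_derivative(relu, xs))    # zero up to floating-point rounding
print(second_derivative(swish, 0.0))  # close to beta / 2 = 0.5
```

The point of the comparison is that a smooth activation contributes its own curvature to second-order quantities such as a loss Hessian, while a ReLU contributes none away from its kink; how that curvature yields the negative eigenvalues at collapsed equilibria is the subject of the paper's analysis.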

Related papers