Fugu-MT Paper Translation (Abstract): Zeta: Dual Whitening for Matrix Optimization via Coordinate-Adaptive Preconditioning

Paper abstract: Zeta: Dual Whitening for Matrix Optimization via Coordinate-Adaptive Preconditioning

arxiv url: http://arxiv.org/abs/2606.14187v2
Date: Tue, 16 Jun 2026 11:02:08 GMT
Status: Translation complete
Last updated in system: 2026-06-17 15:01:46.635895
Title: Zeta: Dual Whitening for Matrix Optimization via Coordinate-Adaptive Preconditioning
Authors: Kaiwen Chen, Shuhai Zhang, Zimo Liu, Linxiao Li, Ying Sun, Yuchen Li, Yifan Zhang, Bo Han, Mingkui Tan, Qiuwu Chen
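Abstract summary: We propose Zeta, a dual whitening optimizer that applies coordinate whitening and spectral whitening in a strictly ordered pipeline. Empirically, Zeta matches or surpasses strong baselines across language modeling (0.6B to 8B parameters), mixture-of-experts architectures, and vision tasks.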
Reference score (attention score computed independently by this site): 56.24532075189964
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large-scale neural network training increasingly relies on matrix-aware optimizers that exploit the structure of weight parameters beyond element-wise adaptation. However, existing matrix-aware methods such as Muon have an underappreciated vulnerability: their core operation, Newton-Schulz iteration, depends critically on input conditioning, yet the raw momentum matrices exhibit severe coordinate-wise scale heterogeneity. In this paper, we first verify this scale heterogeneity through a chi-square uniformity test, showing that intra-matrix scale imbalance is prevalent across Transformer layers and that coordinate whitening effectively corrects it. Motivated by this finding, we propose Zeta, a dual whitening optimizer that applies coordinate whitening and spectral whitening in a strictly ordered pipeline. The ordering is not a tunable choice but follows from a mathematical dependency: coordinate whitening establishes the statistical isotropy that spectral whitening requires to function reliably. We further prove that this dual pipeline strictly reduces orthogonalization error relative to pure spectral methods by improving the condition number of the input. Empirically, Zeta matches or surpasses strong baselines across language modeling (0.6B to 8B parameters), mixture-of-experts architectures, and vision tasks, demonstrating that resolving scale imbalance before orthogonalization leads to faster convergence and better generalization. Code is available at https://github.com/AIGCodeOS/aigcode_zeta_optimizer.
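The ordering argument in the abstract (coordinate whitening first, spectral whitening second) can be illustrated with a small NumPy sketch. This is not the authors' implementation, and several pieces are assumptions made only for illustration: coordinate whitening is modeled as an Adam-style division by the square root of a running second-moment estimate, spectral whitening uses the classic cubic Newton-Schulz iteration rather than any tuned variant, and the "gradients" are synthetic Gaussian noise with hypothetical per-coordinate scales. The function names (`coordinate_whiten`, `newton_schulz`, `run_demo`) are likewise hypothetical.

```python
import numpy as np


def newton_schulz(m, steps=10):
    """Classic cubic Newton-Schulz iteration toward the orthogonal polar factor of m."""
    # Frobenius scaling puts every singular value in (0, 1], inside the convergence region.
    x = m / (np.linalg.norm(m) + 1e-12)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x


def coordinate_whiten(m, v, eps=1e-8):
    """ASSUMED form of coordinate whitening: divide each entry by sqrt of its second moment."""
    return m / (np.sqrt(v) + eps)


def orthogonality_error(x):
    """Frobenius distance between x @ x.T and the identity (x is wide: rows <= cols)."""
    return float(np.linalg.norm(x @ x.T - np.eye(x.shape[0])))


def run_demo(seed=0, rows=64, cols=256, n_steps=50, beta1=0.9, beta2=0.99):
    rng = np.random.default_rng(seed)
    # Hypothetical per-coordinate scales spanning several orders of magnitude.
    scale = np.outer(np.logspace(-3, 0, rows), np.logspace(-1, 0, cols))
    m = np.zeros((rows, cols))  # first-moment (momentum) estimate
    v = np.zeros((rows, cols))  # second-moment estimate
    for _ in range(n_steps):
        g = scale * rng.standard_normal((rows, cols))  # synthetic gradient
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
    whitened = coordinate_whiten(m, v)
    return {
        "cond_raw": float(np.linalg.cond(m)),
        "cond_whitened": float(np.linalg.cond(whitened)),
        "err_spectral_only": orthogonality_error(newton_schulz(m)),
        "err_dual": orthogonality_error(newton_schulz(whitened)),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.4g}")
```

In this toy setting the per-coordinate scale cancels in `m / sqrt(v)`, so the whitened matrix is far better conditioned than the raw momentum, and the same number of Newton-Schulz steps leaves a much smaller orthogonality error. The numbers say nothing about the paper's actual experiments.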
