Fugu-MT Paper Translation (Abstract): Accelerating Zeroth-Order Spectral Optimization with Partial Orthogonalization from Power Iteration

Paper overview: Accelerating Zeroth-Order Spectral Optimization with Partial Orthogonalization from Power Iteration

arxiv url: http://arxiv.org/abs/2605.09034v2
Date: Fri, 15 May 2026 13:21:07 GMT
Status: Translation complete
Last updated in system: 2026-05-19 03:45:13.091071
Title: Accelerating Zeroth-Order Spectral Optimization with Partial Orthogonalization from Power Iteration
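Title (reference translation): Accelerating zeroth-order spectral optimization through partial orthogonalization via power iteration
Authors: Jiahe Chen, Ziye Ma
Abstract summary: We focus on the hidden-layer training problem, in which spectral optimizers such as Muon outperform AdamW. To this end, we replace Muon's iconic Newton-Schulz procedure with the faster, more concentrated power-iteration method. Our method can achieve 1.5x to 4x the convergence speed of ZO-Muon.
Reference score (independently computed attention measure): 6.574641780732972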
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Zeroth-order (ZO) optimization has become increasingly popular and important in fine-tuning large language models (LLMs), especially on edge devices, due to its ability to adjust the model to local data without the need for memory-intensive back-propagation. Recent works try to reduce ZO variance through low-dimensional subspace search, but subspace restriction alone leaves key optimization geometry under-exploited, motivating additional acceleration. In this work, we focus on the hidden layer training problem, in which spectral optimizers like Muon outperform AdamW due to their ability to exploit weak spectral directions by orthogonalization. However, we have discovered that, unlike in the first-order setting, full orthogonalization works poorly in the ZO setting since the gradient estimates are highly noisy and unreliable. To address this issue, we propose applying partial spectral orthogonalization to accelerate ZO optimization. To do so, we replace the iconic Newton-Schulz procedure in Muon with the faster, more concentrated power-iteration method so that it only amplifies dominant spectral directions. Furthermore, to improve the efficiency and generalization of the algorithm, we adopt a streaming variant of power iteration that requires low variance in gradients, which is achieved by constraining our search to a subspace obtained through the projection of momentum, echoing recent advances. Experiments on LLM fine-tuning show that our method can achieve from 1.5x to 4x the convergence speed of ZO-Muon, the current SOTA algorithm, across SuperGLUE datasets on the OPT-13B model. Across different models, we also reach competitive final accuracies in less time in most cases compared with strong ZO baselines such as MeZO, LOZO, and ZO-Muon. Code is available at https://github.com/MOFA-LAB/ZO-MOPI.git.
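Abstract (reference translation): Zeroth-order (ZO) optimization is becoming increasingly widespread and important for fine-tuning large language models (LLMs), particularly on edge devices, where it can adapt a model to local data without memory-intensive back-propagation. Recent studies attempt to reduce ZO variance through low-dimensional subspace search, but subspace restriction alone leaves key optimization geometry under-exploited, which motivates further acceleration. In this work, we focus on the hidden-layer training problem, in which spectral optimizers such as Muon outperform AdamW thanks to their ability to exploit weak spectral directions through orthogonalization. However, we found that, unlike in the first-order setting, full orthogonalization does not work well in the ZO setting because the gradient estimates are very noisy and unreliable. To address this problem, we propose applying partial spectral orthogonalization to speed up ZO optimization. To do so, we replace Muon's iconic Newton-Schulz procedure with the faster, more concentrated power-iteration method, so that only the dominant spectral directions are amplified. In addition, to improve the efficiency and generalization of the algorithm, we adopt a streaming variant of power iteration, which requires low-variance gradients; this is achieved by restricting the search to a subspace obtained by projecting the momentum, echoing recent advances. LLM fine-tuning experiments show that the method achieves 1.5x to 4x the convergence speed of ZO-Muon, the current SOTA algorithm, across SuperGLUE datasets on the OPT-13B model. Across different models, it also reaches competitive final accuracies in less time in most cases compared with strong ZO baselines such as MeZO, LOZO, and ZO-Muon. Code is available at https://github.com/MOFA-LAB/ZO-MOPI.git.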
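
The core idea in the abstract, replacing full orthogonalization with a power-iteration step that acts only on the dominant spectral directions, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the ZO-MOPI repository for that). It assumes a MeZO-style two-point gradient estimate and a plain block power iteration, and it omits the streaming variant and the momentum-projected subspace the paper describes. The function names `zo_gradient_estimate`, `top_k_subspace`, and `partial_orthogonalize`, and the parameters `k`, `n_iter`, and `eps`, are hypothetical.

```python
import numpy as np


def zo_gradient_estimate(loss_fn, W, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate along one Gaussian direction.

    Assumption: a MeZO-style estimator; the paper's exact estimator may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    Z = rng.standard_normal(W.shape)  # random perturbation direction
    # Central difference of the loss along Z: two forward passes, no backprop.
    slope = (loss_fn(W + eps * Z) - loss_fn(W - eps * Z)) / (2.0 * eps)
    return slope * Z


def top_k_subspace(G, k, n_iter=10, rng=None):
    """Block power iteration: orthonormal basis (n x k) that approximates
    the top-k right singular subspace of G. Assumes k <= min(G.shape)."""
    rng = np.random.default_rng() if rng is None else rng
    V, _ = np.linalg.qr(rng.standard_normal((G.shape[1], k)))
    for _ in range(n_iter):
        # Multiply by G^T G, then re-orthonormalize the columns.
        V, _ = np.linalg.qr(G.T @ (G @ V))
    return V


def partial_orthogonalize(G, k, n_iter=10, rng=None):
    """Approximate U_k V_k^T: the top-k singular values of G are mapped to 1
    and all others to 0 (full orthogonalization would map every one to 1)."""
    V = top_k_subspace(G, k, n_iter=n_iter, rng=rng)
    # SVD of the small (m x k) matrix G V yields its orthogonal polar factor P @ Rt.
    P, _, Rt = np.linalg.svd(G @ V, full_matrices=False)
    return P @ Rt @ V.T
```

In this sketch, an update direction for a weight matrix `W` would be obtained as `partial_orthogonalize(zo_gradient_estimate(loss_fn, W), k)`. Small `k` keeps only the strongest directions of the noisy estimate, which is the motivation the abstract gives for avoiding full orthogonalization in the ZO setting.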


Related papers