Fugu-MT Paper Translation (Summary): AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments

Paper overview: AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments

arxiv url: http://arxiv.org/abs/2605.00650v1
Date: Fri, 01 May 2026 13:31:35 GMT
Status: Translation complete
Last updated in system: 2026-05-04 17:43:28.969138
Title: AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments
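Title (reference translation): AdaMeZO: An Adam-style zeroth-order optimizer for LLM fine-tuning without maintaining the moments
Authors: Zhijie Cai, Haolong Chen, Guangxu Zhu
Abstract summary: MeZO fine-tunes LLMs using only forward passes, at the cost of slower convergence. AdaMeZO leverages Adam-style first- and second-moment estimates without keeping them in memory. AdaMeZO is shown to outperform MeZO while requiring up to 70% fewer forward passes.
Reference score (site-computed attention score): 14.312248574936874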
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fine-tuning LLMs is necessary for various dedicated downstream tasks, but classic backpropagation-based fine-tuning methods require substantial GPU memory. To this end, a recent work, MeZO, which relies solely on forward passes to fine-tune LLMs, significantly reduces GPU requirements at the cost of slower convergence due to its indifference to loss landscapes. Standard solutions, such as Adam, explore loss landscapes by estimating the first- and second-order moments and storing them in memory to guide the model's movement through dimensions with lower curvature and vice versa. However, directly applying Adam negates MeZO's advantage as it will triple the memory requirement. In light of this, we propose AdaMeZO, a zeroth-order optimizer that leverages Adam-style first- and second-moment estimates without maintaining them in memory. We present a theoretical analysis of AdaMeZO, corroborated by extensive experiments demonstrating AdaMeZO's performance, showing that AdaMeZO can outperform MeZO while requiring up to $70\%$ fewer forward passes. Trajectory visualizations affirm AdaMeZO's ability to adapt to diverse loss landscapes.
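Abstract (reference translation): Various downstream tasks require fine-tuned LLMs, but classic backpropagation-based fine-tuning methods need substantial GPU memory. To address this, a recent work, MeZO, relies only on forward passes to fine-tune LLMs and substantially reduces GPU requirements, at the cost of slower convergence caused by its indifference to the loss landscape. Standard solutions such as Adam explore the loss landscape by estimating first- and second-order moments, storing them in memory, and using them to guide the model's movement through dimensions with lower curvature and vice versa. However, Adam negates MeZO's advantage by tripling the memory requirement. We therefore propose AdaMeZO, a zeroth-order optimizer that leverages Adam-style first- and second-moment estimates without maintaining them in memory. We present a theoretical analysis of AdaMeZO, supported by extensive experiments on its performance, and show that AdaMeZO can outperform MeZO while requiring up to 70% fewer forward passes. Trajectory visualizations confirm AdaMeZO's ability to adapt to diverse loss landscapes.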
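The abstract contrasts MeZO's forward-pass-only updates with Adam's stored moments. The sketch below illustrates that contrast under stated assumptions; it is not the AdaMeZO algorithm, since the abstract does not say how AdaMeZO avoids storing the moments. It uses a MeZO-style SPSA estimate (two forward passes give one scalar, and the random direction z is regenerated from its seed rather than stored), then shows one hypothetical way to get Adam-style moments without keeping m and v between steps: persist only the (seed, scalar) pairs and replay the exponential moving averages. All function names (`perturbation`, `spsa_scalar`, `moments_from_history`, `adam_style_zo_step`, `demo`) and hyperparameters are illustrative choices.

```python
import math
import random


def perturbation(seed, dim):
    """Regenerate the Gaussian direction z from its seed instead of storing it."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def spsa_scalar(loss, theta, seed, eps):
    """Two forward passes -> scalar projected gradient (L(θ+εz) - L(θ-εz)) / (2ε).

    MeZO perturbs the parameters in place; this sketch builds copies for clarity.
    """
    z = perturbation(seed, len(theta))
    plus = [t + eps * zi for t, zi in zip(theta, z)]
    minus = [t - eps * zi for t, zi in zip(theta, z)]
    return (loss(plus) - loss(minus)) / (2.0 * eps)


def moments_from_history(history, dim, beta1, beta2):
    """HYPOTHETICAL: rebuild Adam's m and v by replaying stored (seed, g) pairs.

    Only seeds and scalars persist between steps; m and v are recomputed here
    and materialized temporarily for readability.
    """
    m = [0.0] * dim
    v = [0.0] * dim
    for seed, g in history:
        z = perturbation(seed, dim)
        for i in range(dim):
            gi = g * z[i]  # per-coordinate zeroth-order gradient estimate
            m[i] = beta1 * m[i] + (1 - beta1) * gi
            v[i] = beta2 * v[i] + (1 - beta2) * gi * gi
    return m, v


def adam_style_zo_step(loss, theta, history, seed, lr=0.05, eps=1e-3,
                       beta1=0.9, beta2=0.999, delta=1e-8):
    """One Adam-style update driven by the zeroth-order estimate (illustrative)."""
    g = spsa_scalar(loss, theta, seed, eps)
    history.append((seed, g))  # the only per-step state kept: a seed and a float
    m, v = moments_from_history(history, len(theta), beta1, beta2)
    t = len(history)
    new_theta = []
    for th, mi, vi in zip(theta, m, v):
        m_hat = mi / (1 - beta1 ** t)  # Adam bias correction
        v_hat = vi / (1 - beta2 ** t)
        new_theta.append(th - lr * m_hat / (math.sqrt(v_hat) + delta))
    return new_theta


def quadratic(x):
    return sum(xi * xi for xi in x)


def demo(steps=150):
    """Minimize a toy quadratic; returns (initial loss, final loss)."""
    theta = [1.0, -2.0, 0.5]
    history = []
    start = quadratic(theta)
    for step in range(steps):
        theta = adam_style_zo_step(quadratic, theta, history, seed=step)
    return start, quadratic(theta)
```

Calling `demo()` starts from a toy loss of 5.25 and should drive it lower. The design trades memory for compute: each step replays the whole history, so its cost grows linearly with the step count, and a practical variant might truncate the replay to a recent window (which would only approximate Adam's moments) or fuse it into the parameter update chunk by chunk so m and v are never fully materialized.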


Related papers