Fugu-MT Paper Translation (Overview): Mitigating Non-IID Drift in Zeroth-Order Federated LLM Fine-Tuning with Transferable Sparsity

Paper overview: Mitigating Non-IID Drift in Zeroth-Order Federated LLM Fine-Tuning with Transferable Sparsity

arxiv url: http://arxiv.org/abs/2506.03337v1
Date: Tue, 03 Jun 2025 19:29:50 GMT
Status: Translation complete
Last updated in system: 2025-06-05 21:20:14.02635
Title: Mitigating Non-IID Drift in Zeroth-Order Federated LLM Fine-Tuning with Transferable Sparsity
Authors: Yide Ran, Wentao Guo, Jingwei Sun, Yanzhou Pan, Xiaodong Yu, Hao Wang, Jianwen Xie, Yiran Chen, Denghui Zhang, Zhaozhuo Xu
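Abstract summary: Federated Learning enables collaborative fine-tuning of large language models (LLMs) across decentralized Non-Independent and Identically Distributed (Non-IID) clients. Meerkat is a sparse zeroth-order optimization (ZO) method designed for federated LLM fine-tuning. Meerkat achieves strong communication efficiency, which makes high-frequency synchronization cost-effective.
Reference score (independently computed attention score): 30.075631058793466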
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Federated Learning enables collaborative fine-tuning of Large Language Models (LLMs) across decentralized Non-Independent and Identically Distributed (Non-IID) clients, but such models' massive parameter sizes lead to significant memory and communication challenges. This work introduces Meerkat, a sparse zeroth-order optimization (ZO) method designed for federated LLM fine-tuning. By limiting fine-tuning to a transferable, static, extremely sparse subset of parameters, Meerkat achieves remarkable communication efficiency, enabling cost-effective high-frequency synchronization. With theoretical analysis and experiments, we show that this high-frequency communication effectively mitigates Non-IID data challenges and leads to superior performance compared to full-parameter ZO. Furthermore, experimental results show that Meerkat outperforms existing sparsity baselines at the same communication frequency. To further handle Non-IID drift, Meerkat leverages traceable local updates and forms a virtual path for each client. This virtual path mechanism reveals the GradIP phenomenon: the inner products between LLM pre-training gradients maintained by the server and client gradients estimated via ZO converge for extreme Non-IID clients but oscillate for IID ones. This distinct behavior provides a signal for identifying clients with extreme data heterogeneity. Using this signal, we propose Meerkat-vp, which analyzes GradIP trajectories to identify extreme Non-IID clients and applies early stopping to enhance aggregated model quality. Experiments confirm that Meerkat and Meerkat-vp significantly improve the efficiency and effectiveness of ZO federated LLM fine-tuning.
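The core mechanism above is zeroth-order fine-tuning restricted to a static, sparse subset of parameters. The sketch below is a generic masked two-point (SPSA-style, as popularized by MeZO) estimator in plain Python, not Meerkat's code. The mask, the Gaussian perturbation, and the names `perturbation`, `masked_zo_step`, and `replay_update` are assumptions for illustration. It shows why a static mask keeps synchronization cheap: each step reads and writes only the masked coordinates, and if both sides share the mask and the random seed, a step can be replayed from one seed and one scalar.

```python
import random
from typing import Callable, Dict, List, Sequence


def perturbation(mask: Sequence[int], seed: int) -> Dict[int, float]:
    """Gaussian direction z on the masked coordinates only, reproducible from `seed`."""
    rng = random.Random(seed)
    return {i: rng.gauss(0.0, 1.0) for i in mask}


def masked_zo_step(
    params: List[float],
    loss_fn: Callable[[Sequence[float]], float],
    mask: Sequence[int],  # hypothetical static sparse index set
    lr: float,
    eps: float,
    seed: int,
) -> float:
    """One two-point ZO step restricted to `mask`.

    Updates `params` in place and returns the scalar projected gradient.
    Coordinates outside `mask` are never perturbed or updated.
    """
    z = perturbation(mask, seed)
    plus, minus = list(params), list(params)
    for i, zi in z.items():
        plus[i] += eps * zi
        minus[i] -= eps * zi
    # Central finite difference: estimate of the directional derivative along z.
    proj_grad = (loss_fn(plus) - loss_fn(minus)) / (2.0 * eps)
    for i, zi in z.items():
        params[i] -= lr * proj_grad * zi
    return proj_grad


def replay_update(
    params: List[float], mask: Sequence[int], lr: float, seed: int, proj_grad: float
) -> None:
    """Reapply a step from (seed, proj_grad) alone, e.g. on a server (assumption)."""
    for i, zi in perturbation(mask, seed).items():
        params[i] -= lr * proj_grad * zi


# Usage on a toy quadratic loss; only indices 0 and 2 are trainable.
def quad(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


client = [1.0] * 6
server = list(client)
for step in range(400):
    g = masked_zo_step(client, quad, mask=[0, 2], lr=0.05, eps=1e-3, seed=step)
    replay_update(server, [0, 2], 0.05, step, g)
```

Replaying from `(seed, proj_grad)` is a known trick in ZO federated work; the abstract does not say whether Meerkat transmits seeds and scalars or the sparse parameter values themselves. Either way, the cost per synchronization scales with the size of the mask rather than with the full model.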
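GradIP, as the abstract describes it, is the inner product between a pre-training gradient held by the server and a client's ZO-estimated gradient, tracked over local steps. The abstract does not give Meerkat-vp's detection rule, so the sketch below assumes a simple one. If a client's last `window` GradIP values vary by less than `rel_tol` relative to their mean magnitude, the client is treated as converging (the extreme Non-IID case) and is marked for early stopping. `gradip`, `looks_converged`, `clients_to_stop`, `window`, and `rel_tol` are hypothetical names, and the thresholds and trajectories are illustrative, not values from the paper.

```python
from typing import Dict, List, Sequence


def gradip(pretrain_grad: Sequence[float], zo_grad: Sequence[float]) -> float:
    """Inner product of a server-side pre-training gradient and a client ZO gradient.

    With a sparse mask, only the masked coordinates would be non-zero.
    """
    if len(pretrain_grad) != len(zo_grad):
        raise ValueError("gradients must have the same length")
    return sum(a * b for a, b in zip(pretrain_grad, zo_grad))


def looks_converged(traj: Sequence[float], window: int = 5, rel_tol: float = 0.05) -> bool:
    """Assumed heuristic: a small relative spread over the trailing window means 'converging'."""
    if len(traj) < window:
        return False  # not enough evidence yet
    tail = list(traj)[-window:]
    spread = max(tail) - min(tail)
    scale = max(sum(abs(v) for v in tail) / window, 1e-12)  # avoid division by zero
    return spread / scale < rel_tol


def clients_to_stop(
    trajectories: Dict[str, Sequence[float]], window: int = 5, rel_tol: float = 0.05
) -> List[str]:
    """Client ids whose GradIP trajectory looks converged, i.e. early-stopping candidates."""
    return sorted(
        cid for cid, traj in trajectories.items() if looks_converged(traj, window, rel_tol)
    )


# Synthetic trajectories: "a" settles, "b" oscillates, "c" is too short to judge.
example = {
    "a": [1.0, 0.6, 0.45, 0.41, 0.40, 0.40, 0.40, 0.40],
    "b": [0.5, -0.4, 0.6, -0.5, 0.4, -0.6, 0.5, -0.4],
    "c": [0.3, 0.3],
}
stopped = clients_to_stop(example)
```

A spread-based test is only one way to separate "converges" from "oscillates"; counting sign changes of successive differences, or smoothing the trajectory first, would be equally plausible choices under the same assumptions.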
