Fugu-MT paper translation (abstract): FedAdamW: A Communication-Efficient Optimizer with Convergence and Generalization Guarantees for Federated Large Models

Paper abstract: FedAdamW: A Communication-Efficient Optimizer with Convergence and Generalization Guarantees for Federated Large Models

arxiv url: http://arxiv.org/abs/2510.27486v1
Date: Fri, 31 Oct 2025 14:04:43 GMT
Status: translation complete
Last updated in system: 2025-11-03 17:52:16.122299
Title: FedAdamW: A Communication-Efficient Optimizer with Convergence and Generalization Guarantees for Federated Large Models
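Title (reference translation): FedAdamW: A communication-efficient optimizer with convergence and generalization guarantees for federated large models
Authors: Junkang Liu, Fanhua Shang, Kewen Zhu, Hongying Liu, Yuanyuan Liu, Jin Liu
Abstract summary: AdamW has become one of the most effective optimizers for training large-scale models. We propose the first Federated AdamW algorithm, called FedAdamW, for training and fine-tuning various large models.
Reference score (independently computed attention score): 27.658955798426323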
License: http://creativecommons.org/licenses/by/4.0/
Abstract: AdamW has become one of the most effective optimizers for training large-scale models. We have also observed its effectiveness in the context of federated learning (FL). However, directly applying AdamW in federated learning settings poses significant challenges: (1) due to data heterogeneity, AdamW often yields high variance in the second-moment estimate $\boldsymbol{v}$; (2) the local overfitting of AdamW may cause client drift; and (3) reinitializing moment estimates ($\boldsymbol{v}$, $\boldsymbol{m}$) at each round slows down convergence. To address these challenges, we propose the first \underline{Fed}erated \underline{AdamW} algorithm, called \texttt{FedAdamW}, for training and fine-tuning various large models. \texttt{FedAdamW} aligns local updates with the global update using both a \textbf{local correction mechanism} and decoupled weight decay to mitigate local overfitting. \texttt{FedAdamW} efficiently aggregates the \texttt{mean} of the second-moment estimates to reduce their variance and reinitialize them. Theoretically, we prove that \texttt{FedAdamW} achieves a linear speedup convergence rate of $\mathcal{O}(\sqrt{(L \Delta \sigma_l^2)/(S K R \epsilon^2)}+(L \Delta)/R)$ without any \textbf{heterogeneity assumption}, where $S$ is the number of participating clients per round, $K$ is the number of local iterations, and $R$ is the total number of communication rounds. We also employ PAC-Bayesian generalization analysis to explain the effectiveness of decoupled weight decay in local training. Empirically, we validate the effectiveness of \texttt{FedAdamW} on language and vision Transformer models. Compared to several baselines, \texttt{FedAdamW} significantly reduces communication rounds and improves test accuracy. The code is available at https://github.com/junkangLiu0/FedAdamW.
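Abstract (reference translation): AdamW has become one of the most effective optimizers for training large-scale models. Its effectiveness has also been observed in the context of federated learning (FL). However, applying AdamW directly in federated learning settings poses significant challenges: (1) due to data heterogeneity, AdamW often yields high variance in the second-moment estimate $\boldsymbol{v}$; (2) the local overfitting of AdamW may cause client drift; and (3) reinitializing the moment estimates ($\boldsymbol{v}$, $\boldsymbol{m}$) at each round slows down convergence. To address these challenges, we propose the first \underline{Fed}erated \underline{AdamW} algorithm, \texttt{FedAdamW}, for training and fine-tuning various large models. \texttt{FedAdamW} aligns local updates with the global update using both a \textbf{local correction mechanism} and decoupled weight decay, which mitigates local overfitting. \texttt{FedAdamW} efficiently aggregates the \texttt{mean} of the second-moment estimates to reduce their variance and to reinitialize them. Theoretically, we prove that \texttt{FedAdamW} achieves a linear speedup convergence rate of $\mathcal{O}(\sqrt{(L \Delta \sigma_l^2)/(S K R \epsilon^2)}+(L \Delta)/R)$ without any \textbf{heterogeneity assumption}, where $S$ is the number of participating clients per round, $K$ is the number of local iterations, and $R$ is the total number of communication rounds. We also use PAC-Bayesian generalization analysis to explain the effectiveness of decoupled weight decay in local training. Empirically, we validate the effectiveness of \texttt{FedAdamW} on language and vision Transformer models. Compared to several baselines, \texttt{FedAdamW} significantly reduces communication rounds and improves test accuracy. The code is available at https://github.com/junkangLiu0/FedAdamW.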
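
The abstract names two ingredients that can be illustrated in a few lines: local AdamW steps with decoupled weight decay, and server-side averaging of the mean of the clients' second-moment estimates, which is then used to reinitialize $\boldsymbol{v}$ in the next round. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation (see their repository for that). The function names `adamw_step` and `run_rounds`, the quadratic toy loss, the hyperparameter values, the use of a single scalar mean per client, and the choice of step counters for bias correction are all hypothetical. The local correction mechanism is not specified in the abstract and is omitted here.

```python
import numpy as np


def adamw_step(w, g, m, v, t_m, t_v, lr=0.05, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW step; weight decay acts on w directly (decoupled)."""
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    # Assumption: m restarts from zero each round, so it is bias-corrected
    # with the local step count; v is warm-started, so it uses a global count.
    m_hat = m / (1.0 - beta1 ** t_m)
    v_hat = v / (1.0 - beta2 ** t_v)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v


def run_rounds(targets, rounds=20, local_steps=5):
    """Toy federated loop: client i minimizes 0.5 * ||w - targets[i]||^2."""
    w_global = np.zeros(targets.shape[1])
    v_bar = 0.0  # aggregated scalar mean of the second-moment estimates
    for r in range(rounds):
        client_ws, client_v_means = [], []
        for target in targets:
            w = w_global.copy()
            m = np.zeros_like(w)
            v = np.full_like(w, v_bar)  # reinitialize v from the aggregate
            for k in range(1, local_steps + 1):
                g = w - target  # gradient of the toy quadratic loss
                w, m, v = adamw_step(w, g, m, v,
                                     t_m=k, t_v=r * local_steps + k)
            client_ws.append(w)
            client_v_means.append(v.mean())  # upload one scalar, not all of v
        w_global = np.mean(client_ws, axis=0)      # average client weights
        v_bar = float(np.mean(client_v_means))     # average the scalar means
    return w_global, v_bar
```

Uploading a mean instead of the full vector $\boldsymbol{v}$ keeps the extra communication per client small, which is one plausible reading of "efficiently aggregates" in the abstract; the granularity of the mean used in the paper may differ from the single scalar assumed here.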

Related papers