Fugu-MT Paper Translation (Abstract): Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization

Paper abstract: Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization

arxiv url: http://arxiv.org/abs/2104.02596v4
Date: Fri, 04 Oct 2024 11:38:24 GMT
Status: Translation complete
Last updated in system: 2024-12-06 03:14:47.894208
Title: Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization
Title (reference translation): Accelerating gradient tracking over time-varying graphs for decentralized optimization
Authors: Huan Li, Zhouchen Lin
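Abstract summary: We prove that practical single-loop accelerated gradient tracking needs $O((\frac{\gamma}{1-\sigma_{\gamma}})^2\sqrt{\frac{L}{\epsilon}})$ iterations for nonstrongly convex problems. Our convergence rates improve significantly over $O(\frac{1}{\epsilon^{5/7}})$ and $O((\frac{L}{\mu})^{5/7}\frac{1}{(1-\sigma)^{1.5}}\log\frac{1}{\epsilon})$.
Reference score (independently computed attention score): 59.65871549878937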
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Decentralized optimization over time-varying graphs has become increasingly common in modern machine learning with massive data stored on millions of mobile devices, such as in federated learning. This paper revisits the widely used accelerated gradient tracking and extends it to time-varying graphs. We prove that the practical single-loop accelerated gradient tracking needs $O((\frac{\gamma}{1-\sigma_{\gamma}})^2\sqrt{\frac{L}{\epsilon}})$ and $O((\frac{\gamma}{1-\sigma_{\gamma}})^{1.5}\sqrt{\frac{L}{\mu}}\log\frac{1}{\epsilon})$ iterations to reach an $\epsilon$-optimal solution over time-varying graphs when the problems are nonstrongly convex and strongly convex, respectively, where $\gamma$ and $\sigma_{\gamma}$ are two common constants characterizing the network connectivity, $L$ and $\mu$ are the smoothness and strong convexity constants, respectively, and one iteration corresponds to one gradient oracle call and one communication round. Our convergence rates improve significantly over the rates of $O(\frac{1}{\epsilon^{5/7}})$ and $O((\frac{L}{\mu})^{5/7}\frac{1}{(1-\sigma)^{1.5}}\log\frac{1}{\epsilon})$, respectively, which were proved in the original literature on accelerated gradient tracking only for static graphs, where $\frac{\gamma}{1-\sigma_{\gamma}}$ equals $\frac{1}{1-\sigma}$ when the network is time-invariant. When combined with a multiple consensus subroutine, the dependence on the network connectivity constants can be further improved to $O(1)$ and $O(\frac{\gamma}{1-\sigma_{\gamma}})$ for the gradient oracle and communication round complexities, respectively. When the network is static, by employing the Chebyshev acceleration, our complexities exactly match the lower bounds without hiding any poly-logarithmic factor for both nonstrongly convex and strongly convex problems.
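To make the claimed improvement concrete, the following worked comparison restricts attention to the static case, where the abstract states that $\frac{\gamma}{1-\sigma_{\gamma}}$ equals $\frac{1}{1-\sigma}$. It only compares the leading terms quoted in the abstract; constants hidden inside $O(\cdot)$ are ignored, and the condition number $L/\mu = 10^4$ is an illustrative assumption, not a value from the paper.

```latex
% Strongly convex case, static graph: earlier bound divided by the new bound.
\[
\frac{(L/\mu)^{5/7}\,(1-\sigma)^{-1.5}\,\log(1/\epsilon)}
     {(1-\sigma)^{-1.5}\,\sqrt{L/\mu}\,\log(1/\epsilon)}
 = \left(\frac{L}{\mu}\right)^{\frac{5}{7}-\frac{1}{2}}
 = \left(\frac{L}{\mu}\right)^{3/14}.
\]
% Illustrative value (assumed): L/\mu = 10^4 gives
\[
\left(10^{4}\right)^{3/14} = 10^{6/7} \approx 7.2 .
\]
% Nonstrongly convex case: compare only the dependence on \epsilon.
\[
\frac{\epsilon^{-5/7}}{\epsilon^{-1/2}} = \epsilon^{-3/14},
\qquad \text{which grows without bound as } \epsilon \to 0 .
\]
```

In both cases the gap is a factor with exponent $3/14$, in $L/\mu$ for strongly convex problems and in $1/\epsilon$ for nonstrongly convex ones.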
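Abstract (reference translation): Decentralized optimization over time-varying graphs is becoming increasingly common in modern machine learning, where massive data is stored on millions of mobile devices, as in federated learning. This paper revisits the widely used accelerated gradient tracking and extends it to time-varying graphs. We prove that practical single-loop accelerated gradient tracking needs $O((\frac{\gamma}{1-\sigma_{\gamma}})^2\sqrt{\frac{L}{\epsilon}})$ and $O((\frac{\gamma}{1-\sigma_{\gamma}})^{1.5}\sqrt{\frac{L}{\mu}}\log\frac{1}{\epsilon})$ iterations to reach an $\epsilon$-optimal solution over time-varying graphs when the problems are nonstrongly convex and strongly convex, respectively. Our convergence rates improve significantly over $O(\frac{1}{\epsilon^{5/7}})$ and $O((\frac{L}{\mu})^{5/7}\frac{1}{(1-\sigma)^{1.5}}\log\frac{1}{\epsilon})$, respectively, which were proved in the original literature on accelerated gradient tracking only for static graphs; $\frac{\gamma}{1-\sigma_{\gamma}}$ equals $\frac{1}{1-\sigma}$ when the network is time-invariant. When combined with a multiple consensus subroutine, the dependence on the network connectivity constants improves further to $O(1)$ and $O(\frac{\gamma}{1-\sigma_{\gamma}})$ for the gradient oracle and communication round complexities, respectively. When the network is static, by using Chebyshev acceleration, our complexities exactly match the lower bounds, without hiding any poly-logarithmic factor, for both nonstrongly convex and strongly convex problems.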
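The abstract does not spell out the algorithm, so the sketch below is not the paper's accelerated method. It shows plain (non-accelerated) gradient tracking over a time-varying graph, the base scheme that accelerated variants build on, applied to a decentralized least-squares problem. Everything here is an assumption made for illustration: the function names `matching_matrix` and `gradient_tracking`, the four-agent network that alternates between two pairings, the random data, and the step size. Neither pairing is a connected graph on its own, but their union over two rounds is, which is the kind of windowed connectivity that constants such as $\gamma$ and $\sigma_{\gamma}$ are meant to capture.

```python
import numpy as np


def matching_matrix(n, pairs):
    """Doubly stochastic mixing matrix that averages each listed pair of agents."""
    W = np.eye(n)
    for i, j in pairs:
        W[i, i] = W[j, j] = 0.5
        W[i, j] = W[j, i] = 0.5
    return W


def gradient_tracking(A, b, mixing, alpha, iters):
    """Plain gradient tracking; agent i holds f_i(x) = 0.5 * ||A[i] @ x - b[i]||^2.

    A has shape (n, m, d), b has shape (n, m). Round k uses the mixing matrix
    mixing[k % len(mixing)], so the communication graph changes over time.
    """
    n, _, d = A.shape

    def local_gradients(X):
        # Row i is the gradient of f_i evaluated at agent i's own iterate X[i].
        residual = np.einsum("imd,id->im", A, X) - b
        return np.einsum("imd,im->id", A, residual)

    X = np.zeros((n, d))      # one local iterate per agent
    G = local_gradients(X)
    Y = G.copy()              # tracker of the network-average gradient
    for k in range(iters):
        W = mixing[k % len(mixing)]
        X_new = W @ X - alpha * Y          # mix with neighbours, then descend
        G_new = local_gradients(X_new)
        Y = W @ Y + G_new - G              # mix trackers, add gradient change
        X, G = X_new, G_new
    return X


# Illustrative setup (all values are assumptions, not taken from the paper).
rng = np.random.default_rng(0)
n, m, d = 4, 10, 3
A = rng.standard_normal((n, m, d))
b = rng.standard_normal((n, m))

# Two alternating pairings; neither graph alone is connected.
mixing = [
    matching_matrix(n, [(0, 1), (2, 3)]),
    matching_matrix(n, [(1, 2), (3, 0)]),
]

# Conservative step size based on the largest local smoothness constant.
L_max = max(np.linalg.eigvalsh(A[i].T @ A[i])[-1] for i in range(n))
X = gradient_tracking(A, b, mixing, alpha=0.05 / L_max, iters=6000)

# Centralized least-squares solution, used only as a reference.
x_star = np.linalg.lstsq(A.reshape(n * m, d), b.reshape(n * m), rcond=None)[0]
print("max deviation from centralized solution:", np.max(np.abs(X - x_star)))
```

Each iteration of this loop uses one gradient evaluation per agent and one communication round, which matches how the abstract counts an iteration. The step size is deliberately small; choosing it well, and adding momentum to obtain the accelerated rates, is what the paper's analysis addresses.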
