Fugu-MT 論文翻訳(概要): ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!

論文の概要: ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!

arxiv url: http://arxiv.org/abs/2202.09357v2
Date: Fri, 24 Mar 2023 11:39:47 GMT
ステータス: 翻訳完了
システム内更新日: 2023-03-27 19:02:08.905305
Title: ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
Title（参考訳）: ProxSkip: はい。ローカルなグラディエントステップはおそらく通信加速につながる! ついに!
Authors: Konstantin Mishchenko, Grigory Malinovsky, Sebastian Stich and Peter Richt\'arik
Abstract要約: ProxSkipは、スムーズな(f$)関数と高価な非滑らかな(psi$)関数の和を最小化する驚くほどシンプルで、証明可能な効率のよい方法である。我々の主な動機は、勾配演算子の評価が、すべてのデバイスで独立して局所的なGDステップを取ることに対応する、連合学習にある。
参考スコア（独自算出の注目度）: 6.170898159041277
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce ProxSkip -- a surprisingly simple and provably efficient method for minimizing the sum of a smooth ($f$) and an expensive nonsmooth proximable ($\psi$) function. The canonical approach to solving such problems is via the proximal gradient descent (ProxGD) algorithm, which is based on the evaluation of the gradient of $f$ and the prox operator of $\psi$ in each iteration. In this work we are specifically interested in the regime in which the evaluation of prox is costly relative to the evaluation of the gradient, which is the case in many applications. ProxSkip allows for the expensive prox operator to be skipped in most iterations: while its iteration complexity is $\mathcal{O}\left(\kappa \log \frac{1}{\varepsilon}\right)$, where $\kappa$ is the condition number of $f$, the number of prox evaluations is $\mathcal{O}\left(\sqrt{\kappa} \log \frac{1}{\varepsilon}\right)$ only. Our main motivation comes from federated learning, where evaluation of the gradient operator corresponds to taking a local GD step independently on all devices, and evaluation of prox corresponds to (expensive) communication in the form of gradient averaging. In this context, ProxSkip offers an effective acceleration of communication complexity. Unlike other local gradient-type methods, such as FedAvg, SCAFFOLD, S-Local-GD and FedLin, whose theoretical communication complexity is worse than, or at best matching, that of vanilla GD in the heterogeneous data regime, we obtain a provable and large improvement without any heterogeneity-bounding assumptions.
Abstract（参考訳）: ProxSkipは、スムーズな(f$)関数と高価な非滑らかな(\psi$)関数の和を最小化する驚くほどシンプルで、証明可能な効率のよい方法です。このような問題を解決するための標準的なアプローチは、各反復において$f$の勾配と$\psi$のプロキシ演算子の評価に基づいて、近勾配降下法(ProxGD)アルゴリズムである。本研究で特に注目しているのは,proxの評価が勾配の評価に比較して費用がかかるようなシステムであり,多くの応用例においてそうである。 proxskipは高価なprox演算子をほとんどのイテレーションでスキップできる: イテレーションの複雑さは$\mathcal{o}\left(\kappa \log \frac{1}{\varepsilon}\right)$であり、ここで$\kappa$は$f$の条件番号であるが、proxの評価の数は$\mathcal{o}\left(\sqrt{\kappa} \log \frac{1}{\varepsilon}\right)$のみである。我々の主な動機は、勾配演算子の評価がすべてのデバイスで独立に局所的なGDステップをとることに対応し、プロキシの評価は勾配平均化の形式での(拡張的な)コミュニケーションに対応することにある。この文脈では、ProxSkipは通信複雑性の効果的な加速を提供する。 fedavg, scaffold,s-local-gd,fedlinなどの他の局所勾配型手法とは異なり,不均質なデータレジームにおけるバニラgdのそれよりも理論的な通信複雑性が悪く,あるいは最善の一致があるため,不均質性境界を仮定せずに証明可能で大きな改善が得られる。

関連論文リスト

ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
本稿では,BFG凸最適化(SCO: Reweighted Query (ReSQue) 推定ツールを提案する。我々はSCOの並列およびプライベート設定における最先端の複雑さを実現するアルゴリズムを開発した。
論文参考訳（メタデータ） (2023-01-01T18:51:29Z)
Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity [15.18055488087588]
上記の凸定式化を$widetildeO(sum_i=1n d_i log (1 /epsilon))$グラデーション計算で$epsilon$-accuracyに最小化するアルゴリズムを与える。我々の主な技術的貢献は、カットプレーン法とインテリアポイント法を組み合わせた新しい組み合わせにより、各イテレーションで$f_i$項を選択する適応的な手順である。
論文参考訳（メタデータ） (2022-08-07T20:53:42Z)
DADAO: Decoupled Accelerated Decentralized Asynchronous Optimization [0.0]
DADAOは、L$-smooth と $mu$-strongly convex 関数の和を最小化する最初の分散化、高速化、非同期化、プライマリ化、一階述語アルゴリズムである。我々のアルゴリズムは、$mathcalO(nsqrtchisqrtfracLmulog(frac1epsilon)$ localと$mathcalO(nsqrtchisqrtfracLmulog()のみを必要とすることを示す。
論文参考訳（メタデータ） (2022-07-26T08:47:54Z)
Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity [121.83085611327654]
積 $r:=p + q$, ここで$r$は$mu$-strong convex類似性である。エージェントの通信やローカルコールにマスターされた問題を解決する方法を提案する。提案手法は$mathcalO(sqrtL_q/mu)$法よりもはるかにシャープである。
論文参考訳（メタデータ） (2022-05-30T14:28:02Z)
The First Optimal Algorithm for Smooth and Strongly-Convex-Strongly-Concave Minimax Optimization [88.91190483500932]
本稿では,スムーズで強靭なミニマックス最適化問題を再考する。既存の最先端メソッドは、下限の$Omegaleft(sqrtkappa_xkappa_ylog3 (kappa_xkappa_y)logfrac1epsilonright)$にマッチしない。我々は、$mathcalOleft( sqrtkappa_xkappa_ylog)で最初のアルゴリズムを提供することで、この根本的な問題を解決する。
論文参考訳（メタデータ） (2022-05-11T17:33:07Z)
Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the $O(\epsilon^{-7/4})$ Complexity [70.65867695317633]
本稿では,2つの単純な加速勾配法,再発進勾配降下法(AGD)と再発進球法(HB)を提案する。我々は、我々の手法が$frac1epsilon)$の勾配反復数を達成することを確証する。我々のアルゴリズムは、ネストフの古典的なAGDオークのHBと再起動機構のみからなるという意味では単純である。
論文参考訳（メタデータ） (2022-01-27T10:04:04Z)
A One-bit, Comparison-Based Gradient Estimator [29.600975900977343]
正規化勾配の頑健で信頼性の高い推定器を構築するために、1ビット圧縮センシングのツールを利用する方法を示す。勾配降下法において,この推定器を用いたSCOBOというアルゴリズムを提案する。
論文参考訳（メタデータ） (2020-10-06T05:01:38Z)
Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization [69.49313819343992]
広範に使われているEXTRA法とDIG法を拡張し,VR-EXTRA法とVR-DIGing法という2つの手法を提案する。提案されたVR-EXTRAは、$O(kappa_s+n)logfrac1epsilon)$グラデーション評価と$O(kappa_b+kappa_c)logfrac1epsilon)$通信ラウンドを必要とする。提案されているVR-DIGingは、O(kappa_b+kappa)の通信コストが少し高い
論文参考訳（メタデータ） (2020-09-09T15:48:44Z)
Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence [120.9336529957224]
本稿では、勾配のないミニマックス最適化問題の大きさを非強設定で表現する。本稿では,新しいゼロ階分散還元降下アルゴリズムが,クエリの複雑さを最もよく表すことを示す。
論文参考訳（メタデータ） (2020-06-16T17:55:46Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。