Fugu-MT 論文翻訳(概要): Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization

論文の概要: Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization

arxiv url: http://arxiv.org/abs/2009.04373v3
Date: Sat, 27 Aug 2022 12:55:53 GMT
ステータス: 翻訳完了
システム内更新日: 2022-10-20 12:24:33.597955
Title: Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization
Title（参考訳）: 強い凸分散最適化のための変動低減EXTRAとDIGingとその最適高速化
Authors: Huan Li, Zhouchen Lin, Yongchun Fang
Abstract要約: 広範に使われているEXTRA法とDIG法を拡張し,VR-EXTRA法とVR-DIGing法という2つの手法を提案する。提案されたVR-EXTRAは、$O(kappa_s+n)logfrac1epsilon)$グラデーション評価と$O(kappa_b+kappa_c)logfrac1epsilon)$通信ラウンドを必要とする。提案されているVR-DIGingは、O(kappa_b+kappa)の通信コストが少し高い
参考スコア（独自算出の注目度）: 69.49313819343992
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study stochastic decentralized optimization for the problem of training machine learning models with large-scale distributed data. We extend the widely used EXTRA and DIGing methods with variance reduction (VR), and propose two methods: VR-EXTRA and VR-DIGing. The proposed VR-EXTRA requires the time of $O((\kappa_s+n)\log\frac{1}{\epsilon})$ stochastic gradient evaluations and $O((\kappa_b+\kappa_c)\log\frac{1}{\epsilon})$ communication rounds to reach precision $\epsilon$, which are the best complexities among the non-accelerated gradient-type methods, where $\kappa_s$ and $\kappa_b$ are the stochastic condition number and batch condition number for strongly convex and smooth problems, respectively, $\kappa_c$ is the condition number of the communication network, and $n$ is the sample size on each distributed node. The proposed VR-DIGing has a little higher communication cost of $O((\kappa_b+\kappa_c^2)\log\frac{1}{\epsilon})$. Our stochastic gradient computation complexities are the same as the ones of single-machine VR methods, such as SAG, SAGA, and SVRG, and our communication complexities keep the same as those of EXTRA and DIGing, respectively. To further speed up the convergence, we also propose the accelerated VR-EXTRA and VR-DIGing with both the optimal $O((\sqrt{n\kappa_s}+n)\log\frac{1}{\epsilon})$ stochastic gradient computation complexity and $O(\sqrt{\kappa_b\kappa_c}\log\frac{1}{\epsilon})$ communication complexity. Our stochastic gradient computation complexity is also the same as the ones of single-machine accelerated VR methods, such as Katyusha, and our communication complexity keeps the same as those of accelerated full batch decentralized methods, such as MSDA.
Abstract（参考訳）: 大規模分散データを用いた機械学習モデルの学習問題に対する確率的分散最適化について検討する。広範に使われているEXTRA法とDIG法を拡張し,VR-EXTRA法とVR-DIGing法という2つの手法を提案する。 The proposed VR-EXTRA requires the time of $O((\kappa_s+n)\log\frac{1}{\epsilon})$ stochastic gradient evaluations and $O((\kappa_b+\kappa_c)\log\frac{1}{\epsilon})$ communication rounds to reach precision $\epsilon$, which are the best complexities among the non-accelerated gradient-type methods, where $\kappa_s$ and $\kappa_b$ are the stochastic condition number and batch condition number for strongly convex and smooth problems, respectively, $\kappa_c$ is the condition number of the communication network, and $n$ is the sample size on each distributed node. 提案されたVR-DIGingは通信コストが$O((\kappa_b+\kappa_c^2)\log\frac{1}{\epsilon})$よりやや高い。 SAGやSAGA,SVRGといった単一マシンのVR手法と確率勾配計算の複雑さは同一であり,通信の複雑さはEXTRAやDIGingと同じである。さらに,vr-extraとvr-digingを最適な$o((\sqrt{n\kappa_s}+n)\log\frac{1}{\epsilon})$確率的勾配計算複雑性と$o(\sqrt{\kappa_b\kappa_c}\log\frac{1}{\epsilon})$通信複雑性で高速化する。確率的勾配計算の複雑さはkatyushaのようなシングルマシン高速化vr法と同じであり、通信の複雑さはmsdaのような高速化された全バッチ分散手法と同じである。

関連論文リスト

Stochastic Constrained Decentralized Optimization for Machine Learning with Fewer Data Oracles: a Gradient Sliding Approach [32.36073823372713]
機械学習モデルでは、アルゴリズムはその勾配のためにデータセンターとサンプルデータに通信する必要がある。これにより、通信効率が良く、勾配計算の数を最小限に抑える分散最適化アルゴリズムの必要性が生じる。通信効率が高く,$varepsilon$-approximate のソリューションを実現する。
論文参考訳（メタデータ） (2024-04-03T06:55:59Z)
A Quadratic Synchronization Rule for Distributed Deep Learning [66.68264684667562]
本研究は、擬似同期規則(QSR)と呼ばれる$H$を決定するための理論基底法を提案する。 ResNet と ViT の実験により、QSR を用いた局所勾配法は、他の同期戦略よりもテスト精度を一貫して向上することが示された。
論文参考訳（メタデータ） (2023-10-22T21:38:57Z)
Variance-reduced accelerated methods for decentralized stochastic double-regularized nonconvex strongly-concave minimax problems [7.5573375809946395]
我々は、ピアツーピア通信により、$m$のコンピューティングエージェントのネットワークが協調すると考えている。我々のアルゴリズムフレームワークは、二変数のコンセンサス制約を取り除くために、アグラジアン乗算器を導入している。我々の知る限りでは、これはNCSCミニマックス問題に対する収束保証を、原始変数と双対変数の両方に適用する一般の非正規化器で提供する最初の研究である。
論文参考訳（メタデータ） (2023-07-14T01:32:16Z)
Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
我々はスケッチを利用した新しい分散Adam型アルゴリズムのクラス(例:SketchedAMSGrad)を提案する。我々の新しいアルゴリズムは、反復毎に$O(frac1sqrtnT + frac1(k/d)2 T)$の高速収束率を$O(k log(d))$の通信コストで達成する。
論文参考訳（メタデータ） (2022-10-14T01:42:05Z)
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
通信周波数の異なる分散計算作業者のトレーニングアルゴリズムを示す。本研究では,より厳密な収束率を$mathcalO!!(sigma2-2_avg!)とする。また,不均一性の項は,作業者の平均遅延によっても影響されることを示した。
論文参考訳（メタデータ） (2022-06-16T17:10:57Z)
Stochastic Gradient Methods with Compressed Communication for Decentralized Saddle Point Problems [0.0]
圧縮に基づく勾配アルゴリズムを2つ開発し,非滑らかな凸凸の強い凹凸点問題のクラスを解く。最初のアルゴリズムは、一般的な設定のための圧縮(C-RDPSG)を用いたRestartベースの分散近似勾配法である。次に,有限和設定のための圧縮 (C-DPSVRG) を用いた分散近似変数低減勾配アルゴリズムを提案する。
論文参考訳（メタデータ） (2022-05-28T15:17:19Z)
ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! [6.170898159041277]
ProxSkipは、スムーズな(f$)関数と高価な非滑らかな(psi$)関数の和を最小化する驚くほどシンプルで、証明可能な効率のよい方法である。我々の主な動機は、勾配演算子の評価が、すべてのデバイスで独立して局所的なGDステップを取ることに対応する、連合学習にある。
論文参考訳（メタデータ） (2022-02-18T18:56:05Z)
Faster Single-loop Algorithms for Minimax Optimization without Strong Concavity [30.97847679999505]
GDA(Gradient descent Ascent)は、GAN(Generative Adversarial Network)やGAN(Adversarial Network)といった実用用途で広く利用されている。最近の研究は、一方のトレーニング結果に対する反復の理論において、GDAの収束率が劣っていることを示している。本稿では, 1変数のpolyak-Lojasiewicz(PL)条件を満たすという軽微な仮定の下で, GDAと滑らかなGDAの交互化という2つの代替シングルループアルゴリズムを確立する。
論文参考訳（メタデータ） (2021-12-10T15:35:42Z)
AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax Optimization [104.96004056928474]
本稿では,非コンケーブ最小値問題に対する高速適応勾配降下法を提案する。我々は,本手法が,ミニバッチサイズが$O(kappa2.5epsilon-3)$のより低いサンプル複雑性に達することを示す。
論文参考訳（メタデータ） (2021-06-30T14:47:09Z)
Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
HSDMPGは、学習モデル上で過大なエラーの順序である$mathcalObig(1/sttnbig)$を達成可能であることを示す。損失係数について、HSDMPGは学習モデル上で過大なエラーの順序である$mathcalObig(1/sttnbig)$を達成できることを示す。
論文参考訳（メタデータ） (2020-09-18T02:18:44Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。