Fugu-MT 論文翻訳(概要): Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD

論文の概要: Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD

arxiv url: http://arxiv.org/abs/2105.08023v1
Date: Mon, 17 May 2021 17:16:52 GMT
ステータス: 翻訳完了
システム内更新日: 2021-05-18 14:18:55.598947
Title: Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD
Title（参考訳）: 分散SGDのネットワークトポロジ依存性を高めるデータ不均一性の影響の除去
Authors: Kun Yuan and Sulaiman A. Alghunaim
Abstract要約: D$2$/Exact-diffusionアルゴリズムの非同相収束特性について検討する。既存の分散アルゴリズムと比較して、D$2$/Exact-diffusionはネットワークトポロジに最も敏感です。
参考スコア（独自算出の注目度）: 15.112499553818953
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider decentralized stochastic optimization problems where a network of agents each owns a local cost function cooperate to find a minimizer of the global-averaged cost. A widely studied decentralized algorithm for this problem is D-SGD in which each node applies a stochastic gradient descent step, then averages its estimate with its neighbors. D-SGD is attractive due to its efficient single-iteration communication and can achieve linear speedup in convergence (in terms of the network size). However, D-SGD is very sensitive to the network topology. For smooth objective functions, the transient stage (which measures how fast the algorithm can reach the linear speedup stage) of D-SGD is on the order of $O(n/(1-\beta)^2)$ and $O(n^3/(1-\beta)^4)$ for strongly convex and generally convex cost functions, respectively, where $1-\beta \in (0,1)$ is a topology-dependent quantity that approaches $0$ for a large and sparse network. Hence, D-SGD suffers from slow convergence for large and sparse networks. In this work, we study the non-asymptotic convergence property of the D$^2$/Exact-diffusion algorithm. By eliminating the influence of data heterogeneity between nodes, D$^2$/Exact-diffusion is shown to have an enhanced transient stage that are on the order of $O(n/(1-\beta))$ and $O(n^3/(1-\beta)^2)$ for strongly convex and generally convex cost functions, respectively. Moreover, we provide a lower bound of the transient stage of D-SGD under homogeneous data distributions, which coincides with the transient stage of D$^2$/Exact-diffusion in the strongly-convex setting. These results show that removing the influence of data heterogeneity can ameliorate the network topology dependence of D-SGD. Compared with existing decentralized algorithms bounds, D$^2$/Exact-diffusion is least sensitive to network topology.
Abstract（参考訳）: 我々は,局所コスト関数を持つエージェントのネットワークが協調して,グローバル平均コストの最小化を求める分散確率最適化問題を考える。この問題に対して広く研究されている分散化アルゴリズムはD-SGDであり、各ノードは確率勾配降下ステップを適用し、その推定値をその近傍で平均化する。 D-SGDは、効率的なシングルイテレーション通信により魅力的であり、(ネットワークサイズの観点から)収束の線形高速化を実現することができる。しかし、D-SGDはネットワークトポロジーに非常に敏感である。滑らかな目的関数に対しては、d-sgd の過渡ステージ(アルゴリズムが線形速度ステージに到達する速度を測定する)は、強凸および一般に凸コスト関数に対して $o(n/(1-\beta)^2)$ と $o(n^3/(1-\beta)^4)$ の順であり、それぞれ 1-\beta \in (0,1)$ は、大きなネットワークに対して$0$ に近づく位相依存量である。したがって、D-SGDは、大小のネットワークの収束が遅い。本研究では,D$^2$/Exact-diffusionアルゴリズムの非漸近収束特性について検討する。ノード間のデータ不均一性の影響を排除することにより、D$^2$/Exact-diffusionは、それぞれ強い凸関数と一般に凸コスト関数に対して$O(n/(1-\beta))$と$O(n^3/(1-\beta)^2)$の順に拡張された過渡段階を持つことが示される。さらに,D-SGDの過渡期を,D$^2$/Exact-diffusionの強凸条件における過渡期と一致する均一なデータ分布の下で下界を与える。これらの結果は,D-SGDのネットワークトポロジ依存性を改善できることを示す。既存の分散アルゴリズムと比べ、D$^2$/Exact-diffusionはネットワークトポロジーに最も敏感である。

関連論文リスト

Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency [52.60557300927007]
離散部分モジュラー問題を連続的に最適化するために,$textbfMA-OSMA$アルゴリズムを提案する。また、一様分布を混合することによりKLの発散を効果的に活用する、プロジェクションフリーな$textbfMA-OSEA$アルゴリズムも導入する。我々のアルゴリズムは最先端OSGアルゴリズムによって提供される$(frac11+c)$-approximationを大幅に改善する。
論文参考訳（メタデータ） (2025-02-07T15:57:56Z)
The Effectiveness of Local Updates for Decentralized Learning under Data Heterogeneity [15.394956794959615]
DGT(Decentralized Gradient Tracking)とDGD(Decentralized Gradient Descent)の2つの基本的な分散最適化手法を再検討する。ローカルDGTが通信複雑性を$tildemathcalO Big(fracLmu(K+1) + fracdelta + mumu (1 - rho) + fracrho (1 - rho)2 cdot fracL+ deltamuBig)$, %zhizeを達成することを証明した。
論文参考訳（メタデータ） (2024-03-23T00:01:34Z)
Convergence Analysis of Decentralized ASGD [1.8710230264817358]
本稿では,ノード間の部分同期や制限的ネットワークトポロジを必要としない分散非同期SGD(DASGD)に対する新しい収束速度解析法を提案する。我々の収束証明は、固定段数と任意の非滑らかで同質でL字型の目的函数を仮定する。
論文参考訳（メタデータ） (2023-09-07T14:50:31Z)
Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
我々はスケッチを利用した新しい分散Adam型アルゴリズムのクラス(例:SketchedAMSGrad)を提案する。我々の新しいアルゴリズムは、反復毎に$O(frac1sqrtnT + frac1(k/d)2 T)$の高速収束率を$O(k log(d))$の通信コストで達成する。
論文参考訳（メタデータ） (2022-10-14T01:42:05Z)
DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over Graphs [54.08445874064361]
本稿では,分散環境下での正規化された分散ロバストな学習問題を解くことを提案する。 Kullback-Liebler正規化関数をロバストなmin-max最適化問題に追加することにより、学習問題を修正されたロバストな問題に還元することができる。提案アルゴリズムは, 最低分布検定精度を最大10%向上できることを示す。
論文参考訳（メタデータ） (2022-08-29T18:01:42Z)
Topology-aware Generalization of Decentralized SGD [89.25765221779288]
本稿では,分散型Valpha-10安定降下(D-SGD)の一般化可能性について検討する。 D-SGDの一般化性は、初期訓練段階における接続性と正の相関があることを証明した。
論文参考訳（メタデータ） (2022-06-25T16:03:48Z)
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
通信周波数の異なる分散計算作業者のトレーニングアルゴリズムを示す。本研究では,より厳密な収束率を$mathcalO!!(sigma2-2_avg!)とする。また,不均一性の項は,作業者の平均遅延によっても影響されることを示した。
論文参考訳（メタデータ） (2022-06-16T17:10:57Z)
High-Dimensional Inference over Networks: Linear Convergence and Statistical Guarantees [20.701475313495884]
エージェントネットワーク上の疎線形回帰を非指向グラフとしてモデル化し,サーバノードを持たない。分布予測勾配追跡に基づくアルゴリズムの収束率と統計的保証を解析する。
論文参考訳（メタデータ） (2022-01-21T01:26:08Z)
Distributed Random Reshuffling over Networks [7.013052033764372]
凸関数と滑らかな対象関数の問題を解くために分散resh-upr (D-RR) アルゴリズムを提案する。特に、滑らかな凸対象関数に対して、D-RRはD-T収束率(T がエポック数を数える)を大域ドライブ間の距離で達成する。
論文参考訳（メタデータ） (2021-12-31T03:59:37Z)
Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
集中ノードを持たないエージェントネットワーク上での分散(強い凸)最適化問題について検討する。 $varepsilon$-solutionは$tildemathcalrhoObig(sqrtfracbeta/mu (1-)log1/varepsilonbig)$通信ステップ数で達成される。この速度は、関心のクラスに適用される分散ゴシップ-アルゴリズムの、初めて(ポリログ因子まで)より低い複雑性の通信境界と一致する。
論文参考訳（メタデータ） (2021-10-24T04:03:00Z)
Distributed Saddle-Point Problems Under Similarity [173.19083235638104]
与えられたサブ最適度$epsilon0$は、$Omegabigのマスター/ワーカーネットワークで達成されることを示す。次に,ネットワークの下位の型(ログオーバまで)に適合するアルゴリズムを提案する。頑健なロジスティック回帰問題に対して提案アルゴリズムの有効性を評価する。
論文参考訳（メタデータ） (2021-07-22T14:25:16Z)
Fast decentralized non-convex finite-sum optimization with recursive variance reduction [19.540926205375857]
本稿では,SARAH型分散低減技術を用いたGT-SARAHと呼ばれる一階勾配法について述べる。特に、$n = O(Nfrac12(lambda)3)$のようなビッグデータでは、ネットワークの複雑さとは無関係に、この複雑さは$O(Nfrac12Lepsilon-2)$に減少する。さらに、局所的なミニバッチサイズの適切な選択は、勾配複雑性と通信複雑性のトレードオフをバランスさせる。
論文参考訳（メタデータ） (2020-08-17T15:51:32Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。