Fugu-MT Paper Translation (Summary): Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework

Paper summary: Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework

arxiv url: http://arxiv.org/abs/2406.11159v1
Date: Mon, 17 Jun 2024 02:56:55 GMT
Status: Translation complete
Last updated in system: 2024-06-18 18:43:55.306902
Title: Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework
Authors: Siyuan Yu, Wei Chen, H. Vincent Poor
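Abstract summary: Distributed stochastic gradient descent (SGD) has attracted considerable recent attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy in machine learning. This paper analyzes the run time and staleness of distributed SGD based on stochastic delay differential equations (SDDEs) and a Poisson approximation of aggregated gradient arrivals. Interestingly, increasing the number of activated workers does not necessarily accelerate distributed SGD, due to staleness.
Reference score (site-computed attention score): 56.82432591933544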
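The summary's claim that activating more workers need not speed up distributed SGD can be reproduced in a toy model. This is an illustrative sketch under assumptions of our own, not the paper's scheduling analysis: n workers are evenly staggered and each needs exactly one time unit per gradient, so the server applies n updates per time unit and every gradient is evaluated at the iterate from n - 1 updates earlier; the objective is the one-dimensional quadratic f(x) = λx²/2 without gradient noise, and c = ηλ = 0.3 is an arbitrary choice. The function name `async_gd_peak_error` is hypothetical.

```python
def async_gd_peak_error(n_workers, c=0.3, time_units=50, tail_units=5):
    """Peak |x| over the last `tail_units` of wall-clock time for a toy
    round-robin asynchronous gradient descent on f(x) = lam * x**2 / 2.

    Assumptions (illustrative only): c = learning_rate * lam; the workers are
    evenly staggered and each needs exactly one time unit per gradient, so the
    server applies n_workers updates per time unit and every gradient is
    evaluated at the iterate from n_workers - 1 updates earlier. No gradient noise.
    """
    staleness = n_workers - 1
    xs = [1.0] * (staleness + 1)  # x = 1 at start, constant history for stale reads
    for _ in range(n_workers * time_units):
        # Stale update: x_{k+1} = x_k - c * x_{k - staleness}
        xs.append(xs[-1] - c * xs[-1 - staleness])
    tail = xs[-n_workers * tail_units:]  # iterates from the last tail_units time units
    return max(abs(v) for v in tail)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, async_gd_peak_error(n))
```

Under these assumptions, two workers reach a smaller error than one worker in the same wall-clock time, because the extra updates outweigh a staleness of one. Eight workers make the delayed recursion x_{k+1} = x_k - c x_{k-7} unstable (its stability boundary lies at c = 2cos(7π/15) ≈ 0.21), so the iterates diverge even though updates arrive eight times as often.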
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Distributed stochastic gradient descent (SGD) has attracted considerable recent attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy in machine learning. However, stragglers and limited bandwidth may induce random computational/communication delays, thereby severely hindering the learning process. Therefore, how to accelerate asynchronous SGD by efficiently scheduling multiple workers is an important issue. In this paper, a unified framework is presented to analyze and optimize the convergence of asynchronous SGD based on stochastic delay differential equations (SDDEs) and the Poisson approximation of aggregated gradient arrivals. In particular, we present the run time and staleness of distributed SGD without a memorylessness assumption on the computation times. Given the learning rate, we reveal the damping coefficient of the relevant SDDE and its delay statistics as functions of the number of activated clients, the staleness threshold, the eigenvalues of the Hessian matrix of the objective function, and the overall computational/communication delay. The formulated SDDE allows us to derive both the convergence condition and the convergence speed of distributed SGD by calculating its characteristic roots, thereby optimizing the scheduling policies for asynchronous/event-triggered SGD. Interestingly, increasing the number of activated workers does not necessarily accelerate distributed SGD due to staleness. Moreover, a small degree of staleness does not necessarily slow down convergence, while a large degree of staleness will result in the divergence of distributed SGD. Numerical results demonstrate the potential of our SDDE framework, even in complex learning tasks with non-convex objective functions.
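The Poisson approximation of aggregated gradient arrivals can be illustrated with a short simulation. This is a sketch under assumptions of our own, not the authors' model: each worker is treated as an independent renewal process whose per-gradient computation/communication time is Uniform(0.5, 1.5) (mean 1.0, deliberately not memoryless), and the merged stream reaching the server is compared with a Poisson process through the coefficient of variation (CV) of its interarrival times, which equals 1 for a Poisson process. The helper names `merged_interarrivals` and `coefficient_of_variation` are hypothetical.

```python
import random
import statistics


def merged_interarrivals(n_workers, horizon=200.0, warmup=20.0, seed=0):
    """Interarrival times of the merged gradient stream seen by the server.

    Assumption (illustrative only): each worker is an independent renewal
    process whose computation-plus-communication time per gradient is
    Uniform(0.5, 1.5), i.e. mean 1.0 and deliberately not memoryless.
    """
    rng = random.Random(seed)
    arrivals = []
    for _ in range(n_workers):
        t = 0.0
        while True:
            t += rng.uniform(0.5, 1.5)
            if t > horizon:
                break
            if t >= warmup:  # skip the start-up transient (all workers begin at t = 0)
                arrivals.append(t)
    arrivals.sort()
    return [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]


def coefficient_of_variation(samples):
    """Standard deviation divided by mean; equals 1 for exponential (Poisson) gaps."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


if __name__ == "__main__":
    for n in (1, 5, 50):
        gaps = merged_interarrivals(n)
        # Aggregate rate should be about n; the CV should move toward 1 as n grows.
        print(n, round(1 / statistics.fmean(gaps), 2), round(coefficient_of_variation(gaps), 3))
```

For a single worker the CV stays near 1/√12 ≈ 0.29, the CV of the uniform service time, while for 50 workers it comes close to 1. This is the behaviour the Palm–Khintchine theorem predicts for a superposition of many independent renewal processes, which is why a Poisson model of the aggregate stream can be reasonable even when individual computation times are not exponential.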
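The link between staleness and the convergence condition can be seen on the scalar linear delay differential equation x'(t) = -a x(t - τ). This is a deterministic, single-delay simplification, not the paper's SDDE: here a stands in for the damping coefficient (for instance, a learning rate times one Hessian eigenvalue) and τ for a fixed staleness-induced delay. The characteristic equation is s + a e^{-sτ} = 0. A classical result states that all its roots have negative real part exactly when aτ < π/2, and for aτ ≤ 1/e the rightmost root is real, s = W₀(-aτ)/τ, where W₀ is the principal branch of the Lambert W function. The function names `is_stable`, `real_rightmost_root`, and `simulate` are hypothetical, and the forward-Euler integrator is only a numerical cross-check.

```python
import math


def is_stable(a, tau):
    """Classical test for x'(t) = -a * x(t - tau), a > 0, tau >= 0: all roots of
    the characteristic equation s + a * exp(-s * tau) = 0 satisfy Re(s) < 0
    exactly when a * tau < pi / 2."""
    return a > 0 and a * tau < math.pi / 2


def real_rightmost_root(a, tau):
    """Rightmost characteristic root when it is real, i.e. when a * tau <= 1/e.

    Substituting w = s * tau gives w * exp(w) = -a * tau, so s = W0(-a * tau) / tau
    with W0 the principal Lambert W branch, found here by bisection on [-1, 0].
    Returns None when a * tau > 1/e (the rightmost roots are then complex).
    """
    if tau == 0:
        return -a  # undelayed ODE x' = -a x
    z = -a * tau
    if z < -1 / math.e:
        return None
    lo, hi = -1.0, 0.0  # w * exp(w) increases from -1/e to 0 on this interval
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / tau


def simulate(a, tau, t_end=60.0, h=1e-3):
    """Forward-Euler solution with constant history x(t) = 1 for t <= 0.
    Element k of the result approximates x(k * h)."""
    lag = round(tau / h)  # delay measured in Euler steps
    xs = [1.0] * (lag + 1)  # samples of the history on [-tau, 0]
    for _ in range(round(t_end / h)):
        xs.append(xs[-1] - h * a * xs[-1 - lag])
    return xs[lag:]


if __name__ == "__main__":
    for tau in (0.0, 0.2, 1.0, 2.0):
        traj = simulate(1.0, tau)
        print(tau, is_stable(1.0, tau), real_rightmost_root(1.0, tau), abs(traj[10_000]))
```

With a = 1, a delay of τ = 0.2 moves the rightmost root from -1 (no delay) to about -1.30, so x(10) ends up smaller than in the undelayed case; τ = 1 still converges, with oscillations, and τ = 2 violates aτ < π/2 and diverges. In this much simpler setting, this mirrors the abstract's statement that a small degree of staleness does not necessarily slow convergence, while a large one leads to divergence.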
