Fugu-MT Paper Translation (Abstract): Asynchronous Training Schemes in Distributed Learning with Time Delay

Paper abstract: Asynchronous Training Schemes in Distributed Learning with Time Delay

arxiv url: http://arxiv.org/abs/2208.13154v1
Date: Sun, 28 Aug 2022 07:14:59 GMT
Status: Translation complete
Last updated in system: 2022-08-30 14:15:22.153863
Title: Asynchronous Training Schemes in Distributed Learning with Time Delay
Title (reference translation): Asynchronous Learning Schemes in Distributed Learning with Time Delay
Authors: Haoxiang Wang, Zhanhong Jiang, Chao Liu, Soumik Sarkar, Dongxiang Jiang, Young M. Lee
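Abstract summary: In distributed deep learning, stale weights or gradients can degrade algorithmic performance. This paper proposes a different approach to the problem of stale weights and gradients. It also proposes a practical variant of PC-ASGD that adopts a condition to help determine the tradeoff parameter.
Reference score (attention metric computed by this site): 17.259708772713164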
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the context of distributed deep learning, the issue of stale weights or gradients could result in poor algorithmic performance. This issue is usually tackled by delay tolerant algorithms with some mild assumptions on the objective functions and step sizes. In this paper, we propose a different approach to develop a new algorithm, called $\textbf{P}$redicting $\textbf{C}$lipping $\textbf{A}$synchronous $\textbf{S}$tochastic $\textbf{G}$radient $\textbf{D}$escent (aka, PC-ASGD). Specifically, PC-ASGD has two steps - the $\textit{predicting step}$ leverages the gradient prediction using Taylor expansion to reduce the staleness of the outdated weights while the $\textit{clipping step}$ selectively drops the outdated weights to alleviate their negative effects. A tradeoff parameter is introduced to balance the effects between these two steps. Theoretically, we present the convergence rate considering the effects of delay of the proposed algorithm with constant step size when the smooth objective functions are weakly strongly-convex and nonconvex. One practical variant of PC-ASGD is also proposed by adopting a condition to help with the determination of the tradeoff parameter. For empirical validation, we demonstrate the performance of the algorithm with two deep neural network architectures on two benchmark datasets.
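Abstract (reference translation): In distributed deep learning, stale weights or gradients can degrade algorithmic performance. This problem is usually handled by delay-tolerant algorithms under mild assumptions on the objective functions and step sizes. This paper takes a different approach and develops a new algorithm, $\textbf{P}$redicting $\textbf{C}$lipping $\textbf{A}$synchronous $\textbf{S}$tochastic $\textbf{G}$radient $\textbf{D}$escent (PC-ASGD). Specifically, PC-ASGD has two steps: the $\textit{predicting step}$ uses gradient prediction based on a Taylor expansion to reduce the staleness of outdated weights, while the $\textit{clipping step}$ selectively drops outdated weights to mitigate their negative effects. A tradeoff parameter balances the effects of the two steps. Theoretically, the paper gives convergence rates that account for delay, with a constant step size, when the smooth objective functions are weakly strongly-convex and nonconvex. A practical variant of PC-ASGD is also proposed, which adopts a condition to help determine the tradeoff parameter. For empirical validation, the algorithm's performance is demonstrated with two deep neural network architectures on two benchmark datasets.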
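
The predicting and clipping ideas in the abstract can be illustrated with a toy sketch. This is not the paper's update rule: it assumes a diagonal quadratic objective whose Hessian is known exactly, and the names `grad`, `predict_gradient`, `blended_step`, and `theta`, as well as the way the two branches are blended, are hypothetical choices made only for illustration.

```python
from typing import List

Vector = List[float]


def grad(w: Vector, curv: Vector) -> Vector:
    """Gradient of the toy objective f(w) = 0.5 * sum(c_i * w_i**2)."""
    return [c * x for c, x in zip(curv, w)]


def predict_gradient(stale_grad: Vector, w_stale: Vector,
                     w_now: Vector, curv: Vector) -> Vector:
    """First-order Taylor correction of a stale gradient.

    g(w_now) ~= g(w_stale) + H (w_now - w_stale), where H = diag(curv)
    is assumed to be known (an assumption of this sketch).
    """
    return [g + c * (xn - xs)
            for g, c, xn, xs in zip(stale_grad, curv, w_now, w_stale)]


def blended_step(w_now: Vector, fresh_grad: Vector, stale_grad: Vector,
                 w_stale: Vector, curv: Vector, lr: float,
                 theta: float) -> Vector:
    """One update mixing a 'predicting' and a 'clipping' direction.

    theta = 1 uses only the predicting branch; theta = 0 uses only the
    clipping branch. The blend itself is a hypothetical simplification.
    """
    predicted = predict_gradient(stale_grad, w_stale, w_now, curv)
    # Predicting branch: average the fresh gradient with the corrected stale one.
    predict_dir = [0.5 * (f + p) for f, p in zip(fresh_grad, predicted)]
    # Clipping branch: drop the stale contribution entirely.
    clip_dir = fresh_grad
    direction = [theta * p + (1.0 - theta) * c
                 for p, c in zip(predict_dir, clip_dir)]
    return [x - lr * d for x, d in zip(w_now, direction)]


if __name__ == "__main__":
    curv = [2.0, 0.5]
    w_stale, w_now = [1.0, -2.0], [0.5, -1.0]
    stale = grad(w_stale, curv)
    print(predict_gradient(stale, w_stale, w_now, curv))
    print(blended_step(w_now, grad(w_now, curv), stale, w_stale, curv,
                       lr=0.1, theta=0.5))
```

On this quadratic toy problem the Taylor correction recovers the gradient at the current weights exactly; for a real network the Hessian would have to be approximated, so the correction would only reduce, not remove, the effect of staleness.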

Related papers

Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency [52.60557300927007]
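We propose the $\textbf{MA-OSMA}$ algorithm, which recasts the discrete submodular problem as a continuous optimization. We also introduce a projection-free $\textbf{MA-OSEA}$ algorithm that makes effective use of the KL divergence by mixing in a uniform distribution. Our algorithms significantly improve on the $(\frac{1}{1+c})$-approximation provided by the state-of-the-art OSG algorithm.
Paper reference translation (metadata) (2025-02-07T15:57:56Z)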
SAPPHIRE: Preconditioned Stochastic Variance Reduction for Faster Large-Scale Statistical Learning [18.055120576191204]
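Ill-conditioned objectives and nonsmooth regularizers undermine the performance of traditional convex methods. This work proposes a variance-reduced solution for ill-conditioned composite large-scale machine learning problems.
Paper reference translation (metadata) (2025-01-27T10:36:45Z)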
Non-stationary Online Convex Optimization with Arbitrary Delays [50.46856739179311]
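This paper studies delayed online convex optimization (OCO) in non-stationary environments. We first propose DOGD, a simple algorithm that performs a gradient descent step for each delayed gradient in its order of arrival. We then develop an improved algorithm that reduces the dynamic regret bounds achieved by DOGD to $O(\sqrt{\bar{d}T(P_T+1)})$.
Paper reference translation (metadata) (2023-05-20T07:54:07Z)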
Finite-Time Error Bounds for Greedy-GQ [20.51105692499517]
We show that the Greedy-GQ algorithm converges quickly, with finite-time error bounds. Our analysis also guides the choice of step sizes for faster convergence.
Paper reference translation (metadata) (2022-09-06T15:04:57Z)
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
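We study a training algorithm for distributed workers with varying communication frequencies. This work derives a tighter $\mathcal{O}(\cdot)$ convergence rate expressed in terms of the gradient noise $\sigma^2$ and the average delay. We also show that the heterogeneity term is affected by the workers' average delay.
Paper reference translation (metadata) (2022-06-16T17:10:57Z)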
Momentum-Based Policy Gradient with Second-Order Information [40.51117836892182]
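This paper proposes SHARP, a method that incorporates second-order information into gradient descent. Unlike prior work, the proposed algorithm does not require importance sampling, which can undermine the benefits of variance reduction. We show that the method is effective on a variety of control tasks and outperforms the state of the art in practice.
Paper reference translation (metadata) (2022-05-17T11:56:50Z)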
Asynchronous Stochastic Optimization Robust to Arbitrary Delays [54.61797739710608]
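We consider stochastic optimization with delayed gradients where, at each step $t$, the algorithm makes an update using a stale computation from step $t - d_t$ for an arbitrary delay $d_t$. Experiments demonstrate the efficacy and robustness of the algorithm when the delay distribution is skewed or heavy-tailed.
Paper reference translation (metadata) (2021-06-22T15:50:45Z)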
A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization [112.59170319105971]
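We propose a new algorithm, Momentum-assisted Single-timescale Stochastic Approximation (MSTSA), for tackling bilevel optimization problems. MSTSA allows the error in the iterations caused by inexact solutions to the lower-level subproblem to be controlled.
Paper reference translation (metadata) (2021-02-15T07:10:33Z)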
Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
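We study adversary-resilient distributed optimization, in which machines independently compute gradients and cooperate. Our algorithm is based on a new concentration technique, with an accompanying sample-complexity result. It is highly practical: it improves on the performance of all prior methods.
Paper reference translation (metadata) (2020-12-28T17:19:32Z)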
Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
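This paper proposes two single-timescale, single-loop algorithms that require only one data point per step. The results are expressed in the form of simultaneous primal-side and dual-side convergence.
Paper reference translation (metadata) (2020-08-23T20:36:49Z)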

The related papers list is generated automatically from the titles and abstracts of papers on this site.
