Fugu-MT Paper Translation (Abstract): High-Probability Convergence in Decentralized Stochastic Optimization with Gradient Tracking

Paper summary: High-Probability Convergence in Decentralized Stochastic Optimization with Gradient Tracking

arxiv url: http://arxiv.org/abs/2605.00281v1
Date: Thu, 30 Apr 2026 22:45:21 GMT
Status: Translation complete
Last updated in system: 2026-05-04 17:43:28.783963
Title: High-Probability Convergence in Decentralized Stochastic Optimization with Gradient Tracking
Title (reference translation): High-Probability Convergence in Decentralized Stochastic Optimization Using Gradient Tracking
Authors: Aleksandar Armacki, Haoyuan Cai, Ali H. Sayed
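Abstract summary: The paper studies high-probability convergence guarantees in decentralized stochastic optimization. It shows that $\mathtt{GT-DSGD}$ converges in the high-probability sense under the same conditions on the cost as in the MSE sense, while achieving comparable transient times.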
Reference score (independently computed attention score): 69.90407799170687
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study high-probability (HP) convergence guarantees in decentralized stochastic optimization, where multiple agents collaborate to jointly train a model over a network. Existing HP results in decentralized settings almost exclusively focus on the Decentralized Stochastic Gradient Descent ($\mathtt{DSGD}$) algorithm, which requires strong assumptions, such as bounded data heterogeneity, or strong convexity of each agent's cost. This is contrary to the mean-squared error (MSE) results, where methods incorporating bias-correction techniques are known to converge under relaxed assumptions and achieve better practical performance. In this paper we provide the first step toward bridging the gap, by studying HP convergence of $\mathtt{DSGD}$ incorporating the gradient tracking technique, in the presence of noise satisfying a relaxed sub-Gaussian condition. We show that the resulting method, dubbed $\mathtt{GT-DSGD}$, achieves order-optimal HP convergence rates for both non-convex and Polyak-Łojasiewicz costs, of order $\mathcal{O}\Big(\frac{\log(1/δ)}{\sqrt{nT}}\Big)$ and $\mathcal{O}\Big(\frac{\log(1/δ)}{nT}\Big)$, respectively, where $n$ is the number of agents, $T$ is the time horizon and $δ\in (0,1)$ is the confidence parameter. Our results establish that $\mathtt{GT-DSGD}$ converges in the HP sense under the same conditions on the cost as in the MSE sense, while achieving comparable transient times. To the best of our knowledge, these are the first HP guarantees for decentralized optimization methods incorporating bias-correction. Numerical experiments on real and synthetic data verify our theoretical findings, underlining the superior performance of $\mathtt{GT-DSGD}$ and highlighting that the benefits of incorporating bias-correction are also maintained in the HP sense.
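Abstract (reference translation): This paper studies high-probability (HP) convergence guarantees in decentralized stochastic optimization, where multiple agents collaborate to train a model over a network. Existing HP results in decentralized settings focus almost exclusively on the Decentralized Stochastic Gradient Descent ($\mathtt{DSGD}$) algorithm, which requires strong assumptions such as bounded data heterogeneity or strong convexity of each agent's cost. This contrasts with mean-squared error (MSE) results, where methods incorporating bias-correction techniques are known to converge under relaxed assumptions and to perform better in practice. The paper takes a first step toward bridging this gap by studying HP convergence of $\mathtt{DSGD}$ with gradient tracking, in the presence of noise satisfying a relaxed sub-Gaussian condition. The resulting method, $\mathtt{GT-DSGD}$, is shown to achieve order-optimal HP convergence rates for non-convex and Polyak-Łojasiewicz costs, of order $\mathcal{O}\Big(\frac{\log(1/δ)}{\sqrt{nT}}\Big)$ and $\mathcal{O}\Big(\frac{\log(1/δ)}{nT}\Big)$, respectively, where $n$ is the number of agents, $T$ is the time horizon, and $δ\in (0,1)$ is the confidence parameter. The results establish that $\mathtt{GT-DSGD}$ converges in the HP sense under the same conditions on the cost as in the MSE sense, while achieving comparable transient times. To the best of the authors' knowledge, these are the first HP guarantees for decentralized optimization methods incorporating bias-correction. Numerical experiments on real and synthetic data verify the theoretical findings, underline the superior performance of $\mathtt{GT-DSGD}$, and highlight that the benefits of incorporating bias-correction are also maintained in the HP sense.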
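
Illustrative sketch: the abstract names DSGD with gradient tracking but does not spell out the update rule. The Python sketch below uses one common textbook form of gradient tracking, which may differ from the exact $\mathtt{GT-DSGD}$ recursion analyzed in the paper. The function names `ring_weights` and `gt_dsgd`, the ring topology, the quadratic local costs, and the Gaussian noise model are all assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def ring_weights(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n >= 3 agents (assumed topology)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

def gt_dsgd(grad, x0, W, step, T, rng, noise_std=0.1):
    """Gradient-tracking DSGD sketch (a common form, not necessarily the paper's variant).

    grad(i, x) returns the exact gradient of agent i's cost at x; zero-mean Gaussian
    noise is added to mimic a stochastic gradient oracle (an assumption).
    """
    n, d = x0.shape

    def stoch_grads(x):
        g = np.stack([grad(i, x[i]) for i in range(n)])
        return g + noise_std * rng.standard_normal((n, d))

    x = x0.copy()
    g = stoch_grads(x)
    y = g.copy()  # tracker starts at the local stochastic gradients
    for _ in range(T):
        x = W @ (x - step * y)   # local descent along the tracker, then mixing
        g_new = stoch_grads(x)
        y = W @ y + g_new - g    # tracker follows the network-average gradient
        g = g_new
    return x

# Heterogeneous quadratics f_i(x) = 0.5 * ||x - b_i||^2; the global minimizer is mean(b_i).
rng = np.random.default_rng(0)
b = rng.standard_normal((5, 3))
x_final = gt_dsgd(lambda i, x: x - b[i], np.zeros((5, 3)), ring_weights(5),
                  step=0.1, T=500, rng=rng, noise_std=0.0)
```

With the noise switched off, every agent's iterate in this toy problem approaches the average of the local minimizers even though the local costs disagree, which is the bias-correction effect the abstract refers to. Setting `noise_std` above zero gives a noisy run whose error could be inspected across repeated trials.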

Related papers