Fugu-MT Paper Translation (Overview): Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction

Paper overview: Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction

arxiv url: http://arxiv.org/abs/2506.23836v1
Date: Mon, 30 Jun 2025 13:27:39 GMT
Status: Translation complete
Last updated in system: 2025-07-01 21:27:54.075454
Title: Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction
Title (reference translation): Proving the limited scalability of centralized distributed optimization via a new lower bound construction
Authors: Alexander Tyurin
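Abstract summary: We consider a centralized distributed learning setting and show that, even in the homogeneous (i.e., i.i.d.) case where all workers access the same distribution, methods using unbiased random sparsification compressors cannot scale both the server-side communication runtime term and the variance-dependent runtime term $\frac{h \sigma^2 L \Delta}{\varepsilon^2}$ better than poly-logarithmically in $n$.
Reference score (attention score computed in-house): 57.93371273485736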
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider centralized distributed optimization in the classical federated learning setup, where $n$ workers jointly find an $\varepsilon$-stationary point of an $L$-smooth, $d$-dimensional nonconvex function $f$, having access only to unbiased stochastic gradients with variance $\sigma^2$. Each worker requires at most $h$ seconds to compute a stochastic gradient, and the communication times from the server to the workers and from the workers to the server are $\tau_{s}$ and $\tau_{w}$ seconds per coordinate, respectively. One of the main motivations for distributed optimization is to achieve scalability with respect to $n$. For instance, it is well known that the distributed version of SGD has a variance-dependent runtime term $\frac{h \sigma^2 L \Delta}{n \varepsilon^2}$, which improves with the number of workers $n$, where $\Delta = f(x^0) - f^*$ and $x^0 \in \mathbb{R}^d$ is the starting point. Similarly, using unbiased sparsification compressors, it is possible to reduce both the variance-dependent runtime term and the communication runtime term. However, once we account for the communication from the server to the workers $\tau_{s}$, we prove that it becomes infeasible to design a method using unbiased random sparsification compressors that scales both the server-side communication runtime term $\tau_{s} d \frac{L \Delta}{\varepsilon}$ and the variance-dependent runtime term $\frac{h \sigma^2 L \Delta}{\varepsilon^2}$ better than poly-logarithmically in $n$, even in the homogeneous (i.i.d.) case, where all workers access the same distribution. To establish this result, we construct a new "worst-case" function and develop a new lower bound framework that reduces the analysis to the concentration of a random sum, for which we prove a concentration bound. These results reveal fundamental limitations in scaling distributed optimization, even under the homogeneous assumption.
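To make the scaling statements concrete, the short sketch below evaluates the two runtime terms quoted in the abstract: the distributed-SGD variance term $\frac{h \sigma^2 L \Delta}{n \varepsilon^2}$ and the server-side communication term $\tau_{s} d \frac{L \Delta}{\varepsilon}$. It is only arithmetic on those expressions: constants and the other terms of a full runtime bound are omitted, the parameter values are hypothetical and unit-free, and the function names are illustrative, not taken from the paper.

```python
def sgd_variance_term(h, sigma2, L, delta, eps, n):
    # Variance-dependent runtime term of distributed SGD: h*sigma^2*L*Delta / (n*eps^2).
    return h * sigma2 * L * delta / (n * eps ** 2)


def server_comm_term(tau_s, d, L, delta, eps):
    # Server-side communication runtime term: tau_s*d*L*Delta/eps (n does not appear).
    return tau_s * d * L * delta / eps


# Hypothetical parameter values, chosen only for illustration.
base = dict(h=1.0, sigma2=1.0, L=1.0, delta=1.0, eps=0.5)
for n in (1, 10, 100, 1000):
    # The variance term shrinks in proportion to 1/n as workers are added.
    print(n, sgd_variance_term(n=n, **base))

print(server_comm_term(tau_s=0.01, d=1000, L=1.0, delta=1.0, eps=0.5))
```

Read this way, the $1/n$ factor is what distributed SGD gains from more workers; the lower bound stated above says that, once $\tau_{s}$ is accounted for, methods using unbiased random sparsification compressors cannot improve both $\tau_{s} d \frac{L \Delta}{\varepsilon}$ and $\frac{h \sigma^2 L \Delta}{\varepsilon^2}$ by more than a poly-logarithmic factor in $n$.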
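Abstract (reference translation): In classical federated learning, $n$ workers jointly find an $\varepsilon$-stationary point of an $L$-smooth, $d$-dimensional nonconvex function $f$. Each worker requires at most $h$ seconds to compute a stochastic gradient, and the communication times from the server to the workers and from the workers to the server are $\tau_{s}$ and $\tau_{w}$ seconds per coordinate, respectively. One of the main motivations for distributed optimization is to achieve scalability with respect to $n$. For example, it is known that the variance-dependent runtime term $\frac{h \sigma^2 L \Delta}{n \varepsilon^2}$ of distributed SGD improves with the number of workers $n$, where $\Delta = f(x^0) - f^*$ and $x^0 \in \mathbb{R}^d$ is the starting point. Similarly, using unbiased sparsification compressors makes it possible to reduce both the variance-dependent runtime term and the communication runtime term. However, once the communication from the server to the workers, $\tau_{s}$, is taken into account, it becomes infeasible to design a method using unbiased random sparsification compressors that scales both the server-side communication runtime term $\tau_{s} d \frac{L \Delta}{\varepsilon}$ and the variance-dependent runtime term $\frac{h \sigma^2 L \Delta}{\varepsilon^2}$ better than poly-logarithmically in $n$. To establish this result, we construct a new "worst-case" function, develop a new lower bound framework that reduces the analysis to the concentration of a random sum, and prove a concentration bound for it. These results reveal fundamental limitations in scaling distributed optimization, even under the homogeneous assumption.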
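The abstract refers to unbiased random sparsification compressors. A standard example is RandK: keep $k$ of the $d$ coordinates chosen uniformly at random and scale them by $d/k$, so each coordinate survives with probability $k/d$ and its expectation equals the original value. The sketch below is a generic illustration of that compressor under this assumption, not the worst-case construction or any method from the paper; `rand_k` and `empirical_mean` are hypothetical helper names.

```python
import random


def rand_k(x, k, rng):
    # RandK sparsifier: keep k distinct coordinates chosen uniformly at random,
    # scaled by d/k so that E[rand_k(x)] = x (unbiasedness).
    d = len(x)
    kept = rng.sample(range(d), k)
    scale = d / k
    out = [0.0] * d
    for i in kept:
        out[i] = x[i] * scale
    return out


def empirical_mean(x, k, trials, seed=0):
    # Average many independent compressions to check unbiasedness numerically.
    rng = random.Random(seed)
    d = len(x)
    acc = [0.0] * d
    for _ in range(trials):
        c = rand_k(x, k, rng)
        for i in range(d):
            acc[i] += c[i]
    return [a / trials for a in acc]


x = [1.0, -2.0, 3.0, 0.5]
print(rand_k(x, k=1, rng=random.Random(42)))  # one nonzero entry, scaled by 4
print(empirical_mean(x, k=1, trials=100_000))  # approximately x
```

Under the per-coordinate cost model in the abstract, a worker sending `rand_k(x, k, ...)` transmits $k$ coordinates instead of $d$, roughly $k \tau_{w}$ instead of $d \tau_{w}$ seconds; the price is extra variance, since each kept coordinate is inflated by $d/k$.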


List of related papers