Fugu-MT Paper Translation (Summary): High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise

Paper summary: High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise

arxiv url: http://arxiv.org/abs/2310.18784v7
Date: Wed, 1 May 2024 02:30:23 GMT
Status: Translation complete
Updated in system: 2024-05-02 20:21:02.346357
Title: High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise
Authors: Aleksandar Armacki, Pranay Sharma, Gauri Joshi, Dragana Bajovic, Dusan Jakovetic, Soummya Kar
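Abstract summary: We study high-probability convergence guarantees for learning on streaming data in the presence of heavy-tailed noise. We show analytically that $\zeta$ can be used to choose the nonlinearity for a given problem setting.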
Reference score (attention score computed in-house): 59.25598762373543
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study high-probability convergence guarantees of learning on streaming data in the presence of heavy-tailed noise. In the proposed scenario, the model is updated in an online fashion, as new information is observed, without storing any additional data. To combat the heavy-tailed noise, we consider a general framework of nonlinear stochastic gradient descent (SGD), providing several strong results. First, for non-convex costs and component-wise nonlinearities, we establish a convergence rate arbitrarily close to $\mathcal{O}\left(t^{-\frac{1}{4}}\right)$, whose exponent is independent of noise and problem parameters. Second, for strongly convex costs and component-wise nonlinearities, we establish a rate arbitrarily close to $\mathcal{O}\left(t^{-\frac{1}{2}}\right)$ for the weighted average of iterates, with exponent again independent of noise and problem parameters. Finally, for strongly convex costs and a broader class of nonlinearities, we establish convergence of the last iterate, with a rate $\mathcal{O}\left(t^{-\zeta} \right)$, where $\zeta \in (0,1)$ depends on problem parameters, noise and nonlinearity. As we show analytically and numerically, $\zeta$ can be used to inform the preferred choice of nonlinearity for given problem settings. Compared to state-of-the-art works, which only consider clipping, require bounded noise moments of order $\eta \in (1,2]$, and establish convergence rates whose exponents go to zero as $\eta \rightarrow 1$, we provide high-probability guarantees for a much broader class of nonlinearities and for noise with a symmetric density, with convergence rates whose exponents are bounded away from zero, even when the noise has a finite first moment only. Moreover, in the case of strongly convex functions, we demonstrate analytically and numerically that clipping is not always the optimal nonlinearity, further underlining the value of our general framework.
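The nonlinear SGD framework in the abstract updates the model as $x_{t+1} = x_t - \alpha_t \Psi(\nabla f(x_t) + z_t)$, where $\Psi$ is a nonlinearity applied to the noisy gradient. A minimal sketch of that update, assuming a strongly convex quadratic cost $f(x) = \frac{1}{2}\|x - x^\star\|^2$, Student-t noise with 1.5 degrees of freedom (symmetric density, finite mean, infinite variance), and a step size $\alpha_t = a/(t+1)^{\delta}$. The helper names (`nonlinear_sgd`, `student_t`, `make_clip`, `sign_nl`), the step-size schedule, and the noise model are hypothetical choices for illustration, not the authors' code or tuned settings.

```python
import math
import random
from typing import Callable, Dict, List

Vector = List[float]


def student_t(rng: random.Random, df: float) -> float:
    """Hypothetical noise model: a Student-t draw with `df` degrees of freedom.

    The density is symmetric; for 1 < df <= 2 the mean is finite but the
    variance is infinite, i.e. the noise is heavy-tailed.
    """
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def sign_nl(g: float) -> float:
    """Component-wise sign nonlinearity."""
    return float((g > 0) - (g < 0))


def make_clip(m: float) -> Callable[[float], float]:
    """Component-wise clipping of each gradient entry to [-m, m]."""
    return lambda g: max(-m, min(m, g))


def nonlinear_sgd(grad: Callable[[Vector], Vector], x0: Vector,
                  psi: Callable[[float], float], steps: int,
                  a: float = 1.0, delta: float = 0.75, df: float = 1.5,
                  seed: int = 0) -> Vector:
    """Run x_{t+1} = x_t - alpha_t * psi(grad f(x_t) + z_t), psi per component.

    The step size alpha_t = a / (t + 1) ** delta is an illustrative assumption.
    Returns the last iterate.
    """
    rng = random.Random(seed)
    x = list(x0)
    for t in range(steps):
        alpha = a / (t + 1) ** delta
        # Streaming setting: one fresh noisy gradient per step, nothing stored.
        noisy = [gi + student_t(rng, df) for gi in grad(x)]
        x = [xi - alpha * psi(gi) for xi, gi in zip(x, noisy)]
    return x


# Usage: f(x) = 0.5 * ||x - x_star||^2, so grad f(x) = x - x_star.
x_star = [3.0, -2.0]


def quad_grad(x: Vector) -> Vector:
    return [xi - ci for xi, ci in zip(x, x_star)]


results: Dict[str, Vector] = {}
for name, psi in [("sign", sign_nl), ("clip", make_clip(1.0))]:
    results[name] = nonlinear_sgd(quad_grad, [0.0, 0.0], psi, steps=20000)
    print(name, [round(v, 3) for v in results[name]])
```

Both nonlinearities bound each component of the update by $\alpha_t$, so a single extreme noise draw cannot push the iterate far, and because the noise density is symmetric and $\Psi$ is odd and nondecreasing, the expected update of each component still points toward $x^\star$.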

List of related papers