Fugu-MT 論文翻訳(概要): Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

論文の概要: Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

arxiv url: http://arxiv.org/abs/2102.10346v1
Date: Sat, 20 Feb 2021 13:45:11 GMT
ステータス: 翻訳完了
システム内更新日: 2021-02-23 14:45:36.053492
Title: Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
Title（参考訳）: 定騒音変動下における確率勾配の収束速度
Authors: Hongjian Wang, Mert G\"urb\"uzbalaban, Lingjiong Zhu, Umut \c{S}im\c{s}ekli, Murat A. Erdogdu
Abstract要約: ヘビーテールは様々なシナリオで勾配降下 (sgd) で現れる。 SGDの収束保証は、潜在的に無限のばらつきを持つ状態依存性および重尾ノイズ下で提供します。その結果,SGDは無限に分散した重尾雑音下であっても,地球最適値に収束できることが示された。
参考スコア（独自算出の注目度）: 14.06947898164194
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent studies have provided both empirical and theoretical evidence illustrating that heavy tails can emerge in stochastic gradient descent (SGD) in various scenarios. Such heavy tails potentially result in iterates with diverging variance, which hinders the use of conventional convergence analysis techniques that rely on the existence of the second-order moments. In this paper, we provide convergence guarantees for SGD under a state-dependent and heavy-tailed noise with a potentially infinite variance, for a class of strongly convex objectives. In the case where the $p$-th moment of the noise exists for some $p\in [1,2)$, we first identify a condition on the Hessian, coined '$p$-positive (semi-)definiteness', that leads to an interesting interpolation between positive semi-definite matrices ($p=2$) and diagonally dominant matrices with non-negative diagonal entries ($p=1$). Under this condition, we then provide a convergence rate for the distance to the global optimum in $L^p$. Furthermore, we provide a generalized central limit theorem, which shows that the properly scaled Polyak-Ruppert averaging converges weakly to a multivariate $\alpha$-stable random vector. Our results indicate that even under heavy-tailed noise with infinite variance, SGD can converge to the global optimum without necessitating any modification neither to the loss function or to the algorithm itself, as typically required in robust statistics. We demonstrate the implications of our results to applications such as linear regression and generalized linear models subject to heavy-tailed data.
Abstract（参考訳）: 最近の研究は、さまざまなシナリオで確率勾配降下(SGD)で重い尾が出現できることを示す経験的および理論的証拠の両方を提供してきました。このような重い尾は、ばらつきを伴う反復を引き起こす可能性があり、2階モーメントの存在に依存する従来の収束解析技術の使用を妨げる。本稿では,SGDの収束保証を,強い凸対象のクラスに対して,潜在的に無限に分散した状態依存かつ重み付きノイズの下で提供する。ある種の$p\in [1,2)$ に対して、ノイズの p$-th モーメントが存在する場合、最初に「$p$-positive (semi-)definiteness)」と呼ばれるヘッシアン上の条件を特定し、正の半定義行列 (p=2$) と非負の対角成分 (p=1$) を持つ対角支配行列の間の興味深い補間をもたらす。この条件の下で、我々は$L^p$でグローバル最適への距離の収束率を提供します。さらに,多変量 $\alpha$-stable 確率ベクトルに適度にスケールした polyak-ruppert averaging が弱収束することを示す一般化中心極限定理を提案する。この結果から,SGD は無限にばらつきのある重み付き雑音下であっても,損失関数やアルゴリズム自体の変更を必要とせず,大域的最適度に収束可能であることが示唆された。重み付きデータに基づく線形回帰や一般化線形モデルといった応用における結果の意義を実証する。

関連論文リスト

Can SGD Handle Heavy-Tailed Noise? [6.111519084375339]
Gradient Descent (SGD) は大規模最適化のための機械学習プロジェクトであるが、重尾雑音下での理論的挙動は理解されていない。このような悪条件下でSGDが確実に成功できるかどうかを精査する。
論文参考訳（メタデータ） (2025-08-06T20:09:41Z)
A Unified Analysis for Finite Weight Averaging [50.75116992029417]
Gradient Descent(SGD)の平均イテレーションは、SWA(Weight Averaging)、EMA(Exponential moving Average)、LAWA(Latest Weight Averaging)といったディープラーニングモデルのトレーニングにおいて、経験的な成功を収めている。本稿では、LAWAを有限重み平均化(FWA)として一般化し、最適化と一般化の観点からSGDと比較して、それらの利点を説明する。
論文参考訳（メタデータ） (2024-11-20T10:08:22Z)
Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees [56.80920351680438]
本研究では,重音の存在下でのオンライン学習における高確率収束について検討する。切断のみを考慮し、有界な$p$-thモーメントでノイズを必要とする最先端技術と比較して、幅広い非線形性の保証を提供する。
論文参考訳（メタデータ） (2024-10-17T18:25:28Z)
Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation [22.652143194356864]
ステップサイズが一定となる勾配勾配(SGD)アルゴリズムを用いて, 強い凸と滑らかな問題を解く問題に対処する。得られた推定子の平均二乗誤差を$n$の反復数に対して拡張する。我々は、この鎖が定義された重み付きワッサーシュタイン半計量に関して幾何学的にエルゴード的であることを確証する。
論文参考訳（メタデータ） (2024-10-07T15:02:48Z)
Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods [25.831462008050387]
グラディエント・Descent(SGD)アルゴリズムは、実際の性能が良く、理論的な理解が欠如していることから、人々の関心を喚起している。有限収束がより広い合成最適化や非ユークリッドノルムに証明可能な拡張が可能かどうかはまだ不明である。
論文参考訳（メタデータ） (2023-12-13T21:41:06Z)
Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications [2.0584253077707477]
目的関数 $J(cdot)$ の定常点を求めるグラディエント・Descent (SGD) 法の収束特性について検討した。この結果は、すべての定常点が大域最小値である性質を持つ invex' 関数のクラスに適用できる。
論文参考訳（メタデータ） (2023-12-05T15:22:39Z)
Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
構造密度の重み付き雑音によるクリップ最適化問題を考察する。勾配が有限の順序モーメントを持つとき、$mathcalO(K-(alpha - 1)/alpha)$よりも高速な収束率が得られることを示す。得られた推定値が無視可能なバイアスと制御可能な分散を持つことを示す。
論文参考訳（メタデータ） (2023-11-07T17:39:17Z)
High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise [59.25598762373543]
重み付き雑音の存在下でのストリーミングデータにおける学習の精度保証について検討した。解析的に、与えられた問題に対する設定の選択に$ta$を使うことができることを実証する。
論文参考訳（メタデータ） (2023-10-28T18:53:41Z)
Empirical Risk Minimization with Shuffled SGD: A Primal-Dual Perspective and Improved Bounds [12.699376765058137]
勾配降下法(SGD)は、おそらく現代の機械学習において最も一般的な最適化法である。 SGDを交換せずにサンプリングするSGDが分析されたのはごく最近のことだ。データマトリックスに依存し、既存の境界によって予測されるものよりも決して悪くない、きめ細かい複雑性境界を証明します。
論文参考訳（メタデータ） (2023-06-21T18:14:44Z)
On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
勾配降下(SGD)により最適化された高次元におけるランダム特徴(RF)回帰特性について検討する。本研究では, RF回帰の高精度な非漸近誤差境界を, 定常および適応的なステップサイズSGD設定の下で導出する。理論的にも経験的にも二重降下現象を観察する。
論文参考訳（メタデータ） (2021-10-13T17:47:39Z)
Last iterate convergence of SGD for Least-Squares in the Interpolation regime [19.05750582096579]
基本最小二乗構成におけるノイズレスモデルについて検討する。最適予測器が完全に入力に適合すると仮定し、$langletheta_*, phi(X) rangle = Y$, ここで$phi(X)$は無限次元の非線型特徴写像を表す。
論文参考訳（メタデータ） (2021-02-05T14:02:20Z)
Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model [0.0]
このモデルに基づく最小二乗リスクに対する1パス, 固定段差勾配勾配の収束度を解析した。特殊な場合として、ランダムなサンプリング点における値のノイズのない観測から単位区間上の実関数を推定するオンラインアルゴリズムを解析する。
論文参考訳（メタデータ） (2020-06-15T08:25:50Z)
On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration [115.1954841020189]
The inequality and non-asymptotic properties of approximation procedure with Polyak-Ruppert averaging。一定のステップサイズと無限大となる反復数を持つ平均的反復数に対する中心極限定理(CLT)を証明する。
論文参考訳（メタデータ） (2020-04-09T17:54:18Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。