Fugu-MT Paper Translation (Summary): (Accelerated) Noise-adaptive Stochastic Heavy-Ball Momentum

Paper summary: (Accelerated) Noise-adaptive Stochastic Heavy-Ball Momentum

arxiv url: http://arxiv.org/abs/2401.06738v2
Date: Mon, 10 Jun 2024 22:16:01 GMT
Status: Translation complete
Last updated in system: 2024-06-12 22:42:29.206695
Title: (Accelerated) Noise-adaptive Stochastic Heavy-Ball Momentum
Title (reference translation): (Accelerated) Noise-adaptive Stochastic Heavy-Ball Momentum
Authors: Anh Dang, Reza Babanezhad, Sharan Vaswani
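Abstract summary: Stochastic heavy ball momentum (SHB) is commonly used to train machine learning models and often provides empirical improvements over stochastic gradient descent. We show that SHB can attain an accelerated rate when the mini-batch size is larger than a threshold $b^*$ that depends on the condition number $\kappa$.
Reference score (independently computed attention score): 7.095058159492494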
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Stochastic heavy ball momentum (SHB) is commonly used to train machine learning models, and often provides empirical improvements over stochastic gradient descent. By primarily focusing on strongly-convex quadratics, we aim to better understand the theoretical advantage of SHB and subsequently improve the method. For strongly-convex quadratics, Kidambi et al. (2018) show that SHB (with a mini-batch of size $1$) cannot attain accelerated convergence, and hence has no theoretical benefit over SGD. They conjecture that the practical gain of SHB is a by-product of using larger mini-batches. We first substantiate this claim by showing that SHB can attain an accelerated rate when the mini-batch size is larger than a threshold $b^*$ that depends on the condition number $\kappa$. Specifically, we prove that with the same step-size and momentum parameters as in the deterministic setting, SHB with a sufficiently large mini-batch size results in an $O\left(\exp(-\frac{T}{\sqrt{\kappa}}) + \sigma \right)$ convergence, where $T$ is the number of iterations and $\sigma^2$ is the variance in the stochastic gradients. We prove a lower-bound which demonstrates that a $\kappa$ dependence in $b^*$ is necessary. To ensure convergence to the minimizer, we design a noise-adaptive multi-stage algorithm that results in an $O\left(\exp\left(-\frac{T}{\sqrt{\kappa}}\right) + \frac{\sigma}{T}\right)$ rate. We also consider the general smooth, strongly-convex setting and propose the first noise-adaptive SHB variant that converges to the minimizer at an $O(\exp(-\frac{T}{\kappa}) + \frac{\sigma^2}{T})$ rate. We empirically demonstrate the effectiveness of the proposed algorithms.
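Abstract (reference translation): Stochastic heavy ball momentum (SHB) is commonly used to train machine learning models and often provides empirical improvements over stochastic gradient descent. By focusing primarily on strongly-convex quadratics, we aim to better understand the theoretical advantage of SHB and to improve the method. For strongly-convex quadratics, Kidambi et al. (2018) show that SHB (with a mini-batch of size $1$) cannot attain accelerated convergence and hence has no theoretical benefit over SGD. They conjecture that the practical gain of SHB is a by-product of using larger mini-batches. We first show that SHB can attain an accelerated rate when the mini-batch size is larger than a threshold $b^*$ that depends on the condition number $\kappa$. Specifically, with the same step-size and momentum parameters as in the deterministic setting, SHB with a sufficiently large mini-batch size attains $O\left(\exp\left(-\frac{T}{\sqrt{\kappa}}\right) + \sigma\right)$ convergence, where $T$ is the number of iterations and $\sigma^2$ is the variance in the stochastic gradients. We prove a lower bound demonstrating that a $\kappa$ dependence in $b^*$ is necessary. To ensure convergence to the minimizer, we design a noise-adaptive multi-stage algorithm that attains an $O\left(\exp\left(-\frac{T}{\sqrt{\kappa}}\right) + \frac{\sigma}{T}\right)$ rate. We also consider the general smooth, strongly-convex setting and propose the first noise-adaptive SHB variant that converges to the minimizer at an $O\left(\exp\left(-\frac{T}{\kappa}\right) + \frac{\sigma^2}{T}\right)$ rate. We empirically demonstrate the effectiveness of the proposed algorithms.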
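The mini-batch claim in the abstract can be explored with a small numerical sketch. The code below is not the authors' implementation. It assumes a synthetic least-squares problem (a strongly-convex quadratic) and the classical deterministic heavy-ball parameters for quadratics, $\alpha = 4/(\sqrt{L}+\sqrt{\mu})^2$ and $\beta = \left((\sqrt{L}-\sqrt{\mu})/(\sqrt{L}+\sqrt{\mu})\right)^2$, where $L$ and $\mu$ are the largest and smallest Hessian eigenvalues. The update is $w_{k+1} = w_k - \alpha g_k + \beta (w_k - w_{k-1})$ with $g_k$ a mini-batch gradient. The function names `make_least_squares` and `shb` and all default values are hypothetical, chosen for illustration only.

```python
import numpy as np


def make_least_squares(n=512, d=10, kappa=100.0, noise=0.1, seed=0):
    """Synthetic least-squares data (an assumption for illustration).

    The population feature covariance has eigenvalues spread over
    [1/kappa, 1], so the quadratic has condition number roughly kappa.
    """
    rng = np.random.default_rng(seed)
    scales = np.sqrt(np.linspace(1.0 / kappa, 1.0, d))
    X = rng.standard_normal((n, d)) * scales
    w_true = rng.standard_normal(d)
    y = X @ w_true + noise * rng.standard_normal(n)
    return X, y


def shb(X, y, batch_size, num_iters, seed=0):
    """Stochastic heavy ball on f(w) = (1/2n) * ||X w - y||^2.

    Uses the classical deterministic heavy-ball step size and momentum
    for quadratics (assumed here, not taken from the paper).
    Returns the distance to the minimizer after every iteration.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    H = X.T @ X / n                      # Hessian of the quadratic
    eigs = np.linalg.eigvalsh(H)         # eigenvalues in ascending order
    mu, L = eigs[0], eigs[-1]
    sqrt_mu, sqrt_L = np.sqrt(mu), np.sqrt(L)
    alpha = 4.0 / (sqrt_L + sqrt_mu) ** 2
    beta = ((sqrt_L - sqrt_mu) / (sqrt_L + sqrt_mu)) ** 2
    w_star = np.linalg.solve(H, X.T @ y / n)   # exact minimizer

    w_prev = np.zeros(d)
    w = np.zeros(d)
    dists = []
    for _ in range(num_iters):
        # Sample a mini-batch without replacement and form its gradient.
        idx = rng.choice(n, size=batch_size, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        # Heavy-ball update: gradient step plus momentum term.
        w_next = w - alpha * grad + beta * (w - w_prev)
        w_prev, w = w, w_next
        dists.append(np.linalg.norm(w - w_star))
    return np.array(dists)
```

Calling `shb(X, y, batch_size=X.shape[0], num_iters=400)` recovers the deterministic heavy-ball method, since the mini-batch then covers the whole dataset. Rerunning with smaller values of `batch_size` and comparing the returned distance curves is one way to see how the behaviour changes with the mini-batch size; small batches may stall at a noise floor or diverge with these parameters.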

Related papers

Edge of Stochastic Stability: Revisiting the Edge of Stability for SGD [0.0]
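We show that mini-batch stochastic gradient descent (SGD) trains in a different regime, which we call the Edge of Stochastic Stability (EoSS). The quantity that stabilizes at $2/\eta$ is *Batch Sharpness*. We further discuss the mathematical modeling of SGD trajectories.
Paper reference translation (metadata) (2024-12-29T18:59:01Z)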
Near-Optimal Streaming Heavy-Tailed Statistical Estimation with Clipped SGD [16.019880089338383]
[Summary garbled in the source; only a repeated $\sqrt{\mathsf{Tr}(\Sigma)}$ term is recoverable.]
Paper reference translation (metadata) (2024-10-26T10:14:17Z)
Second-order Information Promotes Mini-Batch Robustness in Variance-Reduced Gradients [0.196629787330046]
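We show that incorporating partial second-order information about the objective function can dramatically improve the robustness of variance-reduced gradient methods to the mini-batch size. We demonstrate this phenomenon on a prototypical Newton ($\texttt{Mb-SVRN}$) algorithm.
Paper reference translation (metadata) (2024-04-23T05:45:52Z)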
An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems [76.2042837251496]
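We introduce two oblivious mirror descent algorithms based on a complementary composite setting. Notably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function. We demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
Paper reference translation (metadata) (2023-06-30T08:34:29Z)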
Contextual Combinatorial Bandits with Probabilistically Triggered Arms [55.9237004478033]
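We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UC-T algorithm and derive a regret bound of $\tilde{O}(d\sqrt{T})$.
Paper reference translation (metadata) (2023-03-30T02:51:00Z)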
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
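We study training algorithms for distributed compute workers with differing communication frequencies. We derive a tighter convergence rate (the expression, which involves $\sigma^2$ and a quantity subscripted "avg", is garbled in the source). We also show that the heterogeneity term is affected by the average delay of the workers.
Paper reference translation (metadata) (2022-06-16T17:10:57Z)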
A Variance-Reduced Stochastic Accelerated Primal Dual Algorithm [3.2958527541557525]
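Such problems arise frequently in machine learning in the context of robust empirical risk minimization. We consider the stochastic accelerated primal-dual (SAPD) algorithm, a method that is robust to gradient noise. We show that the proposed method improves on SAPD in both practice and theory.
Paper reference translation (metadata) (2022-02-19T22:12:30Z)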
Towards Noise-adaptive, Problem-adaptive Stochastic Gradient Descent [7.176107039687231]
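We design step-size schemes that make stochastic gradient descent (SGD) adaptive to noise. We show that $T$ iterations of SGD with Nesterov acceleration are near-optimal. We demonstrate the effectiveness of the new exponential step-size scheme compared with other step-size schemes.
Paper reference translation (metadata) (2021-10-21T19:22:14Z)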
Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
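We propose a novel model-free algorithm for minimizing regret in reinforcement learning. The proposed algorithm uses an early-settled reference update rule, with the aid of two Q-learning sequences. The design principle of the early-settled variance reduction method may be of independent interest for other RL settings.
Paper reference translation (metadata) (2021-10-09T21:13:48Z)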
Almost sure convergence rates for Stochastic Gradient Descent and Stochastic Heavy Ball [17.33867778750777]
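We study stochastic gradient descent (SGD) and the stochastic heavy ball method (SHB) for the general stochastic approximation problem. For SGD, in the convex and smooth setting, we provide the first almost sure convergence rates for a weighted average of the iterates.
Paper reference translation (metadata) (2020-06-14T11:12:05Z)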
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
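Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP). The number of samples needed to obtain an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$.
Paper reference translation (metadata) (2020-06-04T17:51:00Z)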
Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness [151.67113334248464]
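We show that extending the smoothing technique to other attack models is challenging. We present experimental results on CIFAR to validate our theory.
Paper reference translation (metadata) (2020-02-08T22:02:14Z)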

The related papers list is generated automatically from the titles and abstracts of papers on this site.

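This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).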