Fugu-MT Paper Translation (Abstract): Variance-reduced Clipping for Non-convex Optimization

Paper overview: Variance-reduced Clipping for Non-convex Optimization

arxiv url: http://arxiv.org/abs/2303.00883v2
Date: Fri, 2 Jun 2023 23:35:16 GMT
Status: Translation complete
Last updated in system: 2023-06-07 02:36:40.198450
Title: Variance-reduced Clipping for Non-convex Optimization
Title (reference translation): Variance-Reduced Clipping for Non-Convex Optimization
Authors: Amirhossein Reisizadeh, Haochuan Li, Subhro Das, Ali Jadbabaie
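Abstract summary: Gradient clipping is a technique used in deep learning applications such as large-scale language modeling. Recent experimental studies show that training with clipping exhibits a rather special smoothness behavior, and the paper uses variance reduction to improve the order of the resulting complexity.
Reference score (independently computed attention score): 24.765794811146144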
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Gradient clipping is a standard training technique used in deep learning applications such as large-scale language modeling to mitigate exploding gradients. Recent experimental studies have demonstrated a fairly special behavior in the smoothness of the training objective along its trajectory when trained with gradient clipping. That is, the smoothness grows with the gradient norm. This is in clear contrast to the well-established assumption in folklore non-convex optimization, a.k.a. $L$--smoothness, where the smoothness is assumed to be bounded by a constant $L$ globally. The recently introduced $(L_0,L_1)$--smoothness is a more relaxed notion that captures such behavior in non-convex optimization. In particular, it has been shown that under this relaxed smoothness assumption, SGD with clipping requires $O(\epsilon^{-4})$ stochastic gradient computations to find an $\epsilon$--stationary solution. In this paper, we employ a variance reduction technique, namely SPIDER, and demonstrate that for a carefully designed learning rate, this complexity is improved to $O(\epsilon^{-3})$ which is order-optimal. Our designed learning rate comprises the clipping technique to mitigate the growing smoothness. Moreover, when the objective function is the average of $n$ components, we improve the existing $O(n\epsilon^{-2})$ bound on the stochastic gradient complexity to $O(\sqrt{n} \epsilon^{-2} + n)$, which is order-optimal as well. In addition to being theoretically optimal, SPIDER with our designed parameters demonstrates comparable empirical performance against variance-reduced methods such as SVRG and SARAH in several vision tasks.
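For reference, the quantities contrasted in the abstract can be written out as below. This is a sketch under assumptions: the twice-differentiable form of $(L_0,L_1)$-smoothness is the one commonly used in the literature, the abstract itself does not spell it out, and the symbols $\eta$, $\gamma$, and $g_k$ in the clipped update are illustrative names for a generic clipped step rather than the paper's designed learning rate.

```latex
\begin{align*}
  % L-smoothness: curvature bounded by one global constant
  \|\nabla^2 f(x)\| &\le L
    && \text{for all } x, \\
  % (L_0, L_1)-smoothness: curvature may grow with the gradient norm
  \|\nabla^2 f(x)\| &\le L_0 + L_1 \|\nabla f(x)\|
    && \text{for all } x, \\
  % generic clipped gradient step (assumed form, eta and gamma are illustrative)
  x_{k+1} &= x_k - \min\!\Big\{\eta,\ \frac{\gamma}{\|g_k\|}\Big\}\, g_k, \\
  % epsilon-stationarity of the returned point (a common definition)
  \mathbb{E}\,\|\nabla f(\hat{x})\| &\le \epsilon .
\end{align*}
```

Setting $L_1 = 0$ in the second line recovers ordinary $L$-smoothness with $L = L_0$, which is why the relaxed notion is described as strictly more general.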
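Abstract (reference translation): Gradient clipping is a standard training technique used in deep learning applications such as large-scale language modeling. Recent experimental studies show a rather special behavior in the smoothness of the training objective along its trajectory when training with gradient clipping: the smoothness grows with the gradient norm. This contrasts with the well-established assumption in folklore non-convex optimization, namely $L$-smoothness, where the smoothness is assumed to be bounded globally by a constant $L$. The recently introduced $(L_0,L_1)$-smoothness is a more relaxed notion that captures such behavior in non-convex optimization. In particular, it has been shown that under this relaxed smoothness assumption, SGD with clipping requires $O(\epsilon^{-4})$ stochastic gradient computations to find an $\epsilon$-stationary solution. In this paper, we employ a variance reduction technique called SPIDER and show that, for a carefully designed learning rate, this complexity improves to the order-optimal $O(\epsilon^{-3})$. Our designed learning rate incorporates the clipping technique to mitigate the growing smoothness. Moreover, when the objective function is the average of $n$ components, we improve the existing $O(n\epsilon^{-2})$ bound on the stochastic gradient complexity to $O(\sqrt{n} \epsilon^{-2} + n)$. Besides being theoretically optimal, SPIDER with the designed parameters shows performance comparable to variance-reduced methods such as SVRG and SARAH on several vision tasks.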
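To make the combination of variance reduction and clipping more concrete, the following is a minimal Python sketch of a SPIDER-style recursive gradient estimator paired with a clipped step size. Several things here are assumptions and not taken from the paper: the step size `min(eta, gamma / ||v||)` is a generic clipped rule, not the authors' designed learning rate; the function names `clipped_step`, `spider_clipped`, and `make_least_squares`, all parameter values, and the synthetic least-squares objective are hypothetical and chosen only for illustration.

```python
import numpy as np


def clipped_step(v, eta, gamma):
    """Return min(eta, gamma / ||v||), so the update step * v has norm at most gamma."""
    norm = np.linalg.norm(v)
    return eta if norm == 0.0 else min(eta, gamma / norm)


def spider_clipped(grad_i, x0, n, eta=0.05, gamma=0.1, q=10, batch=8, iters=500, seed=0):
    """SPIDER-style estimator with a clipped step size (illustrative parameters).

    grad_i(x, idx) must return the average gradient of the components in idx.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    x_prev, v = x.copy(), None
    for k in range(iters):
        if k % q == 0:
            # Periodic refresh: full gradient over all n components.
            v = grad_i(x, np.arange(n))
        else:
            # Recursive correction: same mini-batch evaluated at x and at x_prev.
            idx = rng.integers(0, n, size=batch)
            v = v + grad_i(x, idx) - grad_i(x_prev, idx)
        x_prev = x.copy()
        x = x - clipped_step(v, eta, gamma) * v
    return x


def make_least_squares(n=200, d=5, seed=1):
    """Synthetic finite-sum objective f(x) = (1/n) * sum_i 0.5 * (a_i . x - b_i)^2."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n, d))
    b = A @ rng.normal(size=d)

    def grad_i(x, idx):
        residual = A[idx] @ x - b[idx]
        return A[idx].T @ residual / len(idx)

    return grad_i, n, d


grad_i, n, d = make_least_squares()
x0 = np.zeros(d)
x_out = spider_clipped(grad_i, x0, n)
g0 = np.linalg.norm(grad_i(x0, np.arange(n)))
g1 = np.linalg.norm(grad_i(x_out, np.arange(n)))
print(f"full gradient norm: {g0:.3e} -> {g1:.3e}")
```

The estimator reuses the previous estimate and only adds a mini-batch gradient difference between consecutive iterates, which is what keeps its variance small between full-gradient refreshes. The clipped step bounds how far each update can move, which is the role the abstract assigns to clipping when the smoothness grows with the gradient norm.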

Related papers

Gradient-free stochastic optimization for additive models [56.42455605591779]
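We address the problem of zero-order optimization from noisy observations for an objective function satisfying either the Polyak-Łojasiewicz or the strong convexity condition. The objective function is assumed to have an additive structure and to satisfy a higher-order smoothness property characterized by the Hölder family of functions.
Paper reference translation (metadata) (2025-03-03T23:39:08Z)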
A stochastic first-order method with multi-extrapolated momentum for highly smooth unconstrained optimization [3.8919212824749296]
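We show that the proposed stochastic first-order method (SFOM) can accelerate optimization by exploiting the high-order smoothness of the objective function $f$. To the best of our knowledge, this is the first SFOM to leverage arbitrary-order smoothness of the objective function for acceleration.
Paper reference translation (metadata) (2024-12-19T03:22:47Z)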
Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
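We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods. In particular, we derive improved convergence rates for gradient descent with smoothed gradient clipping and for gradient descent with Polyak stepsizes.
Paper reference translation (metadata) (2024-09-23T13:11:37Z)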
Differential Private Stochastic Optimization with Heavy-tailed Data: Towards Optimal Rates [15.27596975662702]
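We study algorithms that achieve optimal rates for differentially private (DP) optimization with heavy-tailed gradients. The results show that the theoretical limits of convex optimization under DP are achievable.
Paper reference translation (metadata) (2024-08-19T11:07:05Z)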
Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
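We propose an online convex optimization approach with two levels of adaptivity. We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$, and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave, and convex loss functions, respectively.
Paper reference translation (metadata) (2023-07-17T09:55:35Z)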
On Convergence of Incremental Gradient for Non-Convex Smooth Functions [63.51187646914962]
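In machine learning and network optimization, algorithms such as shuffle SGD are popular because they minimize the number of cache misses. This paper examines the convergence properties of SGD algorithms with arbitrary data ordering.
Paper reference translation (metadata) (2023-05-30T17:47:27Z)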
Optimal Stochastic Non-smooth Non-convex Optimization through Online-to-Non-convex Conversion [56.92236659731376]
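We present algorithms for optimizing non-smooth, non-convex stochastic objectives based on a new analysis technique. For deterministic second-order smooth objectives, applying advanced optimistic online learning techniques yields a new complexity bound (the expression survives in the source only as the fragment `O(delta0.5)`), and the approach recovers optimal or best-known results.
Paper reference translation (metadata) (2023-02-07T22:09:20Z)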
Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
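Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving non-convex optimization problems. This paper studies two variants of this type, namely variance-reduced Langevin dynamics and recursive gradient Langevin dynamics.
Paper reference translation (metadata) (2022-03-30T11:39:00Z)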
Stochastic Bias-Reduced Gradient Methods [44.35885731095432]
The method gives a low-bias, low-cost smoothing of the Moreau-Yosida function for any bounded $x_\star$.
Paper reference translation (metadata) (2021-06-17T13:33:05Z)
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
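Adaptive algorithms perform updates using the history of gradients and are ubiquitous in training deep neural networks. This paper analyzes a variant of the OptimisticOA algorithm for non-concave min-max problems. Experiments indicate that the difference between adaptive and non-adaptive gradient algorithms in GAN training can be observed empirically.
Paper reference translation (metadata) (2019-12-26T22:10:10Z)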

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

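This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).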