Fugu-MT 論文翻訳(概要): Formal guarantees for heuristic optimization algorithms used in machine learning

論文の概要: Formal guarantees for heuristic optimization algorithms used in machine learning

arxiv url: http://arxiv.org/abs/2208.00502v1
Date: Sun, 31 Jul 2022 19:41:22 GMT
ステータス: 翻訳完了
システム内更新日: 2022-08-02 13:05:34.174546
Title: Formal guarantees for heuristic optimization algorithms used in machine learning
Title（参考訳）: 機械学習におけるヒューリスティック最適化アルゴリズムの形式保証
Authors: Xiaoyu Li
Abstract要約: グラディエント・Descent(SGD)とその変種は、大規模最適化機械学習(ML)問題において支配的な手法となっている。本稿では,いくつかの凸最適化手法の形式的保証と改良アルゴリズムの提案を行う。
参考スコア（独自算出の注目度）: 6.978625807687497
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Recently, Stochastic Gradient Descent (SGD) and its variants have become the dominant methods in the large-scale optimization of machine learning (ML) problems. A variety of strategies have been proposed for tuning the step sizes, ranging from adaptive step sizes to heuristic methods to change the step size in each iteration. Also, momentum has been widely employed in ML tasks to accelerate the training process. Yet, there is a gap in our theoretical understanding of them. In this work, we start to close this gap by providing formal guarantees to a few heuristic optimization methods and proposing improved algorithms. First, we analyze a generalized version of the AdaGrad (Delayed AdaGrad) step sizes in both convex and non-convex settings, showing that these step sizes allow the algorithms to automatically adapt to the level of noise of the stochastic gradients. We show for the first time sufficient conditions for Delayed AdaGrad to achieve almost sure convergence of the gradients to zero. Moreover, we present a high probability analysis for Delayed AdaGrad and its momentum variant in the non-convex setting. Second, we analyze SGD with exponential and cosine step sizes, which are empirically successful but lack theoretical support. We provide the very first convergence guarantees for them in the smooth and non-convex setting, with and without the Polyak-{\L}ojasiewicz (PL) condition. We also show their good property of adaptivity to noise under the PL condition. Third, we study the last iterate of momentum methods. We prove the first lower bound in the convex setting for the last iterate of SGD with constant momentum. Moreover, we investigate a class of Follow-The-Regularized-Leader-based momentum algorithms with increasing momentum and shrinking updates. We show that their last iterate has optimal convergence for unconstrained convex stochastic optimization problems.
Abstract（参考訳）: 近年、SGD(Stochastic Gradient Descent)とその変種は機械学習(ML)問題の大規模最適化において主要な手法となっている。適応的なステップサイズから、各イテレーションのステップサイズを変更するヒューリスティックメソッドまで、ステップサイズを調整するためのさまざまな戦略が提案されている。また、トレーニングプロセスの高速化のため、MLタスクにおいて運動量も広く採用されている。しかし、我々の理論的理解にはギャップがある。本研究では,いくつかのヒューリスティック最適化手法の形式的保証と改良アルゴリズムの提案により,このギャップを埋める。まず, adagrad (delayed adagrad) ステップサイズの一般化版を凸と非凸の両方の設定で解析し, これらのステップサイズにより, アルゴリズムが確率勾配の雑音レベルに自動的に適応できることを示した。遅延アダグラードが勾配をほぼ確実にゼロに収束させるのに十分な条件を初めて示す。さらに,非凸設定における遅延アダグラードとその運動量変化に対する高い確率解析を行う。第2に, sgdを指数関数的およびコサイン的ステップサイズで分析し, 実験的に成功したが, 理論的支援は得られていない。滑らかで非凸な設定で、Polyak-{\L}ojasiewicz (PL) 条件を満たさずに、それらに対する初めての収束保証を提供する。また,PL条件下での雑音に対する適応性の良さを示す。第3に、運動量法の最後の反復について研究する。我々は、sgd の最後に一定の運動量を持つ反復数に対する凸設定の最初の下界を証明できる。さらに,モーメントの増大と更新の縮小を伴うフォローザ・ザ・レギュラライズド・リーダに基づくモーメントアルゴリズムのクラスについて検討した。最後に, 制約のない凸確率最適化問題に対する最適収束性を示す。

関連論文リスト

Directional Smoothness and Gradient Methods: Convergence and Adaptivity [16.779513676120096]
我々は、最適化の経路に沿った目的の条件付けに依存する勾配降下に対する新しい準最適境界を開発する。我々の証明の鍵となるのは方向の滑らかさであり、これは、目的の上のバウンドを開発するために使用する勾配変動の尺度である。我々は,方向の滑らかさの知識を使わずとも,ポリアクのステップサイズと正規化GDが高速で経路依存の速度を得ることを示した。
論文参考訳（メタデータ） (2024-03-06T22:24:05Z)
Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
勾配ランゲヴィン・ダイナミクスは非エプス最適化問題を解くための最も基本的なアルゴリズムの1つである。本稿では、このタイプの2つの変種、すなわち、分散還元ランジュバンダイナミクスと再帰勾配ランジュバンダイナミクスを示す。
論文参考訳（メタデータ） (2022-03-30T11:39:00Z)
A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning [13.908826484332282]
最適化問題の解法として,新しい2段階勾配法を提案する。最初の貢献は、提案した2時間スケール勾配アルゴリズムの有限時間複雑性を特徴づけることである。我々は、強化学習における勾配に基づく政策評価アルゴリズムに適用する。
論文参考訳（メタデータ） (2021-09-29T23:15:23Z)
Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models [0.2741266294612776]
我々は高次元ランダム最小二乗問題に対して運動量を持つ勾配アルゴリズムのクラスを解析する。固定運動量パラメータを持つ(小バッチ)運動量では,ステップサイズを正確に調整した場合,SGDよりも実際の性能向上は得られないことを示す。非強凸条件では、運動量を用いてSGDよりも大きな改善が得られる。
論文参考訳（メタデータ） (2021-06-07T15:08:24Z)
Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
1次モーメントに対する大きな運動量パラメータの増大は適応的スケーリングに十分であることを示す。また,段階的に減少するステップサイズに応じて,段階的に運動量を増加させるための洞察を与える。
論文参考訳（メタデータ） (2021-04-30T08:50:24Z)
A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization [112.59170319105971]
問題に対処するための新しいアルゴリズム - Momentum- Single-timescale Approximation (MSTSA) を提案する。 MSTSAでは、低いレベルのサブプロブレムに対する不正確な解決策のため、反復でエラーを制御することができます。
論文参考訳（メタデータ） (2021-02-15T07:10:33Z)
A Unified Analysis of First-Order Methods for Smooth Games via Integral Quadratic Constraints [10.578409461429626]
本研究では、滑らかで強可変なゲームやイテレーションのための一階法に積分二次的制約理論を適用する。我々は、負の運動量法(NM)に対して、既知の下界と一致する複雑性$mathcalO(kappa1.5)$で、初めて大域収束率を与える。一段階のメモリを持つアルゴリズムでは,バッチ毎に1回だけ勾配を問合せすれば,高速化は不可能であることを示す。
論文参考訳（メタデータ） (2020-09-23T20:02:00Z)
Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
本稿では,各ステップごとに1つのデータポイントしか必要としない2つの単一スケールシングルループアルゴリズムを提案する。本研究の結果は, 同時一次および二重側収束の形で表される。
論文参考訳（メタデータ） (2020-08-23T20:36:49Z)
Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
本研究は,ステップサイズの減衰が正確な収束に必要であるという事実と,一定のステップサイズがエラーまでの時間でより速く学習するという事実のバランスをとることを目的とする。ステップサイズのミニバッチを最初から修正するのではなく,パラメータを適応的に進化させることを提案する。
論文参考訳（メタデータ） (2020-07-02T16:02:02Z)
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
適応アルゴリズムは勾配の歴史を用いて勾配を更新し、深層ニューラルネットワークのトレーニングにおいてユビキタスである。本稿では,非コンケーブ最小値問題に対するOptimisticOAアルゴリズムの変種を解析する。実験の結果,適応型GAN非適応勾配アルゴリズムは経験的に観測可能であることがわかった。
論文参考訳（メタデータ） (2019-12-26T22:10:10Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。