Fugu-MT Paper Translation (Abstract): Parameter-free Clipped Gradient Descent Meets Polyak

Paper summary: Parameter-free Clipped Gradient Descent Meets Polyak

arxiv url: http://arxiv.org/abs/2405.15010v2
Date: Thu, 31 Oct 2024 15:03:06 GMT
Status: Translation complete
Last updated in system: 2024-11-28 17:07:32.688777
Title: Parameter-free Clipped Gradient Descent Meets Polyak
Authors: Yuki Takezawa, Han Bao, Ryoma Sato, Kenta Niwa, Makoto Yamada
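Abstract summary: Gradient descent and its variants are de facto standard algorithms for training machine learning models. The authors propose Inexact Polyak Stepsize, which converges to the optimal solution without hyperparameter tuning. They numerically validated the convergence results using a synthetic function and demonstrated the effectiveness of the proposed method.
Reference score (independently computed attention score): 29.764853985834403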
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Gradient descent and its variants are de facto standard algorithms for training machine learning models. As gradient descent is sensitive to its hyperparameters, we need to tune the hyperparameters carefully using a grid search. However, grid search is time-consuming, particularly when multiple hyperparameters exist. Therefore, recent studies have analyzed parameter-free methods that adjust the hyperparameters on the fly. However, the existing work is limited to investigations of parameter-free methods for the stepsize, and parameter-free methods for other hyperparameters have not been explored. For instance, although the gradient clipping threshold is a crucial hyperparameter in addition to the stepsize for preventing gradient explosion issues, none of the existing studies have investigated parameter-free methods for clipped gradient descent. Therefore, in this study, we investigate parameter-free methods for clipped gradient descent. Specifically, we propose Inexact Polyak Stepsize, which converges to the optimal solution without any hyperparameter tuning, and its convergence rate is asymptotically independent of $L$ under $L$-smooth and $(L_0, L_1)$-smooth assumptions of the loss function, similar to that of clipped gradient descent with well-tuned hyperparameters. We numerically validated our convergence results using a synthetic function and demonstrated the effectiveness of our proposed methods using LSTM, Nano-GPT, and T5.
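For orientation, the two ingredients named in the abstract can be sketched in a few lines. The block below is a minimal illustration of textbook clipped gradient descent and the classical Polyak stepsize, not the Inexact Polyak Stepsize proposed in the paper. The function names, the toy objective $f(x) = \frac{1}{2}\|x\|^2$, and the values chosen for the stepsize and clipping threshold are all assumptions made for this example.

```python
import math


def clipped_gd_step(x, grad, stepsize, threshold):
    """One clipped gradient descent step:
    x - stepsize * min(1, threshold / ||grad||) * grad."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(x)  # already at a stationary point
    scale = min(1.0, threshold / norm)
    return [xi - stepsize * scale * gi for xi, gi in zip(x, grad)]


def polyak_step(x, grad, fx, f_star):
    """One step with the classical Polyak stepsize
    (f(x) - f*) / ||grad||^2, which requires knowing f*."""
    sq_norm = sum(g * g for g in grad)
    if sq_norm == 0.0:
        return list(x)
    eta = (fx - f_star) / sq_norm
    return [xi - eta * gi for xi, gi in zip(x, grad)]


# Toy objective (assumption for illustration): f(x) = 0.5 * ||x||^2, so f* = 0.
def f(x):
    return 0.5 * sum(xi * xi for xi in x)


def grad_f(x):
    return list(x)


x_clip = [10.0, -4.0]
x_polyak = [10.0, -4.0]
for _ in range(50):
    # Both hyperparameters below are hand-picked; this is the tuning
    # burden that parameter-free methods aim to remove.
    x_clip = clipped_gd_step(x_clip, grad_f(x_clip), stepsize=0.5, threshold=1.0)
    # The Polyak stepsize needs no stepsize or threshold, but it needs f*.
    x_polyak = polyak_step(x_polyak, grad_f(x_polyak), f(x_polyak), f_star=0.0)

print(f(x_clip), f(x_polyak))
```

Clipped gradient descent exposes two hyperparameters (stepsize and threshold), while the classical Polyak stepsize replaces them with knowledge of the optimal value $f^\star$. According to the abstract, the proposed method converges without tuning any hyperparameters.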


Related papers