Fugu-MT Paper Translation (Abstract): What do near-optimal learning rate schedules look like?

Paper summary: What do near-optimal learning rate schedules look like?

arxiv url: http://arxiv.org/abs/2603.10301v1
Date: Wed, 11 Mar 2026 00:53:12 GMT
Status: Translation complete
Last updated in system: 2026-03-12 16:22:32.736787
Title: What do near-optimal learning rate schedules look like?
Title (reference translation): What do near-optimal learning rate schedules look like?
Authors: Hiroki Naganuma, Atish Agarwala, Priya Kasimbeg, George E. Dahl
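Abstract summary: We design a search procedure to find the best shapes within a parameterized schedule family. These results represent the most comprehensive results on near-optimal schedule shapes for deep neural network training.
Reference score (independently computed attention metric): 10.511909112011834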
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A basic unanswered question in neural network training is: what is the best learning rate schedule shape for a given workload? The choice of learning rate schedule is a key factor in the success or failure of the training process, but beyond having some kind of warmup and decay, there is no consensus on what makes a good schedule shape. To answer this question, we designed a search procedure to find the best shapes within a parameterized schedule family. Our approach factors out the schedule shape from the base learning rate, which otherwise would dominate cross-schedule comparisons. We applied our search procedure to a variety of schedule families on three workloads: linear regression, image classification on CIFAR-10, and small-scale language modeling on Wikitext103. We showed that our search procedure indeed generally found near-optimal schedules. We found that warmup and decay are robust features of good schedules, and that commonly used schedule families are not optimal on these workloads. Finally, we explored how the outputs of our shape search depend on other optimization hyperparameters, and found that weight decay can have a strong effect on the optimal schedule shape. To the best of our knowledge, our results represent the most comprehensive results on near-optimal schedule shapes for deep neural network training, to date.
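Abstract (reference translation): A basic question in neural network training is what the best learning rate schedule shape is for a given workload. The choice of learning rate schedule is a key factor in whether training succeeds or fails, yet beyond some form of warmup and decay there is no consensus on what makes a good schedule. In this work, we devise a search procedure that finds the best shapes within a parameterized schedule family. The approach separates the schedule shape from the base learning rate, which would otherwise dominate comparisons between schedules. We apply the search procedure to a variety of schedule families on three workloads: linear regression, image classification on CIFAR-10, and small-scale language modeling on Wikitext103. The search procedure generally finds near-optimal schedules. Warmup and decay are robust features of good schedules, and commonly used schedule families are not optimal on these workloads. Finally, we examine how the outputs of the shape search depend on other optimization hyperparameters and find that weight decay can strongly affect the optimal schedule shape. To the best of our knowledge, these are the most comprehensive results to date on near-optimal schedule shapes for deep neural network training.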
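The idea of factoring the schedule shape out of the base learning rate can be illustrated with a minimal Python sketch. This is not the authors' search procedure or their schedule families. The shape function (linear warmup followed by polynomial decay), its parameters `warmup_frac` and `decay_power`, and the helper `best_score_per_shape` are hypothetical names chosen for illustration. The sketch assumes a shape is normalized so its peak is 1, so that the base learning rate alone sets the scale, and that each candidate shape is scored at its own best base learning rate before shapes are compared.

```python
def shape(t, warmup_frac=0.1, decay_power=1.0):
    """Hypothetical schedule shape at training fraction t in [0, 1].

    Linear warmup to a peak of 1, then polynomial decay to 0.
    Requires 0 <= warmup_frac < 1.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if not 0.0 <= warmup_frac < 1.0:
        raise ValueError("warmup_frac must lie in [0, 1)")
    if t < warmup_frac:
        return t / warmup_frac
    remaining = (1.0 - t) / (1.0 - warmup_frac)
    return remaining ** decay_power


def learning_rate(step, total_steps, base_lr, **shape_params):
    """Learning rate = base learning rate times the normalized shape."""
    return base_lr * shape(step / total_steps, **shape_params)


def best_score_per_shape(evaluate, shape_candidates, base_lrs):
    """Score each shape at its own best base learning rate.

    `evaluate(shape_params, base_lr)` is a user-supplied callable that
    trains a model and returns a loss (lower is better). It is an
    assumption of this sketch, not part of any library.
    """
    results = []
    for params in shape_candidates:
        best = min(evaluate(params, lr) for lr in base_lrs)
        results.append((best, params))
    # Sort by loss only, so ties never fall through to comparing dicts.
    return sorted(results, key=lambda pair: pair[0])
```

Because every shape peaks at 1, two shapes compared at their individually tuned base learning rates differ only in how the learning rate is distributed over training, not in its overall scale.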
