Fugu-MT Paper Translation (Abstract): Dynamics of Learning: Generative Schedules from Latent ODEs

Paper summary: Dynamics of Learning: Generative Schedules from Latent ODEs

arxiv url: http://arxiv.org/abs/2509.23052v1
Date: Sat, 27 Sep 2025 02:20:18 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.010614
Title: Dynamics of Learning: Generative Schedules from Latent ODEs
Title (reference translation): Dynamics of Learning: Generative Schedules from Latent ODEs
Authors: Matt L. Sampson, Peter Melchior
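Abstract summary: We propose a new learning rate scheduler that models the training performance of neural networks as a dynamical system. The method is computationally efficient, optimizer-agnostic, and can easily be layered on top of ML experiment-tracking platforms.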
Reference score (independently computed attention score): 0.14323566945483496
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The learning rate schedule is one of the most impactful aspects of neural network optimization, yet most schedules either follow simple parametric functions or react only to short-term training signals. None of them are supported by a comprehensive temporal view of how well neural networks actually train. We present a new learning rate scheduler that models the training performance of neural networks as a dynamical system. It leverages training runs from a hyperparameter search to learn a latent representation of the training process. Given current training metrics, it predicts the future learning rate schedule with the best long-term validation performance. Our scheduler generalizes beyond previously observed training dynamics and creates specialized schedules that deviate noticeably from common parametric functions. It achieves SOTA results for image classification with CNN and ResNet models as well as for next-token prediction with a transformer model. The trained models are located in flatter regions of the loss landscape and thus provide better generalization than those trained with other schedules. Our method is computationally efficient, optimizer-agnostic, and can easily be layered on top of ML experiment-tracking platforms. An implementation of our scheduler will be made available after acceptance.
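Abstract (reference translation): The learning rate schedule is one of the most impactful aspects of neural network optimization, but most schedules either follow simple parametric functions or react only to short-term training signals. None of them is backed by a comprehensive temporal view of how neural networks actually train. We propose a new learning rate scheduler that models the training performance of neural networks as a dynamical system. It uses training runs from a hyperparameter search to learn a latent representation of the training process. From the current training metrics, it predicts the future learning rate schedule with the best long-term validation performance. Our scheduler generalizes beyond previously observed training dynamics and produces specialized schedules that deviate noticeably from common parametric functions. It achieves SOTA results for image classification with CNN and ResNet models and for next-token prediction with a transformer model. The trained models lie in flatter regions of the loss landscape and therefore generalize better than models trained with other schedules. The method is computationally efficient, optimizer-agnostic, and can easily be layered on top of ML experiment-tracking platforms. The scheduler implementation will be made available after acceptance.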
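
The core idea in the abstract (roll a dynamical model of training forward under different learning rate schedules, then keep the schedule with the best predicted long-term performance) can be illustrated with a toy sketch. This is not the authors' implementation, which has not been released: `toy_dynamics` is a hand-written, hypothetical stand-in for the learned latent ODE, the forward-Euler `rollout` and the two candidate schedules are assumptions made for illustration, and the numeric constants are arbitrary.

```python
import math


def toy_dynamics(loss, lr):
    """Hypothetical stand-in for a learned latent ODE: d(loss)/dt at a given learning rate.

    The first term models progress proportional to lr; the second models a
    noise floor that grows with lr**2. Both terms are illustrative assumptions.
    """
    return -lr * loss + 0.5 * lr ** 2


def rollout(loss0, schedule, dt=1.0):
    """Integrate the toy dynamics with forward Euler and return the final predicted loss."""
    loss = loss0
    for lr in schedule:
        loss += dt * toy_dynamics(loss, lr)
    return loss


def constant_schedule(lr, steps):
    """Learning rate held fixed for every step."""
    return [lr] * steps


def cosine_schedule(lr_max, steps):
    """Cosine decay from lr_max toward zero over the given number of steps."""
    return [0.5 * lr_max * (1.0 + math.cos(math.pi * t / steps)) for t in range(steps)]


def pick_schedule(loss0, candidates):
    """Return the name of the candidate schedule with the lowest predicted final loss."""
    return min(candidates, key=lambda name: rollout(loss0, candidates[name]))


# Compare two candidate schedules starting from the same (made-up) initial loss.
candidates = {
    "constant": constant_schedule(0.5, 50),
    "cosine": cosine_schedule(0.5, 50),
}
best = pick_schedule(2.0, candidates)
print(best, rollout(2.0, candidates[best]))
```

In this toy model the constant schedule stalls at its noise floor, while the decaying schedule keeps lowering the predicted loss, so the selection step prefers the decaying one. The paper describes learning the dynamics from hyperparameter-search runs and generating the schedule rather than choosing from a fixed list; the sketch only shows the select-by-predicted-outcome step.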


Related papers