Fugu-MT paper translation (abstract): Fast Generalization after Interpolation via Critically Damped Momentum Optimization

Paper overview: Fast Generalization after Interpolation via Critically Damped Momentum Optimization

arxiv url: http://arxiv.org/abs/2606.01521v1
Date: Mon, 01 Jun 2026 00:54:45 GMT
Status: translation complete
Last updated in system: 2026-06-02 21:34:29.762948
Title: Fast Generalization after Interpolation via Critically Damped Momentum Optimization
Title (reference translation): Fast generalization after interpolation via critically damped momentum optimization
Authors: Luca Muscarnera, Silas Ruhrberg Estévez, Yuanzhang Xiao, Mihaela Van der Schaar
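Abstract summary: The authors show that GROKtimizer is a natural solution for selecting low-norm interpolating solutions. Under a local quadratic model of the post-interpolation basin, GROKtimizer provides a quadratic speedup over classical gradient descent, with provable optimality among first-order optimizers. They reconcile their findings with the flat-minima hypothesis, highlighting the importance of post-interpolation dynamics in the construction of high-quality, generalizing models.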
Reference score (independently computed attention measure): 44.00737032715565
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A central problem in machine learning is that models can achieve near-perfect training performance while generalizing substantially less well to unseen examples. This gap is especially acute in high-dimensional, low-sample regimes, where many interpolating solutions exist and optimization must implicitly select among minima with different generalization properties. Following recent theoretical advances on optimization dynamics near the interpolation threshold, we note that the two-regime structure of risk minimization, with loss minimization followed by complexity minimization, motivates a biphasic optimization schedule. We thus theoretically demonstrate that GROKtimizer, a biphasic strategy that combines rapid convergence to interpolation with Critically Damped Momentum (CDM)-based post-interpolation norm minimization, offers a natural solution for selecting low-norm interpolating solutions. Under a local quadratic model of the post-interpolation basin, GROKtimizer provides a quadratic speedup over classical gradient descent, with provable optimality among first-order optimizers. To showcase the applicability of our method, we evaluate GROKtimizer on several synthetic benchmarks common in the classical grokking literature and on various real-world datasets. Finally, we reconcile our findings with the flat-minima hypothesis, highlighting the importance of post-interpolation dynamics in the construction of high-quality, generalizing models.
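Abstract (reference translation): A central problem in machine learning is that a model can reach near-perfect training performance yet generalize considerably worse to unseen examples. The gap is especially acute in high-dimensional, low-sample regimes, where many interpolating solutions exist and the optimizer must implicitly choose among minima with different generalization properties. Building on recent theoretical advances on optimization dynamics near the interpolation threshold, the authors note that risk minimization has a two-regime structure, loss minimization followed by complexity minimization, and that this structure motivates a biphasic optimization schedule. They then show theoretically that GROKtimizer, a biphasic strategy combining rapid convergence to interpolation with post-interpolation norm minimization based on Critically Damped Momentum (CDM), is a natural way to select low-norm interpolating solutions. Under a local quadratic model of the post-interpolation basin, GROKtimizer provides a quadratic speedup over classical gradient descent, with provable optimality among first-order optimizers. To show the applicability of the method, GROKtimizer is evaluated on several synthetic benchmarks common in the classical grokking literature and on various real-world datasets. Finally, the findings are reconciled with the flat-minima hypothesis, highlighting the importance of post-interpolation dynamics in the construction of high-quality, generalizing models.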
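
The abstract does not spell out the CDM update rule, so the following is only a minimal sketch of the kind of speedup it describes, not the authors' GROKtimizer. It assumes a toy diagonal quadratic basin and substitutes Polyak's classical heavy-ball method, with step size and momentum tuned so that the slowest and fastest curvature modes have a repeated characteristic root (the discrete analogue of critical damping). The function names, the curvature values, and the tolerance are all hypothetical choices made for illustration.

```python
import math

def grad(w, curvatures):
    # Gradient of the toy quadratic f(w) = 0.5 * sum(h_i * w_i**2).
    return [h * x for h, x in zip(curvatures, w)]

def norm(w):
    return math.sqrt(sum(x * x for x in w))

def gradient_descent_steps(w0, curvatures, tol=1e-6, max_steps=100_000):
    # Plain gradient descent with the classical step size 2 / (L + mu).
    mu, L = min(curvatures), max(curvatures)
    lr = 2.0 / (L + mu)
    w = list(w0)
    for step in range(max_steps):
        if norm(w) < tol:
            return step
        g = grad(w, curvatures)
        w = [x - lr * gi for x, gi in zip(w, g)]
    return max_steps

def damped_momentum_steps(w0, curvatures, tol=1e-6, max_steps=100_000):
    # Assumption: Polyak heavy-ball stands in for CDM here. With these
    # parameters the extreme curvature modes have a double root, i.e.
    # they are critically damped rather than oscillating or crawling.
    mu, L = min(curvatures), max(curvatures)
    s_l, s_mu = math.sqrt(L), math.sqrt(mu)
    lr = 4.0 / (s_l + s_mu) ** 2
    beta = ((s_l - s_mu) / (s_l + s_mu)) ** 2
    w, w_prev = list(w0), list(w0)
    for step in range(max_steps):
        if norm(w) < tol:
            return step
        g = grad(w, curvatures)
        w_next = [x - lr * gi + beta * (x - xp)
                  for x, gi, xp in zip(w, g, w_prev)]
        w_prev, w = w, w_next
    return max_steps

# Hypothetical basin with condition number 1000.
curvatures = [1.0, 0.1, 0.01, 0.001]
w0 = [1.0, 1.0, 1.0, 1.0]
gd_steps = gradient_descent_steps(w0, curvatures)
cdm_steps = damped_momentum_steps(w0, curvatures)
print("gradient descent steps:", gd_steps)
print("damped momentum steps:", cdm_steps)
```

On a quadratic with condition number kappa, gradient descent needs on the order of kappa iterations while the tuned momentum method needs on the order of sqrt(kappa). That is the usual sense of a quadratic speedup, though whether the paper's bound takes exactly this form is not stated in the abstract.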

Related papers