Fugu-MT Paper Translation (Abstract): Double Preconditioning (DoPr): Optimization for Test-Time Performance, not Validation Loss

Paper overview: Double Preconditioning (DoPr): Optimization for Test-Time Performance, not Validation Loss

arxiv url: http://arxiv.org/abs/2606.06418v1
Date: Thu, 04 Jun 2026 17:22:58 GMT
Status: Translation complete
Last updated in system: 2026-06-05 22:39:44.993504
Title: Double Preconditioning (DoPr): Optimization for Test-Time Performance, not Validation Loss
Title (reference translation): Double Preconditioning (DoPr): Optimizing Test-Time Performance, Not Validation Loss
Authors: Thomas T. Zhang, Alok Shah, Yifei Zhang, Vincent Zhang, Nikolai Matni, Max Simchowitz
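Abstract summary: We introduce a new optimization paradigm called double preconditioning (DoPr). DoPr combines gradient-wise preconditioning, as in Adam and Muon, with activation-wise preconditioning (AP). We show that adding AP yields a drop-in intervention for improving downstream model performance across a range of test-time settings.
Reference score (independently computed attention score): 26.33868416147844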
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Many modern applications of deep learning involve training a neural network via a one-step prediction loss (e.g., $L^2$ regression, cross-entropy), but deploy the network by rolling out along its own predictions. Key examples include autoregressive language modeling, flow-based generative modeling, and robot policy learning. It is well-documented that these settings induce a phenomenon we call test-time feedback (TTF): the mismatch between the training/validation loss and downstream metrics of interest, such as task success rate and generation quality, which grows with task length. While data curation, architecture, and objective design have been proposed to combat train-test shift in TTF settings, this paper proposes optimization as a new design axis to mitigate error accumulation. Specifically, we introduce a new optimization paradigm called double-preconditioning (DoPr) uniquely tailored to the challenges of TTF. DoPr combines gradient-wise preconditioning, as in Adam and Muon, with activation-wise preconditioning (AP), such as in KFAC. We show that the addition of AP yields a drop-in intervention for increasing downstream model performance across a range of TTF settings. Interestingly, these gains in test-time performance do not consistently accompany improvements in validation loss, opening new questions about how to properly evaluate models trained with one-step supervised objectives.
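The gap between one-step loss and rollout performance described in the abstract can be pictured with a toy example. The sketch below is not taken from the paper: it assumes a scalar linear system x_{t+1} = a * x_t and a hypothetical learned coefficient `a_hat` that is off by 1%. The function names `rollout_gap` and `one_step_error` are illustrative only.

```python
def one_step_error(a_true, a_hat, x):
    """Error of a single prediction made from a ground-truth state x."""
    return abs(a_true * x - a_hat * x)


def rollout_gap(a_true, a_hat, x0, horizon):
    """Gap between the true trajectory and the model's own rollout.

    The model is fed its previous prediction at every step, so its
    errors feed back into later inputs (the deployment setting that
    the abstract contrasts with one-step training).
    """
    x_true, x_hat = x0, x0
    gaps = []
    for _ in range(horizon):
        x_true = a_true * x_true  # ground-truth dynamics
        x_hat = a_hat * x_hat     # model consumes its own output
        gaps.append(abs(x_true - x_hat))
    return gaps


if __name__ == "__main__":
    gaps = rollout_gap(a_true=1.0, a_hat=1.01, x0=1.0, horizon=100)
    print("one-step error:", one_step_error(1.0, 1.01, 1.0))
    print("gap after 100 rollout steps:", gaps[-1])
```

Under these assumptions the one-step error stays near 0.01, while the rollout gap after 100 steps is roughly 1.70, so a model that looks accurate on a one-step metric can drift far from the true trajectory as the horizon grows.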
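Abstract (reference translation): Many modern applications of deep learning train a neural network with a one-step prediction loss (e.g., $L^2$ regression, cross-entropy) but deploy the network by rolling it out along its own predictions. Key examples include autoregressive language modeling, flow-based generative modeling, and robot policy learning. It is well documented that these settings give rise to a phenomenon called test-time feedback (TTF): a mismatch between the training/validation loss and downstream metrics of interest, such as task success rate and generation quality, that grows with task length. Data curation, architecture, and objective design have been proposed to address train-test shift in TTF settings; this paper proposes optimization as a new design axis for mitigating error accumulation. Specifically, it introduces a new optimization paradigm called double-preconditioning (DoPr), tailored to the challenges of TTF. DoPr combines gradient-wise preconditioning, as in Adam and Muon, with activation-wise preconditioning (AP), as in KFAC. The authors show that adding AP yields a drop-in intervention for improving downstream model performance across a range of TTF settings. Interestingly, these gains in test-time performance are not consistently accompanied by improvements in validation loss.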
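One way to picture combining the two kinds of preconditioning named in the abstract is the NumPy sketch below. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not specify how DoPr composes the two preconditioners. Here, for a single linear layer trained with squared loss, the gradient is first right-multiplied by the inverse of a damped input-activation covariance (a KFAC-style activation-wise step) and then passed through an Adam-style elementwise update (a gradient-wise step). The names `init_state` and `dopr_like_step`, the ordering of the two steps, and all hyperparameter defaults are hypothetical.

```python
import numpy as np


def init_state(W):
    """Adam-style moment buffers for a weight matrix W (hypothetical helper)."""
    return {"t": 0, "m": np.zeros_like(W), "v": np.zeros_like(W)}


def dopr_like_step(W, X, Y, state, lr=1e-2, damping=1e-3,
                   beta1=0.9, beta2=0.999, eps=1e-8):
    """One update of a linear layer y = W x under mean squared error.

    X has shape (n, d_in), Y has shape (n, d_out), W has shape (d_out, d_in).
    ASSUMPTION: activation-wise preconditioning is applied first, then an
    Adam-style elementwise update. The paper may compose them differently.
    """
    n, d_in = X.shape
    residual = X @ W.T - Y                 # (n, d_out)
    grad = residual.T @ X / n              # (d_out, d_in)

    # Activation-wise step: grad @ inv(A), with A the damped covariance
    # of the layer inputs. A is symmetric, so solve(A, grad.T).T works.
    act_cov = X.T @ X / n + damping * np.eye(d_in)
    grad_ap = np.linalg.solve(act_cov, grad.T).T

    # Gradient-wise step: Adam moments with bias correction.
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad_ap
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad_ap ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return W - lr * m_hat / (np.sqrt(v_hat) + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Inputs with very different scales per feature.
    X = rng.normal(size=(64, 3)) * np.array([1.0, 5.0, 0.2])
    W_true = rng.normal(size=(2, 3))
    Y = X @ W_true.T
    W, state = np.zeros((2, 3)), init_state(np.zeros((2, 3)))
    for _ in range(300):
        W = dopr_like_step(W, X, Y, state)
    print("final MSE:", np.mean((X @ W.T - Y) ** 2))
```

The design choice to whiten by the input covariance before the Adam-style step is only one plausible composition; it is shown because it keeps both preconditioners visible in a few lines, not because the source confirms it.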


Related papers