Fugu-MT Paper Translation (Abstract): On the Maximum Hessian Eigenvalue and Generalization

Paper summary: On the Maximum Hessian Eigenvalue and Generalization

arxiv url: http://arxiv.org/abs/2206.10654v3
Date: Tue, 23 May 2023 21:04:12 GMT
Status: Translation complete
Last updated in system: 2023-05-26 03:21:42.003946
Title: On the Maximum Hessian Eigenvalue and Generalization
Authors: Simran Kaur, Jeremy Cohen, Zachary C. Lipton
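Abstract summary: Larger learning rates reduce $\lambda_{max}$ for all batch sizes, but the generalization benefits sometimes vanish at larger batch sizes. Batch normalization does not consistently produce a smaller $\lambda_{max}$, yet it nevertheless confers generalization benefits.
Reference score (independently computed attention score): 23.408289280656412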
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The mechanisms by which certain training interventions, such as increasing learning rates and applying batch normalization, improve the generalization of deep networks remain a mystery. Prior works have speculated that "flatter" solutions generalize better than "sharper" solutions to unseen data, motivating several metrics for measuring flatness (particularly $\lambda_{max}$, the largest eigenvalue of the Hessian of the loss); and algorithms, such as Sharpness-Aware Minimization (SAM) [1], that directly optimize for flatness. Other works question the link between $\lambda_{max}$ and generalization. In this paper, we present findings that call $\lambda_{max}$'s influence on generalization further into question. We show that: (1) while larger learning rates reduce $\lambda_{max}$ for all batch sizes, generalization benefits sometimes vanish at larger batch sizes; (2) by scaling batch size and learning rate simultaneously, we can change $\lambda_{max}$ without affecting generalization; (3) while SAM produces smaller $\lambda_{max}$ for all batch sizes, generalization benefits (also) vanish with larger batch sizes; (4) for dropout, excessively high dropout probabilities can degrade generalization, even as they promote smaller $\lambda_{max}$; and (5) while batch-normalization does not consistently produce smaller $\lambda_{max}$, it nevertheless confers generalization benefits. While our experiments affirm the generalization benefits of large learning rates and SAM for minibatch SGD, the GD-SGD discrepancy demonstrates limits to $\lambda_{max}$'s ability to explain generalization in neural networks.
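
The abstract's flatness metric, $\lambda_{max}$, is the largest eigenvalue of the Hessian of the loss. One common way to estimate such a quantity without forming the full Hessian is power iteration on Hessian-vector products. The following is a minimal sketch under stated assumptions, not the authors' code: the toy quadratic loss, the matrix `A`, and the helper names `grad`, `hvp`, and `lambda_max` are all hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical toy loss L(w) = 0.5 * w^T A w, so the Hessian is exactly A.
A = [[3.0, 1.0], [1.0, 2.0]]

def grad(w):
    """Gradient of the toy loss, i.e. A @ w."""
    return [sum(A[i][j] * w[j] for j in range(len(w))) for i in range(len(w))]

def hvp(grad_fn, w, v, eps=1e-4):
    """Approximate the Hessian-vector product H v by a central difference of the gradient."""
    g_plus = grad_fn([wi + eps * vi for wi, vi in zip(w, v)])
    g_minus = grad_fn([wi - eps * vi for wi, vi in zip(w, v)])
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]

def norm(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(x * x for x in v))

def lambda_max(grad_fn, w, iters=200, seed=0):
    """Estimate the largest-magnitude Hessian eigenvalue at w by power iteration.

    Note: power iteration converges to the eigenvalue of largest magnitude,
    which equals the largest eigenvalue only when that one dominates.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in w]
    n = norm(v)
    v = [x / n for x in v]
    estimate = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, w, v)
        # Rayleigh quotient v^T H v for the current unit vector v.
        estimate = sum(a * b for a, b in zip(v, hv))
        n = norm(hv)
        if n == 0.0:
            break
        v = [x / n for x in hv]
    return estimate

sharpness = lambda_max(grad, [0.5, -0.3])
print(round(sharpness, 4))
```

For this particular `A`, the exact eigenvalues are $(5 \pm \sqrt{5})/2$, so the estimate can be checked against $(5 + \sqrt{5})/2$. In a real network the gradient would come from automatic differentiation rather than a closed form, and the finite-difference step `eps` would need tuning.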

Related papers

Compute-Optimal LLMs Provably Generalize Better With Scale [102.29926217670926]
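We develop generalization bounds on the pretraining objective of large language models (LLMs). We introduce a fully empirical Freedman-type martingale concentration inequality that tightens existing bounds by accounting for the variance of the loss function. We produce a scaling law for the generalization gap, with bounds that become predictably stronger with scale.
Paper reference translation (metadata) (2025-04-21T16:26:56Z)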
Lessons from Generalization Error Analysis of Federated Learning: You May Communicate Less Often! [15.730667464815548]
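We study how the generalization error evolves with the number of communication rounds $R$ between $K$ clients and a parameter server (PS). We establish PAC-Bayes and rate-distortion theoretic bounds on the generalization error that account for the effect of the number of rounds $R$. The generalization bound for FSVM increases with $R$, suggesting that more frequent communication with the PS diminishes generalization power.
Paper reference translation (metadata) (2023-06-09T12:53:24Z)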
Topology-aware Generalization of Decentralized SGD [89.25765221779288]
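This paper studies the generalizability of decentralized stochastic gradient descent (D-SGD). We prove that the generalizability of D-SGD is positively correlated with connectivity in the initial training phase.
Paper reference translation (metadata) (2022-06-25T16:03:48Z)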
$p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
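We study the $p$-generalized probit regression model, a generalized linear model for binary responses. We show that the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
Paper reference translation (metadata) (2022-03-25T10:54:41Z)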
Black-Box Generalization [31.80268332522017]
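We provide the first error analysis for black-box learning through derivative-free optimization. We show that both generalization bounds are independent of $d$ and $K$, under appropriate choices of a slightly decreased learning rate.
Paper reference translation (metadata) (2022-02-14T17:14:48Z)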
Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect [43.00475513526005]
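We consider using gradient descent (GD) with a large learning rate on a homogeneous matrix factorization problem. We prove a convergence theory for constant large learning rates beyond $2/L$. We rigorously establish an implicit bias of GD induced by such a large learning rate, termed "balancing".
Paper reference translation (metadata) (2021-10-07T17:58:21Z)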
Generalization of GANs under Lipschitz continuity and data augmentation [2.474754293747645]
Generative adversarial networks (GANs) are widely used in a variety of applications. We provide a comprehensive analysis of the generalization of GANs.
Paper reference translation (metadata) (2021-04-06T09:24:10Z)
Correcting Momentum with Second-order Information [50.992629498861724]
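We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point in the optimal $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector product computations. We validate our results on a variety of large-scale deep learning benchmarks and architectures.
Paper reference translation (metadata) (2021-03-04T19:01:20Z)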
Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
We show that, for the losses considered, HSDMPG can attain an error of order $\mathcal{O}\big(1/\sqrt{n}\big)$, which is the order of the excess error of a learning model.
Paper reference translation (metadata) (2020-09-18T02:18:44Z)
Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization [101.5159744660701]
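In distributed second-order optimization, a standard strategy is to average many local estimates, each based on a small sketch or batch of the data. We propose a new technique for debiasing the local estimates, which leads to both theoretical and empirical improvements in the convergence rate of distributed second-order methods.
Paper reference translation (metadata) (2020-07-02T18:08:14Z)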
Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization [37.57922952189396]
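We study the generalization performance of standard classifiers in the high-dimensional regime. We design an optimal loss and regularizer that provably lead to Bayes-optimal generalization error.
Paper reference translation (metadata) (2020-06-11T16:14:51Z)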
Learning Near Optimal Policies with Low Inherent Bellman Error [115.16037976819331]
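We study the exploration problem in episodic reinforcement learning with approximately linear action-value functions. We show that exploration is possible using only batch assumptions, with an algorithm that achieves the optimal statistical rate for the setting considered.
Paper reference translation (metadata) (2020-02-29T02:02:40Z)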

The related-papers list is generated automatically from the titles and abstracts of papers on this site.
