Fugu-MT Paper Translation (Abstract): Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States

arxiv url: http://arxiv.org/abs/2211.10691v1
Date: Sat, 19 Nov 2022 13:15:39 GMT
Status: Translation complete
Last updated in system: 2022-11-22 23:32:21.154012
Title: Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Authors: Ziqiao Wang and Yongyi Mao
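Abstract summary: SDE approximations have been shown to characterize well the dynamics of training machine learning models with SGD. An estimate of the steady-state weight distribution of the SDE can be obtained. Various insights are presented along the development of these bounds, which are subsequently validated.
Reference score (site-computed attention score): 27.14107452619853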
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Stochastic differential equations (SDEs) have recently been shown to characterize well the dynamics of training machine learning models with SGD. This provides two opportunities for better understanding the generalization behaviour of SGD through its SDE approximation. First, under the SDE characterization, SGD may be regarded as full-batch gradient descent with Gaussian gradient noise. This allows the generalization bounds developed by Xu & Raginsky (2017) to be applied to the generalization behaviour of SGD, resulting in upper bounds in terms of the mutual information between the training set and the training trajectory. Second, under mild assumptions, it is possible to obtain an estimate of the steady-state weight distribution of the SDE. Using this estimate, we apply the PAC-Bayes-like information-theoretic bounds developed in both Xu & Raginsky (2017) and Negrea et al. (2019) to obtain generalization upper bounds in terms of the KL divergence between the steady-state weight distribution of SGD and a prior distribution. Among various options, one may choose the prior as the steady-state weight distribution obtained by SGD on the same training set but with one example held out. In this case, the bound can be elegantly expressed using the influence function (Koh & Liang, 2017), which suggests that the generalization of SGD is related to its stability. Various insights are presented along the development of these bounds, which are subsequently validated numerically.
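The first observation in the abstract, that under the SDE view SGD behaves like full-batch gradient descent with Gaussian gradient noise, can be checked on a toy problem. The sketch below is a minimal illustration under assumptions specific to this example, not the paper's code: a hypothetical 1-D quadratic loss, minibatches drawn with replacement, and the SDE dθ = -∇L(θ) dt + sqrt(ηΣ) dW discretized by Euler-Maruyama with dt = η. Because the gradient noise here is additive, both runs should settle to the same stationary variance ηΣ/(2 - η), which approaches the continuous SDE's value ηΣ/2 as η shrinks. The names `make_data`, `noise_variance` and `stationary_variance` are hypothetical.

```python
import math
import random


def make_data(n: int, seed: int = 0) -> list[float]:
    """Hypothetical 1-D dataset; example i has loss 0.5 * (theta - x_i) ** 2."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def noise_variance(data: list[float], batch: int) -> float:
    """Variance Sigma of the minibatch gradient noise (batches drawn with replacement)."""
    n = len(data)
    mean_x = sum(data) / n
    per_example = sum((x - mean_x) ** 2 for x in data) / n
    return per_example / batch


def stationary_variance(data: list[float], lr: float, batch: int,
                        steps: int, gaussian: bool, seed: int = 1) -> float:
    """Empirical variance of theta after burn-in, for either minibatch SGD
    (gaussian=False) or full-batch GD plus Gaussian gradient noise of the
    same variance (gaussian=True)."""
    rng = random.Random(seed)
    n = len(data)
    mean_x = sum(data) / n
    noise_std = math.sqrt(noise_variance(data, batch))
    theta, total, total_sq = 0.0, 0.0, 0.0
    burn_in = steps // 10
    for k in range(steps):
        full_grad = theta - mean_x  # gradient of the average loss
        if gaussian:
            # Euler-Maruyama step (dt = lr) of d theta = -grad L dt + sqrt(lr * Sigma) dW
            grad = full_grad + noise_std * rng.gauss(0.0, 1.0)
        else:
            # Minibatch gradient, indices sampled with replacement
            idx = [rng.randrange(n) for _ in range(batch)]
            grad = theta - sum(data[i] for i in idx) / batch
        theta -= lr * grad
        if k >= burn_in:
            total += theta
            total_sq += theta * theta
    m = steps - burn_in
    mean = total / m
    return total_sq / m - mean * mean


if __name__ == "__main__":
    data, lr, batch = make_data(200), 0.1, 10
    # Exact stationary variance of the linear recursion: lr * Sigma / (2 - lr)
    predicted = lr * noise_variance(data, batch) / (2.0 - lr)
    print("predicted     ", predicted)
    print("SGD           ", stationary_variance(data, lr, batch, 60_000, gaussian=False))
    print("GD + Gaussian ", stationary_variance(data, lr, batch, 60_000, gaussian=True))
```

In this toy setting the two empirical variances should agree with each other and with the prediction up to Monte Carlo error; with state-dependent noise (as in real networks) the match is only approximate.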
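The leave-one-out prior can likewise be illustrated on a toy model. Assuming, hypothetically, that the steady-state distribution is a Gaussian N(w, v) centered at the empirical minimizer with a fixed variance v (the paper's actual steady-state estimate is not reproduced here), the KL divergence to the leave-one-out prior N(w_{-i}, v) reduces to (w - w_{-i})^2 / (2v), and the influence function of Koh & Liang (2017) gives the first-order estimate w_{-i} ≈ w + (1/n) H^{-1} ∇ℓ_i(w). The data, the variance v and all function names below are assumptions, and the sketch computes only the KL term, not the paper's generalization bound.

```python
import math
import random


def gaussian_kl(mu1: float, var1: float, mu2: float, var2: float) -> float:
    """KL(N(mu1, var1) || N(mu2, var2)) between two 1-D Gaussians.
    With var1 == var2 == v this equals (mu1 - mu2) ** 2 / (2 v)."""
    return 0.5 * (var1 / var2 + (mu1 - mu2) ** 2 / var2 - 1.0 + math.log(var2 / var1))


def fit(a: list[float], y: list[float]) -> float:
    """Minimizer of the average loss 0.5 * (w * a_i - y_i) ** 2 (no intercept)."""
    return sum(ai * yi for ai, yi in zip(a, y)) / sum(ai * ai for ai in a)


def loo_exact(a: list[float], y: list[float], i: int) -> float:
    """Refit with example i held out."""
    return fit(a[:i] + a[i + 1:], y[:i] + y[i + 1:])


def loo_influence(a: list[float], y: list[float], i: int) -> float:
    """First-order (influence-function) estimate of the held-out minimizer:
    w_{-i} ~= w + (1/n) * H^{-1} * grad loss_i(w)."""
    n = len(a)
    w = fit(a, y)
    hessian = sum(ai * ai for ai in a) / n  # Hessian of the average loss
    grad_i = (w * a[i] - y[i]) * a[i]       # gradient of example i's loss at w
    return w + grad_i / (n * hessian)


def make_data(n: int, seed: int = 0) -> tuple[list[float], list[float]]:
    """Hypothetical regression data: y = 2 a + Gaussian noise."""
    rng = random.Random(seed)
    a = [rng.uniform(0.5, 1.5) for _ in range(n)]
    y = [2.0 * ai + rng.gauss(0.0, 0.5) for ai in a]
    return a, y


if __name__ == "__main__":
    a, y = make_data(200)
    w = fit(a, y)
    var = 0.01  # hypothetical steady-state variance shared by posterior and prior
    for i in range(3):
        kl_exact = gaussian_kl(w, var, loo_exact(a, y, i), var)
        kl_approx = gaussian_kl(w, var, loo_influence(a, y, i), var)
        print(i, kl_exact, kl_approx)
```

For this least-squares model the influence estimate of the shift differs from the exact refit only by a factor 1 - a_i^2 / Σ_j a_j^2, which is why the two KL values printed per example are close when n is large.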

Related papers

Algorithmic Stability of Stochastic Gradient Descent with Momentum under Heavy-Tailed Noise [20.922456964393213]
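We establish generalization bounds for SGD with momentum (SGDm) under heavy-tailed noise. For quadratic loss functions, we show that SGDm admits worse generalization in the presence of momentum and heavy tails. We develop a uniform-in-time discretization error bound which, to our knowledge, is the first result of its kind for SDEs with degenerate noise.
Reference translation (metadata) (2025-02-02T19:25:48Z)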
Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation [1.8416014644193066]
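We prove high-probability generalization bounds for heavy-tailed SDEs that do not contain any nontrivial information-theoretic terms. Our results suggest that heavy tails can be either beneficial or harmful depending on the problem structure.
Reference translation (metadata) (2024-02-12T15:35:32Z)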
Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction [49.66486092259376]
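Mean-field Langevin dynamics (MFLD) is a nonlinear generalization of Langevin dynamics that incorporates a distribution-dependent drift. Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional on the space of measures. We provide a framework that proves a uniform-in-time propagation of chaos for MFLD, taking into account the errors due to finite-particle approximation, time discretization, and stochastic gradient approximation.
Reference translation (metadata) (2023-06-12T16:28:11Z)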
Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
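In this paper, we study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability. We consider gradient descent (GD) and stochastic gradient descent (SGD) for training SNNs.
Reference translation (metadata) (2022-09-19T18:48:00Z)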
Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
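Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm. We apply these ideas to online correlation analysis and derive, for the first time, an optimal one-time-scale algorithm with an explicit rate of local asymptotic convergence to normality.
Reference translation (metadata) (2021-12-29T18:46:52Z)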
On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
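We study the generalization properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD). We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings. We observe the double descent phenomenon both theoretically and empirically.
Reference translation (metadata) (2021-10-13T17:47:39Z)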
Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
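This work provides a general framework for the non-asymptotic analysis of sampling error in 2-Wasserstein distance. Our theoretical analysis is further validated by numerical experiments.
Reference translation (metadata) (2021-09-08T18:00:05Z)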
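The entry above concerns the sampling error of Langevin Monte Carlo measured in 2-Wasserstein distance. As a hedged illustration of that quantity (not the paper's framework), the unadjusted Langevin algorithm (ULA) on a 1-D Gaussian target N(0, 1/λ) has a Gaussian stationary law with closed-form variance 2/(λ(2 - hλ)), so its W2 bias can be computed exactly from W2² = (μ1 - μ2)² + (σ1 - σ2)². The target, step sizes and function names are assumptions chosen for this sketch.

```python
import math
import random


def w2_gaussian_1d(mu1: float, sd1: float, mu2: float, sd2: float) -> float:
    """2-Wasserstein distance between N(mu1, sd1**2) and N(mu2, sd2**2)."""
    return math.sqrt((mu1 - mu2) ** 2 + (sd1 - sd2) ** 2)


def ula_stationary_sd(lam: float, h: float) -> float:
    """Stationary std of ULA for U(x) = lam * x**2 / 2 (target N(0, 1/lam)).
    The recursion x <- (1 - h*lam) x + sqrt(2h) Z is stable only if 0 < h*lam < 2."""
    if not 0.0 < h * lam < 2.0:
        raise ValueError("step size outside the stable range")
    return math.sqrt(2.0 / (lam * (2.0 - h * lam)))


def ula_samples(lam: float, h: float, steps: int, seed: int = 0) -> list[float]:
    """Run the unadjusted Langevin algorithm x <- x - h * U'(x) + sqrt(2h) Z."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(steps):
        x = x - h * lam * x + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    lam = 1.0
    target_sd = 1.0 / math.sqrt(lam)
    for h in (0.1, 0.01):
        bias = w2_gaussian_1d(0.0, ula_stationary_sd(lam, h), 0.0, target_sd)
        print(f"h={h}: W2(ULA stationary, target) = {bias:.5f}")
```

In this example the bias shrinks roughly linearly in the step size h (about 0.026 at h = 0.1 and about 0.0025 at h = 0.01), which is the kind of discretization error that non-asymptotic W2 analyses quantify in general settings.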
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs) [31.938587263846635]
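It is generally recognized that a finite learning rate (LR) is important for good generalization in real-life deep nets. It has been proposed to approximate finite-LR SGD with Itô stochastic differential equations (SDEs). This paper clarifies the picture with the following contributions.
Reference translation (metadata) (2021-02-24T18:55:00Z)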
Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks [27.54155197562196]
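We show that the trajectories of stochastic gradient descent (SGD) can be well approximated by Feller processes. We propose a "capacity metric" to measure the success of such generalization.
Reference translation (metadata) (2020-06-16T16:57:12Z)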
Convergence rates and approximation results for SGD and its continuous-time counterpart [16.70533901524849]
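This paper proposes a thorough theoretical analysis of convex stochastic gradient descent (SGD) with non-increasing step sizes. First, we show that SGD can be provably approximated by solutions of a time-inhomogeneous stochastic differential equation (SDE) using a coupling. Motivated by recent analyses of deterministic and stochastic optimization methods through their continuous-time counterparts, we study the long-time behavior of the continuous processes and establish non-asymptotic bounds.
Reference translation (metadata) (2020-04-08T18:31:34Z)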
Stochastic Normalizing Flows [52.92110730286403]
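We introduce stochastic normalizing flows for maximum likelihood estimation and variational inference (VI) using stochastic differential equations (SDEs). Using the theory of rough paths, the underlying Brownian motion is treated as a latent variable and approximated, enabling efficient training of neural SDEs. These SDEs can be used to construct efficient Markov chains that sample from the underlying distribution of a given dataset.
Reference translation (metadata) (2020-02-21T20:47:55Z)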
