Fugu-MT Paper Translation (Abstract): Convergence Properties of Natural Gradient Descent for Minimizing KL Divergence

Paper summary: Convergence Properties of Natural Gradient Descent for Minimizing KL Divergence

arxiv url: http://arxiv.org/abs/2504.19259v1
Date: Sun, 27 Apr 2025 14:39:33 GMT
Status: Translation complete
Last updated in system: 2025-05-02 19:15:54.203443
Title: Convergence Properties of Natural Gradient Descent for Minimizing KL Divergence
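Title (reference translation): Convergence Properties of Natural Gradient Descent for KL Divergence Minimization
Authors: Adwait Datar, Nihat Ay
Abstract summary: We study the problem of minimizing the Kullback-Leibler (KL) divergence. We analyze the behavior of gradient-based optimization algorithms under two dual coordinate systems.
Reference score (independently computed attention score): 0.0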
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The Kullback-Leibler (KL) divergence plays a central role in probabilistic machine learning, where it commonly serves as the canonical loss function. Optimization in such settings is often performed over the probability simplex, where the choice of parameterization significantly impacts convergence. In this work, we study the problem of minimizing the KL divergence and analyze the behavior of gradient-based optimization algorithms under two dual coordinate systems within the framework of information geometry: the exponential family ($\theta$ coordinates) and the mixture family ($\eta$ coordinates). We compare Euclidean gradient descent (GD) in these coordinates with the coordinate-invariant natural gradient descent (NGD), where the natural gradient is a Riemannian gradient that incorporates the intrinsic geometry of the parameter space. In continuous time, we prove that the convergence rates of GD in the $\theta$ and $\eta$ coordinates provide lower and upper bounds, respectively, on the convergence rate of NGD. Moreover, under affine reparameterizations of the dual coordinates, the convergence rates of GD in $\eta$ and $\theta$ coordinates can be scaled to $2c$ and $\frac{2}{c}$, respectively, for any $c>0$, while NGD maintains a fixed convergence rate of $2$, remaining invariant to such transformations and sandwiched between them. Although this suggests that NGD may not exhibit uniformly superior convergence in continuous time, we demonstrate that its advantages become pronounced in discrete time, where it achieves faster convergence and greater robustness to noise, outperforming GD. Our analysis hinges on bounding the spectrum and condition number of the Hessian of the KL divergence at the optimum, which coincides with the Fisher information matrix.
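Abstract (reference translation): The Kullback-Leibler (KL) divergence plays a central role in probabilistic machine learning. Optimization in this setting is often performed over the probability simplex, where the choice of parameterization strongly affects convergence. This work studies the problem of minimizing the KL divergence and analyzes the behavior of gradient-based optimization algorithms under two dual coordinate systems from information geometry: the exponential family ($\theta$ coordinates) and the mixture family ($\eta$ coordinates). Euclidean gradient descent (GD) in these coordinates is compared with coordinate-invariant natural gradient descent (NGD), where the natural gradient is a Riemannian gradient that incorporates the intrinsic geometry of the parameter space. In continuous time, the convergence rates of GD in the $\theta$ and $\eta$ coordinates are proved to give lower and upper bounds, respectively, on the convergence rate of NGD. Moreover, under affine reparameterizations of the dual coordinates, the convergence rates of GD in the $\eta$ and $\theta$ coordinates can be scaled to $2c$ and $\frac{2}{c}$, respectively, for any $c>0$, while NGD keeps a fixed convergence rate of $2$, invariant to such transformations and sandwiched between the two. This suggests that NGD does not necessarily converge uniformly faster in continuous time, but its advantages become pronounced in discrete time, where it converges faster and is more robust to noise than GD. The analysis rests on bounding the spectrum and condition number of the Hessian of the KL divergence at the optimum, which coincides with the Fisher information matrix.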
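The continuous-time sandwich described in the abstract can be illustrated numerically with a minimal sketch. It is not the paper's code, and it rests on assumptions that are not stated in the source: the distribution is categorical, the last category is the reference for the $\theta$ coordinates, and the rates are the local ones obtained by linearizing the gradient flow at the optimum. Under those assumptions the Hessian at the optimum is the Fisher matrix $G$ in $\theta$ coordinates and $G^{-1}$ in $\eta$ coordinates, so the local rates are $2\lambda_{\min}(G)$ and $2/\lambda_{\max}(G)$, with NGD at $2$. The function names `fisher_theta` and `local_rates` are hypothetical.

```python
import numpy as np


def fisher_theta(p):
    """Fisher information of a categorical distribution in theta coordinates.

    Assumption: the last category is the reference, so there are n - 1 free
    parameters and G = diag(q) - q q^T with q = p[:-1].
    """
    q = np.asarray(p, dtype=float)[:-1]
    return np.diag(q) - np.outer(q, q)


def local_rates(p):
    """Local continuous-time decay rates of the loss near the optimum p.

    For a linearized flow x' = -H x, the quadratic loss decays at rate
    2 * lambda_min(H). Here H = G for GD in theta, H = G^{-1} for GD in eta,
    and the natural gradient flow reduces to x' = -x, giving rate 2.
    """
    eigenvalues = np.linalg.eigvalsh(fisher_theta(p))  # ascending order
    rate_gd_theta = 2.0 * eigenvalues[0]   # 2 * lambda_min(G)
    rate_gd_eta = 2.0 / eigenvalues[-1]    # 2 * lambda_min(G^{-1})
    rate_ngd = 2.0
    return rate_gd_theta, rate_ngd, rate_gd_eta


if __name__ == "__main__":
    p_star = np.array([0.5, 0.3, 0.2])  # illustrative optimum, chosen arbitrarily
    gd_theta, ngd, gd_eta = local_rates(p_star)
    print(f"GD (theta): {gd_theta:.4f}  NGD: {ngd:.4f}  GD (eta): {gd_eta:.4f}")
```

The ordering holds for any interior point of the simplex in this parameterization, because $G \preceq \mathrm{diag}(q)$ forces every eigenvalue of $G$ below $1$.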

Related papers

Convergence of two-timescale gradient descent ascent dynamics: finite-dimensional and mean-field perspectives [6.740173664466834]
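The two-timescale gradient descent-ascent (GDA) algorithm is a canonical gradient algorithm designed to find Nash equilibria in min-max games. We study how the learning-rate ratio affects convergence behavior in the finite-dimensional and mean-field settings.
Paper reference translation (metadata) (2025-01-28T18:13:41Z)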
Convergence Analysis of Adaptive Gradient Methods under Refined Smoothness and Noise Assumptions [18.47705532817026]
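We show that AdaGrad outperforms SGD by a factor of $d$ under certain conditions. Motivated by this, we introduce assumptions on the smoothness structure of the objective and on the variance of the gradient noise.
Paper reference translation (metadata) (2024-06-07T02:55:57Z)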
Convergence of coordinate ascent variational inference for log-concave measures via optimal transport [0.0]
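Mean-field variational inference (VI) is the problem of finding the closest product (factorized) measure. The well-known coordinate ascent variational inference (CAVI) algorithm seeks this approximating measure by optimizing over one coordinate at a time.
Paper reference translation (metadata) (2024-04-12T19:43:54Z)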
Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization [108.35402316802765]
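We propose a new first-order optimization algorithm, AcceleratedGradient-OptimisticGradient (AG-OG) Ascent. We show that AG-OG achieves the optimal convergence rate (up to a constant) in a variety of settings. We also extend the algorithm to a further setting and achieve optimal convergence rates for both bi-SC-SC and bi-C-SC problems.
Paper reference translation (metadata) (2022-10-31T17:59:29Z)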
NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer [45.47667026025716]
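We propose a novel, robust, and accelerated iteration that relies on two key elements. The convergence and stability of the resulting method, called NAG-GS, are first studied extensively. We show that NAG-GS is competitive with state-of-the-art methods for training machine learning models, such as momentum SGD with weight decay and AdamW.
Paper reference translation (metadata) (2022-09-29T16:54:53Z)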
On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
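We analyze the stochastic extragradient (SEG) method with constant step size and present variations of the method that yield favorable convergence. We prove that, when augmented with iteration averaging, SEG provably converges to the Nash equilibrium, and that its rate is provably accelerated by incorporating a scheduled restarting procedure.
Paper reference translation (metadata) (2021-06-30T17:51:36Z)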
Robust Training in High Dimensions via Block Coordinate Geometric Median Descent [69.47594803719333]
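The geometric median (Gm) is a classical method in statistics for robust estimation from uncorrupted data. We show that applying Gm to only a chosen block of coordinates at a time retains a breakdown point of 0.5 for smooth non-convex problems.
Paper reference translation (metadata) (2021-06-16T15:55:50Z)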
Escaping Saddle Points with Stochastically Controlled Stochastic Gradient Methods [12.173568611144626]
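We show that the first-order stochastically controlled stochastic gradient (SCSG) method can be perturbed by noise or by a step. A separate step is proposed to address this problem. The proposed technique aims to further incorporate the CNC-SCSGD method for saddle points.
Paper reference translation (metadata) (2021-03-07T18:09:43Z)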
A Variance Controlled Stochastic Method with Biased Estimation for Faster Non-convex Optimization [0.0]
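We propose a new technique, variance controlled stochastic gradient (VCSG), to improve the performance of stochastic variance reduced gradient (SVRG). A parameter $\lambda$ is introduced in VCSG to avoid over-reducing the variance as SVRG does. The method requires $\mathcal{O}(\min\{1/\epsilon^{3/2}, n^{1/4}/\epsilon\})$ gradient evaluations.
Paper reference translation (metadata) (2021-02-19T12:22:56Z)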
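Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry [49.65455534654459]
The gradient descent-ascent (GDA) algorithm is widely applied to solve minimax optimization problems. This paper fills such a gap by studying convergence under Kurdyka-Łojasiewicz (KŁ) geometry.
Paper reference translation (metadata) (2021-02-09T05:35:53Z)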
Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling [110.88857917726276]
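We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave. At the core of our approach is a conductance analysis of SGLD using an auxiliary time-reversible Markov chain.
Paper reference translation (metadata) (2020-10-19T15:23:18Z)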
Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime [50.510421854168065]
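We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate. We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
Paper reference translation (metadata) (2020-06-22T14:31:37Z)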
On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems [75.58134963501094]
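This paper analyzes the trajectories of stochastic gradient descent (SGD). We show that SGD avoids strict saddle points/manifolds with probability $1$ under the step-size policies considered.
Paper reference translation (metadata) (2020-06-19T14:11:26Z)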
Cogradient Descent for Bilinear Optimization [124.45816011848096]
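We introduce the Cogradient Descent (CoGD) algorithm to address bilinear problems. One variable is solved by taking into account its coupling relationship with the other, which leads to a synchronous gradient descent. The algorithm is applied to problems in which one variable is subject to a sparsity constraint.
Paper reference translation (metadata) (2020-06-16T13:41:54Z)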

The related papers list is generated automatically from the titles and abstracts of the papers on this site.
