Fugu-MT Paper Translation (Abstract): Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems

Paper summary: Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems

arxiv url: http://arxiv.org/abs/2403.08585v3
Date: Sun, 26 May 2024 06:17:59 GMT
Status: Translation complete
Last updated in system: 2024-05-29 06:36:16.088356
Title: Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems
Title (reference translation): Improving the Implicit Regularization of SGD with Preconditioning for Least-Squares Problems
Authors: Junwei Su, Difan Zou, Chuan Wu
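Abstract summary: This paper studies the generalization performance of stochastic gradient descent (SGD) with preconditioning for least-squares problems. It shows that the proposed preconditioning matrix is simple enough to allow robust estimation from finite samples.
Reference score (independently computed attention score): 19.995877680083105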
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice and plays an important role in the generalization of modern machine learning. However, prior research has revealed instances where the generalization performance of SGD is worse than that of ridge regression because of uneven optimization along different dimensions. Preconditioning offers a natural solution to this issue by rebalancing optimization across different directions. Yet the extent to which preconditioning can enhance the generalization performance of SGD, and whether it can bridge the existing gap with ridge regression, remain uncertain. In this paper, we study the generalization performance of SGD with preconditioning for the least squares problem. We make a comprehensive comparison between preconditioned SGD and (standard and preconditioned) ridge regression. Our study makes several key contributions toward understanding and improving SGD with preconditioning. First, we establish excess risk bounds (generalization performance) for preconditioned SGD and ridge regression under an arbitrary preconditioning matrix. Second, leveraging the excess risk characterization of preconditioned SGD and ridge regression, we show (through construction) that there exists a simple preconditioning matrix that can make SGD comparable to (standard and preconditioned) ridge regression. Finally, we show that our proposed preconditioning matrix is straightforward enough to allow robust estimation from finite samples while maintaining a theoretical improvement. Our empirical results align with our theoretical findings, collectively showcasing the enhanced regularization effect of preconditioned SGD.
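Abstract (reference translation): Stochastic gradient descent (SGD) has strong algorithmic regularization effects and plays an important role in the generalization of modern machine learning. However, prior work has found cases where the generalization performance of SGD is worse than that of ridge regression because optimization proceeds unevenly along different dimensions. Preconditioning offers a natural solution by rebalancing optimization across directions. It remains unclear, however, how much preconditioning improves the generalization performance of SGD and whether it can close the existing gap with ridge regression. This paper studies the generalization performance of preconditioned SGD for least-squares problems and gives a comprehensive comparison between preconditioned SGD and (standard and preconditioned) ridge regression. The study makes several key contributions to understanding and improving SGD with preconditioning. First, it establishes excess risk bounds (generalization performance) for preconditioned SGD and ridge regression under an arbitrary preconditioning matrix. Second, using the excess risk characterizations of preconditioned SGD and ridge regression, it shows by construction that there exists a simple preconditioning matrix that makes SGD comparable to (standard and preconditioned) ridge regression. Finally, it shows that the proposed preconditioning matrix is simple enough to allow robust estimation from finite samples while retaining the theoretical improvement. The experimental results agree with the theoretical findings and together demonstrate the enhanced regularization effect of preconditioned SGD.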
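
For readers who want a concrete picture of what "SGD with preconditioning" means for least squares, here is a minimal NumPy sketch. It is not the paper's construction: the function names, the synthetic data, the step size, and in particular the preconditioner `G = (H_hat + lam * I)^{-1}` are illustrative assumptions chosen only to show how a preconditioning matrix rescales the update along different directions.

```python
import numpy as np


def preconditioned_sgd(X, y, G, step_size):
    """One pass of constant-step-size SGD on the squared loss
    0.5 * (x @ w - y) ** 2, with every stochastic gradient multiplied by
    the preconditioning matrix G. Returns the average of the iterates."""
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for x_t, y_t in zip(X, y):
        grad = (x_t @ w - y_t) * x_t        # gradient for one sample
        w = w - step_size * (G @ grad)      # preconditioned update
        w_sum += w
    return w_sum / n


def excess_risk(w, w_star, H):
    """Quadratic form (w - w*)^T H (w - w*) for a well-specified linear model."""
    diff = w - w_star
    return float(diff @ H @ diff)


# Synthetic problem (assumed for illustration): decaying covariance spectrum,
# so plain SGD makes slow progress along the small-eigenvalue directions.
rng = np.random.default_rng(0)
n, d = 2000, 20
eigenvalues = 1.0 / np.arange(1, d + 1) ** 2
H = np.diag(eigenvalues)
X = rng.normal(size=(n, d)) * np.sqrt(eigenvalues)
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

# Hypothetical preconditioner estimated from the sample covariance.
H_hat = X.T @ X / n
lam = 0.1
G = np.linalg.inv(H_hat + lam * np.eye(d))

w_plain = preconditioned_sgd(X, y, np.eye(d), step_size=0.05)
w_precond = preconditioned_sgd(X, y, G, step_size=0.05)

print("excess risk, plain SGD:         ", excess_risk(w_plain, w_star, H))
print("excess risk, preconditioned SGD:", excess_risk(w_precond, w_star, H))
```

Passing the identity matrix as `G` recovers ordinary SGD, so the same function covers both cases. Whether a given preconditioner actually helps depends on the spectrum, the target, and the step size, which is the question the paper's excess risk bounds address.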


Related papers