Fugu-MT paper translation (summary): Risk Bounds of Accelerated SGD for Overparameterized Linear Regression

Paper summary: Risk Bounds of Accelerated SGD for Overparameterized Linear Regression

arxiv url: http://arxiv.org/abs/2311.14222v1
Date: Thu, 23 Nov 2023 23:02:10 GMT
Status: Translation complete
Last updated in system: 2023-11-27 16:15:20.462045
Title: Risk Bounds of Accelerated SGD for Overparameterized Linear Regression
Title (reference translation): Risk bounds of accelerated SGD for overparameterized linear regression
Authors: Xuheng Li and Yihe Deng and Jingfeng Wu and Dongruo Zhou and Quanquan Gu
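Abstract summary: Accelerated stochastic gradient descent (ASGD) is a workhorse in deep learning. Existing optimization theory can explain only the faster convergence of ASGD, not its better generalization.
Reference score (site-computed attention score): 75.27846230182885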
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accelerated stochastic gradient descent (ASGD) is a workhorse in deep learning and often achieves better generalization performance than SGD. However, existing optimization theory can only explain the faster convergence of ASGD, but cannot explain its better generalization. In this paper, we study the generalization of ASGD for overparameterized linear regression, which is possibly the simplest setting of learning with overparameterization. We establish an instance-dependent excess risk bound for ASGD within each eigen-subspace of the data covariance matrix. Our analysis shows that (i) ASGD outperforms SGD in the subspace of small eigenvalues, exhibiting a faster rate of exponential decay for bias error, while in the subspace of large eigenvalues, its bias error decays slower than SGD; and (ii) the variance error of ASGD is always larger than that of SGD. Our result suggests that ASGD can outperform SGD when the difference between the initialization and the true weight vector is mostly confined to the subspace of small eigenvalues. Additionally, when our analysis is specialized to linear regression in the strongly convex setting, it yields a tighter bound for bias error than the best-known result.
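Abstract (reference translation): Accelerated stochastic gradient descent (ASGD) is a workhorse in deep learning and often achieves better generalization performance than SGD. However, existing optimization theory can explain only the faster convergence of ASGD, not its better generalization. In this paper, we study the generalization of ASGD for overparameterized linear regression, which is possibly the simplest setting of learning with overparameterization. We establish an instance-dependent excess risk bound for ASGD within each eigen-subspace of the data covariance matrix. Our analysis shows that (i) ASGD outperforms SGD in the subspace of small eigenvalues, where its bias error decays exponentially at a faster rate, whereas in the subspace of large eigenvalues its bias error decays more slowly than that of SGD; and (ii) the variance error of ASGD is always larger than that of SGD. Our result suggests that ASGD can outperform SGD when the difference between the initialization and the true weight vector is mostly confined to the subspace of small eigenvalues. Additionally, when specialized to linear regression in the strongly convex setting, our analysis yields a tighter bound on the bias error than the best-known result.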
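As a rough illustration of claim (i), the sketch looks at a single eigen-direction of the data covariance H with eigenvalue lam. For well-specified linear regression the excess risk is proportional to (w - w*)^T H (w - w*), a sum over eigen-directions of lam times the squared error along that direction, so each direction can be studied on its own. Assumptions not taken from the paper: heavy-ball momentum stands in for ASGD (it is not the ASGD variant the paper analyzes), gradients are replaced by their expectations so only the bias term is tracked (no variance), and the names sgd_rate, heavy_ball_rate, bias_trajectory and the values eta=1.0, beta=0.9 are hypothetical. It is a qualitative sketch, not the paper's algorithm or its risk bound.

```python
import numpy as np

# Hypothetical illustration: noiseless (expected-gradient) bias dynamics along
# one eigen-direction with eigenvalue `lam`. Heavy-ball momentum is used as a
# stand-in for ASGD; this is NOT the paper's ASGD or its risk bound.


def sgd_rate(lam: float, eta: float) -> float:
    """Per-step contraction of the bias error for plain SGD:
    e_{t+1} = (1 - eta * lam) * e_t."""
    return abs(1.0 - eta * lam)


def heavy_ball_rate(lam: float, eta: float, beta: float) -> float:
    """Per-step contraction for heavy-ball momentum,
    e_{t+1} = (1 + beta - eta * lam) * e_t - beta * e_{t-1},
    computed as the spectral radius of the 2x2 companion matrix."""
    companion = np.array([[1.0 + beta - eta * lam, -beta],
                          [1.0, 0.0]])
    return float(np.max(np.abs(np.linalg.eigvals(companion))))


def bias_trajectory(lam: float, eta: float, beta: float, steps: int,
                    e0: float = 1.0) -> np.ndarray:
    """Iterate the bias recursion for `steps` steps; beta = 0 recovers SGD."""
    prev, cur = e0, e0  # start from rest: e_{-1} = e_0
    out = [cur]
    for _ in range(steps):
        prev, cur = cur, (1.0 + beta - eta * lam) * cur - beta * prev
        out.append(cur)
    return np.array(out)


eta, beta = 1.0, 0.9  # hypothetical step size and momentum
for lam in (0.01, 0.5):  # a small and a large eigenvalue of H
    print(f"lam={lam}: SGD rate={sgd_rate(lam, eta):.4f}, "
          f"momentum rate={heavy_ball_rate(lam, eta, beta):.4f}")
```

With eta=1.0 and beta=0.9, the momentum rate is about 0.949 (the square root of beta, because the companion matrix has complex eigenvalues) for both eigenvalues, whereas plain SGD contracts by 0.99 per step when lam=0.01 but by 0.5 when lam=0.5. Momentum therefore helps in the small-eigenvalue direction and hurts in the large one, which mirrors, qualitatively, the bias-error pattern the abstract describes.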


Related papers