Fugu-MT Paper Translation (Abstract): Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks

Paper overview: Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks

arxiv url: http://arxiv.org/abs/2410.09640v1
Date: Mon, 21 Oct 2024 08:33:44 GMT
Status: Translation complete
Last updated in system: 2024-10-30 09:06:07.710457
Title: Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks
Title (reference translation): Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks
Authors: Zhenghao Xu, Yuqing Wang, Tuo Zhao, Rachel Ward, Molei Tao
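Abstract summary: We prove that Nesterov's accelerated gradient (NAG) attains an iteration complexity of $O(\kappa\log\frac{1}{\epsilon})$. In particular, we show that NAG can attain an accelerated linear convergence rate.
Reference score (independently computed attention score): 46.04785603483612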
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the convergence rate of first-order methods for rectangular matrix factorization, which is a canonical nonconvex optimization problem. Specifically, given a rank-$r$ matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$, we prove that gradient descent (GD) can find a pair of $\epsilon$-optimal solutions $\mathbf{X}_T\in\mathbb{R}^{m\times d}$ and $\mathbf{Y}_T\in\mathbb{R}^{n\times d}$, where $d\geq r$, satisfying $\lVert\mathbf{X}_T\mathbf{Y}_T^\top-\mathbf{A}\rVert_\mathrm{F}\leq\epsilon\lVert\mathbf{A}\rVert_\mathrm{F}$ in $T=O(\kappa^2\log\frac{1}{\epsilon})$ iterations with high probability, where $\kappa$ denotes the condition number of $\mathbf{A}$. Furthermore, we prove that Nesterov's accelerated gradient (NAG) attains an iteration complexity of $O(\kappa\log\frac{1}{\epsilon})$, which is the best-known bound of first-order methods for rectangular matrix factorization. Different from small balanced random initialization in the existing literature, we adopt an unbalanced initialization, where $\mathbf{X}_0$ is large and $\mathbf{Y}_0$ is $0$. Moreover, our initialization and analysis can be further extended to linear neural networks, where we prove that NAG can also attain an accelerated linear convergence rate. In particular, we only require the width of the network to be greater than or equal to the rank of the output label matrix. In contrast, previous results achieving the same rate require excessive widths that additionally depend on the condition number and the rank of the input data matrix.
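Abstract (reference translation): We study the convergence rate of first-order methods for rectangular matrix factorization, a canonical nonconvex optimization problem. Specifically, given a rank-$r$ matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$, we prove that gradient descent (GD) finds a pair of $\epsilon$-optimal solutions $\mathbf{X}_T\in\mathbb{R}^{m\times d}$ and $\mathbf{Y}_T\in\mathbb{R}^{n\times d}$, with $d\geq r$, satisfying $\lVert\mathbf{X}_T\mathbf{Y}_T^\top-\mathbf{A}\rVert_\mathrm{F}\leq\epsilon\lVert\mathbf{A}\rVert_\mathrm{F}$ in $T=O(\kappa^2\log\frac{1}{\epsilon})$ iterations with high probability, where $\kappa$ is the condition number of $\mathbf{A}$. Furthermore, we prove that Nesterov's accelerated gradient (NAG) attains an iteration complexity of $O(\kappa\log\frac{1}{\epsilon})$, the best-known bound of first-order methods for rectangular matrix factorization. Unlike the small balanced random initialization used in the existing literature, we adopt an unbalanced initialization in which $\mathbf{X}_0$ is large and $\mathbf{Y}_0$ is $0$. Moreover, our initialization and analysis extend to linear neural networks, where we prove that NAG also attains an accelerated linear convergence rate. In particular, we only require the width of the network to be greater than or equal to the rank of the output label matrix. In contrast, previous results achieving the same rate require excessive widths that additionally depend on the condition number and the rank of the input data matrix.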
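Illustrative sketch: the abstract compares GD and NAG on the factorization objective $\frac{1}{2}\lVert\mathbf{X}\mathbf{Y}^\top-\mathbf{A}\rVert_\mathrm{F}^2$ with an unbalanced start ($\mathbf{X}_0$ large, $\mathbf{Y}_0=0$). The NumPy snippet below is a minimal toy version of that comparison, not the authors' implementation. Everything the abstract does not state is an assumption of this sketch: the form $\mathbf{X}_0 = c\,\mathbf{A}\boldsymbol{\Phi}$ with Gaussian $\boldsymbol{\Phi}$, the scale `c`, the step size `eta`, the momentum `beta` (both tuned from the singular values of $\mathbf{X}_0$, which uses knowledge of the rank $r$), and the problem sizes.

```python
import numpy as np

def rel_error(X, Y, A):
    # Relative Frobenius error ||X Y^T - A||_F / ||A||_F from the abstract.
    return np.linalg.norm(X @ Y.T - A) / np.linalg.norm(A)

def grads(X, Y, A):
    # Gradients of f(X, Y) = 0.5 * ||X Y^T - A||_F^2.
    R = X @ Y.T - A
    return R @ Y, R.T @ X

def run_gd(A, X0, Y0, eta, T):
    X, Y = X0.copy(), Y0.copy()
    hist = [rel_error(X, Y, A)]
    for _ in range(T):
        gX, gY = grads(X, Y, A)
        X, Y = X - eta * gX, Y - eta * gY
        hist.append(rel_error(X, Y, A))
    return hist

def run_nag(A, X0, Y0, eta, beta, T):
    X, Y = X0.copy(), Y0.copy()
    X_prev, Y_prev = X.copy(), Y.copy()
    hist = [rel_error(X, Y, A)]
    for _ in range(T):
        # Extrapolate with momentum, then take a gradient step from there.
        Xe = X + beta * (X - X_prev)
        Ye = Y + beta * (Y - Y_prev)
        gX, gY = grads(Xe, Ye, A)
        X_prev, Y_prev = X, Y
        X, Y = Xe - eta * gX, Ye - eta * gY
        hist.append(rel_error(X, Y, A))
    return hist

# Toy rank-3 target with condition number 10 (sizes are arbitrary).
rng = np.random.default_rng(0)
m, n, r, d = 30, 20, 3, 20
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
s = np.array([10.0, 3.0, 1.0])
A = (U * s) @ V.T

# Unbalanced initialization: X0 large, Y0 = 0.
# ASSUMPTION: X0 = c * A @ Phi and c = 10 are choices made for this sketch.
c = 10.0
X0 = c * A @ rng.standard_normal((n, d))
Y0 = np.zeros((n, d))

# ASSUMPTION: step size and momentum tuned from the r nonzero singular
# values of X0, as for a quadratic with curvature in [mu, L].
sv = np.linalg.svd(X0, compute_uv=False)
L, mu = sv[0] ** 2, sv[r - 1] ** 2
eta = 1.0 / L
beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))

T = 300
gd_hist = run_gd(A, X0, Y0, eta, T)
nag_hist = run_nag(A, X0, Y0, eta, beta, T)
print(f"GD  relative error after {T} steps: {gd_hist[-1]:.3e}")
print(f"NAG relative error after {T} steps: {nag_hist[-1]:.3e}")
```

Because $\mathbf{Y}_0=0$, both runs start at relative error 1. With $\mathbf{X}_0$ large, the early dynamics behave roughly like a least-squares problem in $\mathbf{Y}$, which is why a quadratic-style choice of `eta` and `beta` is a reasonable heuristic here; the paper's actual constants and guarantees should be taken from the paper itself.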


Related papers