Fugu-MT paper translation (abstract): How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization

Paper overview: How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization

arxiv url: http://arxiv.org/abs/2310.01769v3
Date: Fri, 24 Nov 2023 18:08:25 GMT
Status: translation complete
Last updated in system: 2023-11-28 03:19:13.398838
Title: How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization
Title (reference translation): How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization
Authors: Nuoya Xiong, Lijun Ding, Simon S. Du
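Abstract summary: We show how over-parameterization changes the convergence behavior of gradient descent. The goal is to recover an unknown low-rank ground-truth matrix from near-isotropic linear measurements. We propose a method that modifies only one step of GD and obtains a convergence rate independent of $\alpha$.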
Reference score (independently computed attention score): 46.55524654398093
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper rigorously shows how over-parameterization changes the convergence behaviors of gradient descent (GD) for the matrix sensing problem, where the goal is to recover an unknown low-rank ground-truth matrix from near-isotropic linear measurements. First, we consider the symmetric setting with the symmetric parameterization where $M^* \in \mathbb{R}^{n \times n}$ is a positive semi-definite unknown matrix of rank $r \ll n$, and one uses a symmetric parameterization $XX^\top$ to learn $M^*$. Here $X \in \mathbb{R}^{n \times k}$ with $k > r$ is the factor matrix. We give a novel $\Omega (1/T^2)$ lower bound of randomly initialized GD for the over-parameterized case ($k > r$) where $T$ is the number of iterations. This is in stark contrast to the exact-parameterization scenario ($k=r$) where the convergence rate is $\exp (-\Omega (T))$. Next, we study the asymmetric setting where $M^* \in \mathbb{R}^{n_1 \times n_2}$ is the unknown matrix of rank $r \ll \min\{n_1,n_2\}$, and one uses an asymmetric parameterization $FG^\top$ to learn $M^*$ where $F \in \mathbb{R}^{n_1 \times k}$ and $G \in \mathbb{R}^{n_2 \times k}$. Building on prior work, we give a global exact convergence result of randomly initialized GD for the exact-parameterization case ($k=r$) with an $\exp (-\Omega(T))$ rate. Furthermore, we give the first global exact convergence result for the over-parameterization case ($k>r$) with an $\exp(-\Omega(\alpha^2 T))$ rate where $\alpha$ is the initialization scale. This linear convergence result in the over-parameterization case is especially significant because one can apply the asymmetric parameterization to the symmetric setting to speed up from $\Omega (1/T^2)$ to linear convergence. On the other hand, we propose a novel method that only modifies one step of GD and obtains a convergence rate independent of $\alpha$, recovering the rate in the exact-parameterization case.
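Abstract (reference translation): This paper shows how over-parameterization changes the convergence behavior of gradient descent (GD) in the matrix sensing problem, where the aim is to recover an unknown low-rank ground-truth matrix from near-isotropic linear measurements. First, we consider the symmetric setting with a symmetric parameterization: $M^* \in \mathbb{R}^{n \times n}$ is an unknown positive semi-definite matrix of rank $r \ll n$, and a symmetric parameterization $XX^\top$ is used to learn $M^*$. Here $X \in \mathbb{R}^{n \times k}$ with $k > r$ is the factor matrix. For the over-parameterized case ($k > r$), we give a novel $\Omega (1/T^2)$ lower bound for randomly initialized GD, where $T$ is the number of iterations. This contrasts with the exact-parameterization scenario ($k=r$), where the convergence rate is $\exp (-\Omega (T))$. Next, we study the asymmetric setting, where $M^* \in \mathbb{R}^{n_1 \times n_2}$ is an unknown matrix of rank $r \ll \min\{n_1,n_2\}$ and an asymmetric parameterization $FG^\top$ with $F \in \mathbb{R}^{n_1 \times k}$ and $G \in \mathbb{R}^{n_2 \times k}$ is used to learn $M^*$. Building on prior work, we give a global exact convergence result for randomly initialized GD in the exact-parameterization case ($k=r$) with an $\exp (-\Omega(T))$ rate. Furthermore, for the over-parameterization case ($k>r$), we give the first global exact convergence result, with an $\exp(-\Omega(\alpha^2 T))$ rate, where $\alpha$ is the initialization scale. This linear convergence is especially significant because the asymmetric parameterization can be applied to the symmetric setting to speed up from $\Omega (1/T^2)$ to linear convergence. On the other hand, we propose a novel method that modifies only one step of GD and obtains a convergence rate independent of $\alpha$, recovering the rate of the exact-parameterization case.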
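To make the symmetric setting in the abstract concrete, here is a minimal sketch of randomly initialized GD on the factor matrix $X$, comparing exact parameterization ($k=r$) with over-parameterization ($k>r$). It rests on several assumptions that are not taken from the paper: the measurements are replaced by full observation of $M^*$ (an idealized isotropic special case), the loss is $\frac{1}{4}\|XX^\top - M^*\|_F^2$, and the sizes, step size `eta`, initialization scale `alpha`, and helper names are illustrative choices. It is not the authors' experimental setup, and it uses only the Python standard library.

```python
import math
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frob_error(X, M):
    """Frobenius norm of X X^T - M."""
    R = matmul(X, transpose(X))
    n = len(M)
    return math.sqrt(sum((R[i][j] - M[i][j]) ** 2 for i in range(n) for j in range(n)))


def gd_symmetric(M, k, alpha, eta, steps, seed=0):
    """Run GD on f(X) = 1/4 * ||X X^T - M||_F^2 from a small random start.

    Assumption: M is fully observed, standing in for near-isotropic measurements.
    Returns the final Frobenius error ||X X^T - M||_F.
    """
    rng = random.Random(seed)
    n = len(M)
    # Random initialization with scale alpha, X in R^{n x k}.
    X = [[alpha * rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    for _ in range(steps):
        XXt = matmul(X, transpose(X))
        residual = [[XXt[i][j] - M[i][j] for j in range(n)] for i in range(n)]
        grad = matmul(residual, X)  # gradient of f for symmetric M
        X = [[X[i][j] - eta * grad[i][j] for j in range(k)] for i in range(n)]
    return frob_error(X, M)


# Rank-1 PSD ground truth M* = u u^T with a unit vector u (illustrative choice).
n = 6
u = [1.0 / math.sqrt(n)] * n
M_star = [[ui * uj for uj in u] for ui in u]

err_exact = gd_symmetric(M_star, k=1, alpha=0.1, eta=0.1, steps=2000)  # k = r
err_over = gd_symmetric(M_star, k=3, alpha=0.1, eta=0.1, steps=2000)   # k > r

print("k = r error:", err_exact)
print("k > r error:", err_over)
```

In this toy run the exactly parameterized factor should reach a far smaller error than the over-parameterized one after the same number of steps, which is the qualitative gap the abstract describes. A single run like this illustrates the gap but does not verify the $\Omega(1/T^2)$ lower bound.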

Related papers