Fugu-MT 論文翻訳(概要): Convergence of Alternating Gradient Descent for Matrix Factorization

論文の概要: Convergence of Alternating Gradient Descent for Matrix Factorization

arxiv url: http://arxiv.org/abs/2305.06927v2
Date: Thu, 8 Feb 2024 00:53:12 GMT
ステータス: 翻訳完了
システム内更新日: 2024-02-09 19:49:31.224039
Title: Convergence of Alternating Gradient Descent for Matrix Factorization
Title（参考訳）: マトリックス因子化のための交互勾配の収束
Authors: Rachel Ward and Tamara G. Kolda
Abstract要約: 非対称行列分解対象に一定のステップサイズを施した交互勾配降下(AGD)について検討した。階数-r$行列 $mathbfA in mathbbRm times n$, smoothness $C$ in the complexity $T$ to be a absolute constant。
参考スコア（独自算出の注目度）: 5.439020425819001
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider alternating gradient descent (AGD) with fixed step size applied to the asymmetric matrix factorization objective. We show that, for a rank-$r$ matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$, $T = C (\frac{\sigma_1(\mathbf{A})}{\sigma_r(\mathbf{A})})^2 \log(1/\epsilon)$ iterations of alternating gradient descent suffice to reach an $\epsilon$-optimal factorization $\| \mathbf{A} - \mathbf{X} \mathbf{Y}^{T} \|^2 \leq \epsilon \| \mathbf{A}\|^2$ with high probability starting from an atypical random initialization. The factors have rank $d \geq r$ so that $\mathbf{X}_{T}\in\mathbb{R}^{m \times d}$ and $\mathbf{Y}_{T} \in\mathbb{R}^{n \times d}$, and mild overparameterization suffices for the constant $C$ in the iteration complexity $T$ to be an absolute constant. Experiments suggest that our proposed initialization is not merely of theoretical benefit, but rather significantly improves the convergence rate of gradient descent in practice. Our proof is conceptually simple: a uniform Polyak-\L{}ojasiewicz (PL) inequality and uniform Lipschitz smoothness constant are guaranteed for a sufficient number of iterations, starting from our random initialization. Our proof method should be useful for extending and simplifying convergence analyses for a broader class of nonconvex low-rank factorization problems.
Abstract（参考訳）: 非対称行列分解対象に一定のステップサイズで交互勾配降下(AGD)を考慮する。階数-$r$行列 $\mathbf{A} \in \mathbb{R}^{m \times n}$, $T = C (\frac{\sigma_1(\mathbf{A})}{\sigma_r(\mathbf{A})})^2 \log(1/\epsilon)$ suffice to reach a $\epsilon$-optimal factorization $\| \mathbf{A} - \mathbf{X} \mathbf{Y}^{T} \|^2 \leq \epsilon \| \mathbf{A}\|^2$ は、典型的な初期化から始まる確率の高い確率を持つ。これらの因子は、$d \geq r$ をランク付けして、$\mathbf{x}_{t}\in\mathbb{r}^{m \times d}$ と$\mathbf{y}_{t} \in\mathbb{r}^{n \times d}$ とし、イテレーションの複雑さにおいて定数 $c$ に対する軽度な過パラメータ化 suffices for the constant $c$ を絶対定数とする。実験により,提案する初期化は理論上の利点に留まらず,実際の勾配降下の収束率を大幅に向上させることが示唆された。均一なポリak-\l{}ojasiewicz (pl)不等式と一様リプシッツ滑らか性定数は、我々のランダム初期化から始まり、十分な数の反復に対して保証される。本手法は,非凸低ランク因子分解問題のより広いクラスに対する収束解析の拡張と単純化に有用である。

関連論文リスト

Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks [46.04785603483612]
我々はネステロフの加速勾配が複雑性$O(kappalogfrac1epsilon)$に達することを証明している。特に,NAGが線形収束速度を加速できることを示す。
論文参考訳（メタデータ） (2024-10-12T20:33:37Z)
In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting [21.002519159190538]
我々は分散アルゴリズムを解析し、$N$クライアント上で低ランク行列の分解を計算する。グローバルな$mathbfV$ in $mathbbRd times r$をすべてのクライアントに共通とし、ローカルな$mathbfUi$ in $mathbbRn_itimes r$を得る。
論文参考訳（メタデータ） (2024-09-13T12:28:42Z)
Efficient Continual Finite-Sum Minimization [52.5238287567572]
連続有限サム最小化(continuous finite-sum minimization)と呼ばれる有限サム最小化の鍵となるツイストを提案する。我々のアプローチは$mathcalO(n/epsilon)$ FOs that $mathrmStochasticGradientDescent$で大幅に改善されます。また、$mathcalOleft(n/epsilonalpharight)$ complexity gradient for $alpha 1/4$という自然な一階法は存在しないことを証明し、この方法の第一階法がほぼ密であることを示す。
論文参考訳（メタデータ） (2024-06-07T08:26:31Z)
Provably learning a multi-head attention layer [55.2904547651831]
マルチヘッドアテンション層は、従来のフィードフォワードモデルとは分離したトランスフォーマーアーキテクチャの重要な構成要素の1つである。本研究では,ランダムな例から多面的注意層を実証的に学習する研究を開始する。最悪の場合、$m$に対する指数的依存は避けられないことを示す。
論文参考訳（メタデータ） (2024-02-06T15:39:09Z)
A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
非線形測定では、ほとんどの先行結果は一様ではない、すなわち、すべての$mathbfx*$に対してではなく、固定された$mathbfx*$に対して高い確率で保持される。本フレームワークはGCSに1ビット/一様量子化観測と単一インデックスモデルを標準例として適用する。また、指標集合が計量エントロピーが低い製品プロセスに対して、より厳密な境界を生み出す濃度不等式も開発する。
論文参考訳（メタデータ） (2023-09-25T17:54:19Z)
An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems [76.2042837251496]
相補的な合成条件に基づく2つの難解なミラー降下アルゴリズムを導入する。注目すべきは、どちらのアルゴリズムも、目的関数のリプシッツ定数や滑らかさに関する事前の知識なしで機能する。本稿では,大規模半確定プログラム上での手法の効率性とロバスト性を示す。
論文参考訳（メタデータ） (2023-06-30T08:34:29Z)
Near Optimal Heteroscedastic Regression with Symbiotic Learning [29.16456701187538]
我々は不連続線形回帰の問題を考察する。正則ノルムにおいて$mathbfw*$を$tildeOleft(|mathbff*|2cdot left(frac1n + left(dnright)2right)$の誤差まで推定し、一致する下界を証明できる。
論文参考訳（メタデータ） (2023-06-25T16:32:00Z)
Convergence of a Normal Map-based Prox-SGD Method under the KL Inequality [0.0]
我々は、$symbol$k$収束問題に対して、新しいマップベースのアルゴリズム(mathsfnorMtext-mathsfSGD$)を提案する。
論文参考訳（メタデータ） (2023-05-10T01:12:11Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
モノトン活性化に対する $mathbfxmapstosigma(mathbfwcdotmathbfx)$ の関数について検討する。学習者の目標は仮説ベクトル $mathbfw$ that $F(mathbbw)=C, epsilon$ を高い確率で出力することである。
論文参考訳（メタデータ） (2022-06-17T17:55:43Z)
Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix Factorization [49.090785356633695]
非対称な低ランク分解問題: [mathbbRm min d , mathbfU$ および MathV$ について検討する。
論文参考訳（メタデータ） (2021-06-27T17:25:24Z)
On the computational and statistical complexity of over-parameterized matrix sensing [30.785670369640872]
FGD法(Factorized Gradient Descend)を用いた低ランク行列検出の解法を検討する。分解行列 $mathbff$ を分離列空間に分解することにより、$|mathbff_t - mathbff_t - mathbfx*|_f2$ が統計誤差に収束することを示す。
論文参考訳（メタデータ） (2021-01-27T04:23:49Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。