Fugu-MT Paper Translation (Summary): On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares

Paper summary: On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares

arxiv url: http://arxiv.org/abs/2007.13288v2
Date: Tue, 1 Sep 2020 20:34:27 GMT
Status: Translation complete
Last updated in system: 2022-11-06 08:29:47.932679
Title: On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares
Title (reference translation): On the Regularization Effect of Stochastic Gradient Descent Applied to Least Squares
Authors: Stefan Steinerberger
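Abstract summary: We study the behavior of stochastic gradient descent applied to $\|Ax - b\|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_A$ depending on $A$ such that $$ \mathbb{E} ~\left\| Ax_{k+1}-b\right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\|A x_k -b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|^2_{2}.$$
Reference score (in-house attention score): 0.0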
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the behavior of stochastic gradient descent applied to $\|Ax -b \|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_{A}$ depending (mildly) on $A$ such that $$ \mathbb{E} ~\left\| Ax_{k+1}-b\right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\|A x_k -b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|^2_{2}.$$ This is a curious inequality: the last term has one more matrix applied to the residual $x_k - x$ than the remaining terms: if $x_k - x$ is mainly comprised of large singular vectors, stochastic gradient descent leads to a quick regularization. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values smoothes.
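The inequality can be checked with a one-step expansion. This derivation assumes that the stochastic gradient descent step is a randomized Kaczmarz projection, with row $a_i$ of $A$ chosen with probability $\|a_i\|_2^2/\|A\|_F^2$. That is our reading, since the abstract does not spell out the update. Write $x = A^{-1}b$ and $e_k = x_k - x$, so that $Ax_k - b = Ae_k$:

```latex
\begin{aligned}
e_{k+1} &= e_k - \frac{\langle a_i, e_k\rangle}{\|a_i\|_2^2}\, a_i,
\qquad \mathbb{P}(i) = \frac{\|a_i\|_2^2}{\|A\|_F^2},\\
\mathbb{E}\,\|A e_{k+1}\|_2^2
&= \|A e_k\|_2^2
 - \frac{2}{\|A\|_F^2}\sum_{i=1}^{n} \langle a_i, e_k\rangle \langle a_i, A^T A e_k\rangle
 + \frac{1}{\|A\|_F^2}\sum_{i=1}^{n} \langle a_i, e_k\rangle^2 \,\frac{\|A a_i\|_2^2}{\|a_i\|_2^2}\\
&= \|A e_k\|_2^2
 - \frac{2}{\|A\|_F^2}\left\|A^T A e_k\right\|_2^2
 + \frac{1}{\|A\|_F^2}\sum_{i=1}^{n} \langle a_i, e_k\rangle^2 \,\frac{\|A a_i\|_2^2}{\|a_i\|_2^2}\\
&\leq \left(1 + \frac{c}{\|A\|_F^2}\right)\|A e_k\|_2^2
 - \frac{2}{\|A\|_F^2}\left\|A^T A e_k\right\|_2^2,
\qquad c := \max_{i} \frac{\|A a_i\|_2^2}{\|a_i\|_2^2}.
\end{aligned}
```

Two identities do the work. The middle term uses $\sum_i \langle a_i,u\rangle\langle a_i,v\rangle = \langle Au, Av\rangle$ with $v = A^TAe_k$. The last step uses $\sum_i \langle a_i, e_k\rangle^2 = \|Ae_k\|_2^2$. The constant $c$ obtained this way is one admissible choice. It is not confirmed to coincide with the paper's $c_A$.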
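Abstract (reference translation): For invertible $A \in \mathbb{R}^{n \times n}$, we study the behavior of stochastic gradient descent applied to $\|Ax -b \|_2^2 \rightarrow \min$. There exists an explicit constant $c_{A}$ depending on $A$ such that $$ \mathbb{E} ~\left\| Ax_{k+1}-b\right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\|A x_k -b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|^2_{2}.$$ The last term has one more matrix applied than the residual $x_k - x$: if $x_k - x$ consists mainly of large singular vectors, stochastic gradient descent regularizes quickly. For symmetric matrices, this inequality extends to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values.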
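A small numerical check can make the one-step inequality concrete. It rests on several assumptions of ours, none stated in the abstract:

- The stochastic gradient descent is taken to be the randomized Kaczmarz iteration, with row $i$ sampled with probability $\|a_i\|_2^2/\|A\|_F^2$.
- The constant is $c = \max_i \|Aa_i\|_2^2/\|a_i\|_2^2$. This is one admissible choice and need not be the paper's $c_A$.
- The helper names are hypothetical.

Since $Ax_k - b = A(x_k - x)$, everything is expressed through the error $e = x_k - x$. The last helper computes the expected one-step contraction of $\|Ax_k - b\|_2^2$ when the error is a single right singular vector of $A$.

```python
import numpy as np


def expected_sq_residual(A: np.ndarray, e: np.ndarray) -> float:
    """Exact E||A x_{k+1} - b||^2 after one randomized Kaczmarz step.

    Assumption: row i is drawn with probability ||a_i||^2 / ||A||_F^2 and
    x_{k+1} is the projection of x_k onto {y : <a_i, y> = b_i}.
    e = x_k - x is the current error, so A x_k - b = A e.
    """
    row_sq = np.sum(A**2, axis=1)             # ||a_i||^2 for every row
    probs = row_sq / row_sq.sum()             # sampling distribution
    total = 0.0
    for i, a_i in enumerate(A):
        e_next = e - (a_i @ e) / row_sq[i] * a_i   # error after projecting onto row i
        total += probs[i] * np.sum((A @ e_next) ** 2)
    return float(total)


def candidate_c(A: np.ndarray) -> float:
    """max_i ||A a_i||^2 / ||a_i||^2: one admissible constant (hypothetical, not necessarily c_A)."""
    row_sq = np.sum(A**2, axis=1)
    # Column i of A A^T equals A a_i, so its squared norm is ||A a_i||^2.
    return float(np.max(np.sum((A @ A.T) ** 2, axis=0) / row_sq))


def abstract_bound(A: np.ndarray, e: np.ndarray, c: float) -> float:
    """Right-hand side of the abstract's inequality with constant c."""
    fro_sq = np.sum(A**2)                      # ||A||_F^2
    res_sq = np.sum((A @ e) ** 2)              # ||A x_k - b||^2
    smooth_sq = np.sum((A.T @ (A @ e)) ** 2)   # ||A^T A (x_k - x)||^2
    return float((1 + c / fro_sq) * res_sq - 2 / fro_sq * smooth_sq)


def contraction_factors(A: np.ndarray) -> np.ndarray:
    """E||A e_{k+1}||^2 / ||A e_k||^2 when e_k is the j-th right singular vector."""
    _, s, Vt = np.linalg.svd(A)               # s is sorted in descending order
    return np.array([expected_sq_residual(A, v) / s_j**2 for s_j, v in zip(s, Vt)])


rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
e = rng.standard_normal(8)
print(expected_sq_residual(A, e) <= abstract_bound(A, e, candidate_c(A)))
print(contraction_factors(A))   # index 0 belongs to the largest singular value
```

Under these assumptions the factor for the top singular vector is always below 1 and below the factor for the bottom one, whenever the singular values are not all equal. This is consistent with the claim that error concentrated on large singular vectors is removed quickly.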

Related papers

MLPs at the EOC: Concentration of the NTK [7.826806223782053]
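We study the concentration of the neural tangent kernel (NTK) $K_\theta$. We prove that an approximate version of gradient independence holds at finite width. To approximate the limit accurately, the hidden-layer widths must grow quadratically, as $m_k = k^2 m$ for some $m \in \mathbb{N}+1$, for sufficient concentration.
Reference translation (metadata) (2025-01-24T18:58:50Z)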
MLPs at the EOC: Spectrum of the NTK [7.826806223782053]
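We study the properties of the neural tangent kernel (NTK) $\overset{\scriptstyle\infty}{K}$. The quantity $\Delta_\phi = \frac{b^2}{a^2+b^2}$ determines the rate at which the condition number of the NTK matrix converges to its limit.
Reference translation (metadata) (2025-01-22T21:12:51Z)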
The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
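We show that this problem has randomized communication complexity $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$. As an application, we obtain an $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$ space lower bound for any streaming algorithm with $k$ passes.
Reference translation (metadata) (2024-10-26T06:21:42Z)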
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
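We study the problem of residual error estimation for matrix and vector norms using linear sketches. We show that this is empirically quite advantageous, with roughly the same sketch size and accuracy as prior work. We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
Reference translation (metadata) (2024-08-16T02:33:07Z)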
Provably learning a multi-head attention layer [55.2904547651831]
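The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models. In this work, we initiate the study of provably learning a multi-head attention layer from random examples. We show that, in the worst case, exponential dependence on $m$ is unavoidable.
Reference translation (metadata) (2024-02-06T15:39:09Z)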
On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
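This paper considers RMSProp and its momentum extension and establishes a convergence rate of $\frac{1}{T}\sum_{k=1}^{T}\ldots$. Our convergence rate matches the lower bound with respect to all coefficients except the dimension $d$. The convergence rate can be regarded as analogous to $\frac{1}{T}\sum_{k=1}^{T}\ldots$.
Reference translation (metadata) (2024-02-01T07:21:32Z)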
Spectral Statistics of the Sample Covariance Matrix for High Dimensional Linear Gaussians [12.524855369455421]
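We study the performance of ordinary least squares (OLS) for predicting high-dimensional stable state-transition matrices. The OLS estimator undergoes a phase transition and becomes transient, which only worsens the estimation error.
Reference translation (metadata) (2023-12-10T06:55:37Z)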
Convergence of Alternating Gradient Descent for Matrix Factorization [5.439020425819001]
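We study alternating gradient descent (AGD) with a fixed step size applied to the asymmetric matrix factorization objective. For a rank-$r$ matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$, … the smoothness $C$ in the complexity $T$ to be an absolute constant.
Reference translation (metadata) (2023-05-11T16:07:47Z)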
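The entry above concerns alternating gradient descent (AGD) with a fixed step size on the asymmetric factorization objective. Below is a minimal sketch of plain AGD on $f(X, Y) = \tfrac{1}{2}\|XY^T - A\|_F^2$. The small Gaussian initialization, the step size, and the function name are hypothetical choices, not the paper's initialization or step-size rule.

```python
import numpy as np


def alternating_gd(A: np.ndarray, r: int, step: float, iters: int,
                   init_scale: float = 0.1, seed: int = 0):
    """Alternating gradient descent on f(X, Y) = 0.5 * ||X Y^T - A||_F^2.

    Hypothetical setup: small Gaussian initialization and a fixed step size;
    the paper's own initialization and step-size rule may differ.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    X = init_scale * rng.standard_normal((m, r))
    Y = init_scale * rng.standard_normal((n, r))
    for _ in range(iters):
        X = X - step * (X @ Y.T - A) @ Y      # gradient step in X with Y fixed
        Y = Y - step * (X @ Y.T - A).T @ X    # then in Y, using the updated X
    return X, Y


rng = np.random.default_rng(42)
U, _ = np.linalg.qr(rng.standard_normal((10, 2)))   # orthonormal left factor
V, _ = np.linalg.qr(rng.standard_normal((8, 2)))    # orthonormal right factor
A = U @ np.diag([2.0, 1.0]) @ V.T                    # exactly rank-2 target
X, Y = alternating_gd(A, r=2, step=0.05, iters=3000)
print(np.linalg.norm(X @ Y.T - A) / np.linalg.norm(A))   # relative error
```

The updates alternate: $Y$ is updated with the freshly updated $X$, which is what distinguishes AGD from simultaneous gradient descent. The step 0.05 is a conservative, hand-picked value for a target whose largest singular value is 2.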
Generalizations of Powers--Størmer's inequality [0.0]
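… $\mathrm{tr}|A-B| \leq 2\,\mathrm{tr}\big(f(A)g(B)\big)$ holds for any positive matrix monotone function $f$. The set of functions satisfying this inequality contains additional members, and illustrative examples are given to support this claim.
Reference translation (metadata) (2023-02-15T17:59:01Z)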
A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee [16.409210914237086]
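Given a matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b \in \mathbb{R}^n$, we consider the regression problem with an $\ell_\infty$ guarantee. To obtain such an $\ell_\infty$ regression guarantee, one must use a dense sketching matrix. We also develop a new analysis framework for $\ell_\infty$-guarantee regression based on the Oblivious Coordinate-wise Embedding (OCE) property.
Reference translation (metadata) (2023-02-01T05:22:40Z)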
Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
We study Krylov subspace-based iterative methods for low-rank approximation under any Schatten-$p$ norm. Our main result is an algorithm that uses only $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products.
Reference translation (metadata) (2022-02-10T16:10:41Z)
On the Self-Penalization Phenomenon in Feature Selection [69.16452769334367]
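We describe an implicit sparsity-inducing mechanism based on a family of kernels. As an application, we use this sparsity-inducing mechanism to build algorithms that are consistent for feature selection.
Reference translation (metadata) (2021-10-12T09:36:41Z)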
Spectral properties of sample covariance matrices arising from random matrices with independent non identically distributed columns [50.053491972003656]
The functionals $\text{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and deterministic $A \in \mathcal{M}_p$, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$. Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$ …
Reference translation (metadata) (2021-09-06T14:21:43Z)
A matrix concentration inequality for products [0.0]
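For sufficiently small positive $\alpha$, $Z_n$ satisfies the concentration inequality $\mathbb{P}\left(\left\Vert Z_n - \mathbb{E}\left[Z_n\right]\right\Vert \geq t\right) \leq 2d^2 \cdot \exp\left(\frac{-t^2}{\alpha\sigma^2}\right) \quad \text{for all } \ldots$
Reference translation (metadata) (2020-08-12T04:39:12Z)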
On the robustness of the minimum $\ell_2$ interpolator [2.918940961856197]
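We analyze the interpolator $\hat{\beta}$ with minimum $\ell_2$-norm in a general high-dimensional linear regression framework. We prove that, with high probability, the prediction loss of this estimator is bounded from above by $(\|\beta^*\|^2 r_{cn}(\Sigma) \vee \|\xi\|^2)/n$.
Reference translation (metadata) (2020-03-12T15:12:28Z)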
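The last entry concerns the minimum $\ell_2$-norm interpolator. Take a design matrix $X \in \mathbb{R}^{n \times d}$ with $n < d$ and full row rank. This estimator is then $\hat\beta = X^{+}y = X^T(XX^T)^{-1}y$, and NumPy's pseudoinverse computes it directly. This is a sketch only: the data below are random and hypothetical, the function name is illustrative, and the blurb's risk bound is not reproduced.

```python
import numpy as np


def min_norm_interpolator(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Minimum l2-norm beta with X @ beta = y (exact when X has full row rank)."""
    return np.linalg.pinv(X) @ y


rng = np.random.default_rng(0)
n, d = 5, 20                              # fewer samples than features
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
beta_hat = min_norm_interpolator(X, y)

# Same estimator via the closed form X^T (X X^T)^{-1} y.
beta_closed = X.T @ np.linalg.solve(X @ X.T, y)
print(np.allclose(beta_hat, beta_closed), np.allclose(X @ beta_hat, y))
```

Any other interpolator differs from $\hat\beta$ by a nonzero vector in the null space of $X$. That vector is orthogonal to $\hat\beta$, which lies in the row space, so every other interpolator has strictly larger norm.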

The related-paper list is generated automatically from the titles and abstracts of papers on this site.