Fugu-MT 論文翻訳(概要): Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron

論文の概要: Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron

arxiv url: http://arxiv.org/abs/2302.10034v2
Date: Tue, 10 Oct 2023 05:55:10 GMT
ステータス: 翻訳完了
システム内更新日: 2023-10-13 14:49:47.872412
Title: Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron
Title（参考訳）: 過度パラメータ化は1つのニューロンを学習するグラディエントDescentを指数的に遅くする
Authors: Weihang Xu, Simon S. Du
Abstract要約: ランダム勾配降下のグローバル収束を$Oleft(T-3right)$ rateで証明する。これら2つの境界は、収束率の正確な特徴づけを与える。このポテンシャル関数は緩やかに収束し、損失関数の緩やかな収束率を示す。
参考スコア（独自算出の注目度）: 49.45105570960104
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We revisit the problem of learning a single neuron with ReLU activation under Gaussian input with square loss. We particularly focus on the over-parameterization setting where the student network has $n\ge 2$ neurons. We prove the global convergence of randomly initialized gradient descent with a $O\left(T^{-3}\right)$ rate. This is the first global convergence result for this problem beyond the exact-parameterization setting ($n=1$) in which the gradient descent enjoys an $\exp(-\Omega(T))$ rate. Perhaps surprisingly, we further present an $\Omega\left(T^{-3}\right)$ lower bound for randomly initialized gradient flow in the over-parameterization setting. These two bounds jointly give an exact characterization of the convergence rate and imply, for the first time, that over-parameterization can exponentially slow down the convergence rate. To prove the global convergence, we need to tackle the interactions among student neurons in the gradient descent dynamics, which are not present in the exact-parameterization case. We use a three-phase structure to analyze GD's dynamics. Along the way, we prove gradient descent automatically balances student neurons, and use this property to deal with the non-smoothness of the objective function. To prove the convergence rate lower bound, we construct a novel potential function that characterizes the pairwise distances between the student neurons (which cannot be done in the exact-parameterization case). We show this potential function converges slowly, which implies the slow convergence rate of the loss function.
Abstract（参考訳）: 正方形損失を持つガウス入力下でのrelu活性化による単一ニューロン学習の課題を再考する。特に,学生ネットワークが$n\ge 2$ニューロンを持つ過パラメータ設定に注目する。ランダム初期化勾配勾配のグローバル収束を$O\left(T^{-3}\right)$ rateで証明する。これは、勾配降下が$\exp(-\Omega(T))$レートを楽しむ正確なパラメータ化設定(n=1$)を超えるこの問題に対する最初のグローバル収束結果である。おそらく意外なことに、オーバーパラメータ設定においてランダムに初期化された勾配流に対して、$\Omega\left(T^{-3}\right)$ lowerboundを示す。これら2つの境界は、収束率の正確な特徴を与え、初めて過度パラメータ化が収束率を指数関数的に遅くすることができることを暗示する。大域収束を証明するためには、正確なパラメータ化の場合に存在しない勾配降下ダイナミクスにおいて、学生ニューロン間の相互作用に取り組む必要がある。 gdの動力学解析には三相構造を用いる。その過程で、勾配降下が自動的に学生ニューロンのバランスをとることを証明し、この特性を用いて目的関数の非滑らか性に対処する。収束率の低い境界を証明するために、学生ニューロン間の対距離を特徴付ける新しいポテンシャル関数を構築する(正確なパラメータ化の場合では実現できない)。このポテンシャル関数はゆっくりと収束し、損失関数の緩やかな収束率を示す。

関連論文リスト

Convergence of Shallow ReLU Networks on Weakly Interacting Data [5.618969269882913]
我々は,n$のデータ点上の勾配流によって訓練された一層ReLUネットワークの収束解析を行う。次数$log(n)$ニューロンの幅を持つネットワークは、高い確率で大域収束するのに十分であることを示す。
論文参考訳（メタデータ） (2025-02-24T09:07:14Z)
Convergence Analysis of Natural Gradient Descent for Over-parameterized Physics-Informed Neural Networks [3.680127959836384]
勾配勾配勾配(GD)や二次勾配勾配(SGD)のような一階法はニューラルネットワークのトレーニングに有効であることが証明されている。しかし, 2層ニューラルネットワークの学習におけるGDの学習速度は, サンプルサイズとグラム行列に依存しない。本稿では, 回帰問題である$L2$に対して, 学習率を$mathcalO(1)$から$mathcalO(1)$に改善できることを示す。
論文参考訳（メタデータ） (2024-08-01T14:06:34Z)
Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
近年の研究では、再生カーネルヒルベルト空間(RKHS)がニューラルネットワークによる関数のモデル化に適した空間ではないことが示されている。本稿では,有界ノルムを持つオーバーパラメータ化された2層ニューラルネットワークに適した関数空間について検討する。
論文参考訳（メタデータ） (2024-04-29T15:04:07Z)
A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
本稿では,超パラメトリック化された2層ニューラルネットワークの無限次元関数クラス上で定義される最小最適化問題について検討する。 i) 勾配降下指数アルゴリズムの収束と, (ii) ニューラルネットワークの表現学習に対処する。その結果、ニューラルネットワークによって誘導される特徴表現は、ワッサーシュタイン距離で測定された$O(alpha-1)$で初期表現から逸脱することが許された。
論文参考訳（メタデータ） (2024-04-18T16:46:08Z)
Analysis of the expected $L_2$ error of an over-parametrized deep neural network estimate learned by gradient descent without regularization [7.977229957867868]
近年の研究では、正規化された経験的リスクに勾配降下を適用して学習した過度パラメータ化されたディープニューラルネットワークによって定義される推定値が、普遍的に一貫していることが示されている。本稿では、同様の結果を得るために正規化項は必要ないことを示す。
論文参考訳（メタデータ） (2023-11-24T17:04:21Z)
Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis [121.9821494461427]
2層ReLUネットワークに必要なニューロン数を著しく削減する方法を示す。また、事前の作業を改善するための新しい下位境界を証明し、ある仮定の下では、最善を尽くすことができることを証明します。
論文参考訳（メタデータ） (2022-06-26T06:51:31Z)
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
通信周波数の異なる分散計算作業者のトレーニングアルゴリズムを示す。本研究では,より厳密な収束率を$mathcalO!!(sigma2-2_avg!)とする。また,不均一性の項は,作業者の平均遅延によっても影響されることを示した。
論文参考訳（メタデータ） (2022-06-16T17:10:57Z)
A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
我々はAdam Adagradと$O(d(N)/st)$アルゴリズムの収束の証明を示す。 Adamはデフォルトパラメータで使用する場合と同じ収束$O(d(N)/st)$で収束する。
論文参考訳（メタデータ） (2020-03-05T01:56:17Z)
Learning Unitaries by Gradient Descent [12.354076490479516]
我々は、交互列の時間パラメータの勾配勾配から$Ud)$でユニタリ変換を学習する。勾配$パラメータが少ない場合、勾配降下は準最適解に収束するが、$d2$パラメータ以上の場合、勾配降下は最適解に収束する。
論文参考訳（メタデータ） (2020-01-31T15:20:55Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。