Fugu-MT paper translation (abstract): Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks

Paper abstract: Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks

arxiv url: http://arxiv.org/abs/2307.07410v1
Date: Thu, 13 Jul 2023 13:27:51 GMT
Status: translation complete
Last updated in system: 2023-07-17 13:35:33.391779
Title: Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks
Title (reference translation): Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks
Authors: Johan S. Wind, Vegard Antun, Anders C. Hansen
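Abstract summary: We present sharp results on the implicit regularization imposed by the gradient flow of diagonal linear networks and link them to the phenomenon of phase transitions in generalized hardness of approximation (GHA). Non-sharpness of the results would imply that the GHA phenomenon does not occur for the basis pursuit optimization problem.
Reference score (independently computed attention score): 0.0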
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Understanding the implicit regularization imposed by neural network architectures and gradient-based optimization methods is a key challenge in deep learning and AI. In this work we provide sharp results for the implicit regularization imposed by the gradient flow of Diagonal Linear Networks (DLNs) in the over-parameterized regression setting and, potentially surprisingly, link this to the phenomenon of phase transitions in generalized hardness of approximation (GHA). GHA generalizes the phenomenon of hardness of approximation from computer science to, among others, continuous and robust optimization. It is well-known that the $\ell^1$-norm of the gradient flow of DLNs with tiny initialization converges to the objective function of basis pursuit. We improve upon these results by showing that the gradient flow of DLNs with tiny initialization approximates minimizers of the basis pursuit optimization problem (as opposed to just the objective function), and we obtain new and sharp convergence bounds with respect to the initialization size. Non-sharpness of our results would imply that the GHA phenomenon would not occur for the basis pursuit optimization problem -- which is a contradiction -- thus implying sharpness. Moreover, we characterize which $\ell^1$ minimizer of the basis pursuit problem is chosen by the gradient flow whenever the minimizer is not unique. Interestingly, this depends on the depth of the DLN.
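Abstract (reference translation): Understanding the implicit regularization imposed by neural network architectures and gradient-based optimization methods is a key challenge in deep learning and AI. This work gives sharp results on the implicit regularization imposed by the gradient flow of diagonal linear networks (DLNs) in the over-parameterized regression setting and links it to phase transitions in generalized hardness of approximation (GHA). GHA generalizes the phenomenon of hardness of approximation from computer science to, among others, continuous and robust optimization. It is known that the $\ell^1$-norm of the gradient flow of DLNs with tiny initialization converges to the objective function of basis pursuit. Improving on these results, the gradient flow of DLNs with tiny initialization is shown to approximate minimizers of the basis pursuit optimization problem (not just the objective function), with new and sharp convergence bounds with respect to the initialization size. Non-sharpness of the results would imply that the GHA phenomenon does not occur for the basis pursuit optimization problem, which is a contradiction, and hence the results are sharp. Moreover, the work characterizes which $\ell^1$ minimizer of the basis pursuit problem the gradient flow selects whenever the minimizer is not unique. Interestingly, this depends on the depth of the DLN.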
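The claim that a DLN with tiny initialization approximates a basis pursuit minimizer can be illustrated with a small numerical sketch. Everything below is an illustrative assumption rather than the paper's setup: the depth-2 parametrization x = u*u - v*v, the explicit small-step gradient descent standing in for continuous gradient flow, the function names, and the toy problem x1 + 2*x2 = 2, whose unique $\ell^1$-minimal solution is (0, 1).

```python
def dln_gradient_descent(A, y, alpha, lr=0.005, steps=20000):
    """Small-step gradient descent on 0.5 * ||A x - y||^2 with x = u*u - v*v.

    Assumed depth-2 diagonal linear network; u and v both start at alpha,
    so the initial x is zero. Small explicit steps approximate gradient flow.
    """
    m, n = len(A), len(A[0])
    u = [alpha] * n
    v = [alpha] * n
    for _ in range(steps):
        x = [u[j] * u[j] - v[j] * v[j] for j in range(n)]
        # Residual r = A x - y and gradient with respect to x, g = A^T r.
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Chain rule: dL/du = 2 u * g and dL/dv = -2 v * g (elementwise).
        u = [u[j] - lr * 2.0 * u[j] * g[j] for j in range(n)]
        v = [v[j] + lr * 2.0 * v[j] * g[j] for j in range(n)]
    return [u[j] * u[j] - v[j] * v[j] for j in range(n)]


def l1_norm(x):
    return sum(abs(xi) for xi in x)


# Toy underdetermined system (an assumption for illustration): x1 + 2*x2 = 2.
A = [[1.0, 2.0]]
y = [2.0]

x_small = dln_gradient_descent(A, y, alpha=1e-3)  # tiny initialization
x_large = dln_gradient_descent(A, y, alpha=0.5)   # large initialization

print("tiny init :", x_small, "l1 norm:", l1_norm(x_small))
print("large init:", x_large, "l1 norm:", l1_norm(x_large))
```

Both runs satisfy the linear constraint, but only the tiny initialization lands near the $\ell^1$-minimal solution (0, 1); the larger initialization settles on a solution with a larger $\ell^1$ norm. The sketch does not attempt to reproduce the paper's convergence bounds or its depth-dependent selection result.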

Related papers

Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
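Gradient-variation online learning aims to achieve regret guarantees that scale with the variations in the gradients of online functions. Recent efforts in neural network optimization suggest a generalized smoothness condition, in which smoothness correlates with the gradient norm. Applications to fast convergence in games and to extended adversarial optimization are discussed.
Paper reference translation (metadata) (2024-08-17T02:22:08Z)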
Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [3.680127959836384]
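Implicit gradient descent (IGD) outperforms common gradient descent (GD) when handling certain multi-scale problems. We show that IGD converges to a globally optimal solution at a linear convergence rate.
Paper reference translation (metadata) (2024-07-03T06:10:41Z)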
A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
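This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks. We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural network. As a result, the feature representation induced by the neural network is allowed to deviate from the initial one by $O(\alpha^{-1})$, measured in Wasserstein distance.
Paper reference translation (metadata) (2024-04-18T16:46:08Z)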
Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport [26.47265060394168]
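We study the convergence of gradient flow for training deep neural networks of infinite depth and arbitrary width. The analysis relies on the theory of gradient flows in metric spaces.
Paper reference translation (metadata) (2024-03-19T16:34:31Z)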
Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
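This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
Paper reference translation (metadata) (2023-10-20T12:45:12Z)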
Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
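This work studies the implicit bias of gradient flow and gradient descent in two-layer fully connected neural networks with leaky ReLU activations. For gradient flow, we leverage recent work on the implicit bias of homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two. For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent drastically reduces the rank of the network and that the rank remains small throughout training.
Paper reference translation (metadata) (2022-10-13T15:09:54Z)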
Improved Overparametrization Bounds for Global Convergence of Stochastic Gradient Descent for Shallow Neural Networks [1.14219428942199]
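We study the overparametrization bounds required for the global convergence of the stochastic gradient descent algorithm for a class of one-hidden-layer feedforward neural networks.
Paper reference translation (metadata) (2022-01-28T11:30:06Z)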
Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
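Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation. Differentiability is a desirable property because it admits the possibility of optimizing the marginal likelihood as an objective. We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
Paper reference translation (metadata) (2021-07-21T17:10:14Z)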
On the Explicit Role of Initialization on the Convergence and Implicit Bias of Overparametrized Linear Networks [1.0323063834827415]
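We present a novel analysis of single-hidden-layer linear networks trained under gradient flow. We show that the squared loss converges exponentially to its optimal value. We derive a novel non-asymptotic upper bound on the distance between the trained network and the min-norm solution.
Paper reference translation (metadata) (2021-05-13T15:13:51Z)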
Improved Analysis of Clipping Algorithms for Non-convex Optimization [19.507750439784605]
Recently, Zhang et al. (2019) showed that clipped (stochastic) gradient descent (GD) converges faster than vanilla GD/SGD. Experiments confirm the superiority of clipping-based methods in deep learning.
Paper reference translation (metadata) (2020-10-05T14:36:59Z)
Cogradient Descent for Bilinear Optimization [124.45816011848096]
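The Cogradient Descent (CoGD) algorithm is introduced to address bilinear problems. One variable is solved by considering its coupling relationship with the other, leading to synchronous gradient descent. The algorithm is applied to solve problems with one variable under a sparsity constraint.
Paper reference translation (metadata) (2020-06-16T13:41:54Z)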

The related papers list is generated automatically from the titles and abstracts of papers on this site.
