Fugu-MT Paper Translation (Abstract): A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization

Paper abstract: A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization

arxiv url: http://arxiv.org/abs/2404.12312v3
Date: Thu, 24 Oct 2024 13:38:19 GMT
Status: translation complete
Last updated in system: 2024-11-28 17:07:31.934522
Title: A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization
Title (reference translation): A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization
Authors: Yuchen Zhu, Yufeng Zhang, Zhaoran Wang, Zhuoran Yang, Xiaohong Chen
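Abstract summary: This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks. It addresses (i) the convergence of the stochastic gradient descent-ascent algorithm and (ii) the representation learning of the neural networks. The results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by $O(\alpha^{-1})$, measured in Wasserstein distance.
Reference score (attention score computed by this site): 90.87444114491116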
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks. In particular, we consider the minimax optimization problem stemming from estimating linear functional equations defined by conditional expectations, where the objective functions are quadratic in the functional spaces. We address (i) the convergence of the stochastic gradient descent-ascent algorithm and (ii) the representation learning of the neural networks. We establish convergence under the mean-field regime by considering the continuous-time and infinite-width limit of the optimization dynamics. Under this regime, the stochastic gradient descent-ascent corresponds to a Wasserstein gradient flow over the space of probability measures defined over the space of neural network parameters. We prove that the Wasserstein gradient flow converges globally to a stationary point of the minimax objective at a $O(T^{-1} + \alpha^{-1})$ sublinear rate, and additionally finds the solution to the functional equation when the regularizer of the minimax objective is strongly convex. Here $T$ denotes the time and $\alpha$ is a scaling parameter of the neural networks. In terms of representation learning, our results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance. Finally, we apply our general results to concrete examples including policy evaluation, nonparametric instrumental variable regression, asset pricing, and adversarial Riesz representer estimation.
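Abstract (reference translation): This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks. In particular, it considers the minimax optimization problem that arises from estimating linear functional equations defined by conditional expectations, where the objective functions are quadratic in the function spaces. It addresses (i) the convergence of the stochastic gradient descent-ascent algorithm and (ii) the representation learning of the neural networks. Convergence is established under the mean-field regime by considering the continuous-time and infinite-width limit of the optimization dynamics. Under this regime, stochastic gradient descent-ascent corresponds to a Wasserstein gradient flow over the space of probability measures defined over the space of neural network parameters. The Wasserstein gradient flow converges globally to a stationary point of the minimax objective at a sublinear rate of $O(T^{-1} + \alpha^{-1})$, and it additionally finds the solution to the functional equation when the regularizer of the minimax objective is strongly convex. Here $T$ denotes time and $\alpha$ is a scaling parameter of the neural networks. For representation learning, the results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by $O(\alpha^{-1})$, measured in Wasserstein distance. Finally, the general results are applied to concrete examples including policy evaluation, nonparametric instrumental variable regression, asset pricing, and adversarial Riesz representer estimation.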
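To make the setting in the abstract concrete, the following is a minimal NumPy sketch of stochastic gradient descent-ascent on a toy quadratic minimax objective with two mean-field two-layer networks. It is not the authors' implementation. The toy objective $E[g(x)(f(x) - y) - \tfrac{1}{2} g(x)^2]$, the tanh activation, the data model, the $1/\alpha^2$ step-size scaling, and the names `forward`, `particle_grad`, and `run_sgda` are all assumptions made for illustration. The reported `shift` is the root-mean-square movement of the neurons, which upper-bounds the Wasserstein-2 distance between the initial and final empirical parameter measures through the identity coupling.

```python
import numpy as np


def forward(theta, x, alpha):
    """Mean-field two-layer net f(x) = (alpha / N) * sum_i b_i * tanh(w_i * x + c_i)."""
    b, w, c = theta[:, 0], theta[:, 1], theta[:, 2]
    h = np.tanh(np.outer(x, w) + c)              # hidden activations, shape (batch, N)
    return alpha * (h @ b) / theta.shape[0], h


def particle_grad(theta, x, h, upstream, alpha):
    """N times the gradient of mean_k(upstream_k * f(x_k)) w.r.t. each neuron (b_i, w_i, c_i).

    The factor N cancels the 1/N in the network output, so each particle
    moves at a speed that does not vanish as the width grows.
    """
    b = theta[:, 0]
    dh = (1.0 - h ** 2) * upstream[:, None]      # upstream-weighted tanh derivative
    g_b = alpha * (h * upstream[:, None]).mean(axis=0)
    g_w = alpha * b * (dh * x[:, None]).mean(axis=0)
    g_c = alpha * b * dh.mean(axis=0)
    return np.stack([g_b, g_w, g_c], axis=1)


def run_sgda(alpha=5.0, n_neurons=256, steps=2000, batch=64, eta0=0.05, seed=0):
    """Simultaneous SGDA on the toy objective E[g(x) * (f(x) - y) - 0.5 * g(x)^2]."""
    rng = np.random.default_rng(seed)
    theta_f = rng.normal(size=(n_neurons, 3))    # particles of the minimizing player f
    theta_g = rng.normal(size=(n_neurons, 3))    # particles of the maximizing player g
    theta_f0 = theta_f.copy()
    eta = eta0 / alpha ** 2                      # assumed step-size scaling (illustrative)
    x_eval = np.linspace(-2.0, 2.0, 200)

    def mse(theta):
        return float(np.mean((forward(theta, x_eval, alpha)[0] - np.sin(x_eval)) ** 2))

    mse_start = mse(theta_f)
    for _ in range(steps):
        x = rng.uniform(-2.0, 2.0, size=batch)
        y = np.sin(x) + 0.1 * rng.normal(size=batch)   # toy data, hypothetical
        f, h_f = forward(theta_f, x, alpha)
        g, h_g = forward(theta_g, x, alpha)
        # Pointwise derivatives of the objective: dL/df = g and dL/dg = f - y - g.
        grad_f = particle_grad(theta_f, x, h_f, g, alpha)
        grad_g = particle_grad(theta_g, x, h_g, f - y - g, alpha)
        theta_f = theta_f - eta * grad_f         # descent step for f
        theta_g = theta_g + eta * grad_g         # ascent step for g
    # RMS particle displacement: upper bound on W2(initial, final) via the identity coupling.
    shift = float(np.sqrt(np.mean(np.sum((theta_f - theta_f0) ** 2, axis=1))))
    return mse_start, mse(theta_f), shift


if __name__ == "__main__":
    for alpha in (2.0, 8.0):
        print(alpha, run_sgda(alpha=alpha))
```

Running the script for several values of `alpha` gives a rough way to see how the particle displacement changes with the scaling parameter; the sketch makes no claim that it reproduces the rates stated in the abstract.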

Related papers

Gradient-free stochastic optimization for additive models [56.42455605591779]
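This paper addresses zeroth-order optimization from noisy observations for objective functions satisfying the Polyak-Łojasiewicz or strong convexity condition. The objective function is assumed to have an additive structure and to satisfy a higher-order smoothness property characterized by the Hölder family of functions.
Reference translation (metadata) (2025-03-03T23:39:08Z)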
Non-asymptotic convergence analysis of the stochastic gradient Hamiltonian Monte Carlo algorithm with discontinuous stochastic gradient with applications to training of ReLU neural networks [8.058385158111207]
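We provide a non-asymptotic analysis of the convergence of stochastic gradient Hamiltonian Monte Carlo to the target measure in Wasserstein-1 and Wasserstein-2 distances. To illustrate the main results, we consider numerical experiments on quantile estimation and on several problems involving ReLU neural networks relevant to finance and artificial intelligence.
Reference translation (metadata) (2024-09-25T17:21:09Z)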
Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
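Recent studies have shown that a reproducing kernel Hilbert space (RKHS) is not a suitable space for modeling functions by neural networks. This paper studies a function space suited to overparameterized two-layer neural networks with bounded norm.
Reference translation (metadata) (2024-04-29T15:04:07Z)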
Approximation Results for Gradient Descent trained Neural Networks [0.0]
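The networks are fully connected, with constant depth and increasing width. The continuous kernel error norm implies approximation under the natural smoothness assumption required for smooth functions.
Reference translation (metadata) (2023-09-09T18:47:55Z)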
Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
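This work studies optimal procedures for estimating a linear functional based on observational data. For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
Reference translation (metadata) (2023-01-16T02:57:37Z)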
Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
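This work studies the implicit bias of gradient flow and gradient descent in two-layer fully connected neural networks with leaky ReLU activations. For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that gradient flow asymptotically produces a neural network with rank at most two. For gradient descent, we show that, provided the random initialization variance is small enough, a single step of gradient descent drastically reduces the rank of the network, and that the rank remains small throughout training.
Reference translation (metadata) (2022-10-13T15:09:54Z)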
Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
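Motivated by the problem of online canonical correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSGD) algorithm. We apply these ideas to online canonical correlation analysis and derive, for the first time, an optimal one-time-scale algorithm with a local rate of convergence to normality.
Reference translation (metadata) (2021-12-29T18:46:52Z)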
Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function [3.5044892799305956]
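We provide a non-asymptotic analysis of the tamed unadjusted stochastic Langevin algorithm (TUSLA) introduced in Lovas et al. In particular, we establish non-asymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances. We show that the TUSLA algorithm converges rapidly to the optimal solution.
Reference translation (metadata) (2021-07-19T07:13:02Z)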
q-RBFNN:A Quantum Calculus-based RBF Neural Network [31.14412266444568]
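We propose a gradient-descent-based learning method for radial basis function neural networks (RBFNN). The proposed method is based on the q-gradient, also known as the Jackson derivative. The proposed $q$-RBFNN is analyzed for its convergence performance in the context of the least-squares algorithm.
Reference translation (metadata) (2021-06-02T08:27:12Z)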
Overparameterization of deep ResNet: zero loss and mean-field analysis [19.45069138853531]
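Finding parameters in a deep neural network (NN) that fit the data is a nonconvex optimization problem. A basic first-order optimization method (gradient descent) finds a global solution with a perfect fit in many practical situations. We estimate, with high probability, the depth and width needed to reduce the loss below a given threshold.
Reference translation (metadata) (2021-05-30T02:46:09Z)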
Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime [50.510421854168065]
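We show that averaged stochastic gradient descent achieves the minimax optimal convergence rate. This paper shows that a target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
Reference translation (metadata) (2020-06-22T14:31:37Z)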
Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
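We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
Reference translation (metadata) (2020-02-10T23:23:04Z)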

The related papers list is generated automatically from the titles and abstracts of papers on this site.
