Paper overview: On the existence of global minima and convergence analyses for gradient
descent methods in the training of deep neural networks
- arxiv url: http://arxiv.org/abs/2112.09684v1
- Date: Fri, 17 Dec 2021 18:55:40 GMT
- Status: Processing complete
- Last updated in system: 2021-12-20 15:39:54.437522
- Title: On the existence of global minima and convergence analyses for gradient
descent methods in the training of deep neural networks
- Title (reference translation): On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks
- Authors: Arnulf Jentzen, Adrian Riekert
- Abstract summary: We study feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers.
- Reference score (independently computed attention score): 3.198144010381572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this article we study fully-connected feedforward deep ReLU ANNs with an
arbitrarily large number of hidden layers and we prove convergence of the risk
of the GD optimization method with random initializations in the training of
such ANNs under the assumption that the unnormalized probability density
function of the probability distribution of the input data of the considered
supervised learning problem is piecewise polynomial, under the assumption that
the target function (describing the relationship between input data and the
output data) is piecewise polynomial, and under the assumption that the risk
function of the considered supervised learning problem admits at least one
regular global minimum. In addition, in the special situation of shallow ANNs
with just one hidden layer and one-dimensional input we also verify this
assumption by proving in the training of such shallow ANNs that for every
Lipschitz continuous target function there exists a global minimum in the risk
landscape. Finally, in the training of deep ANNs with ReLU activation we also
study solutions of gradient flow (GF) differential equations and we prove that
every non-divergent GF trajectory converges with a polynomial rate of
convergence to a critical point (in the sense of limiting Fréchet
subdifferentiability). Our mathematical convergence analysis builds up on tools
from real algebraic geometry such as the concept of semi-algebraic functions
and generalized Kurdyka-Łojasiewicz inequalities, on tools from functional
analysis such as the Arzelà-Ascoli theorem, on tools from nonsmooth analysis
such as the concept of limiting Fréchet subgradients, as well as on the fact
that the set of realization functions of shallow ReLU ANNs with fixed
architecture forms a closed subset of the set of continuous functions revealed
by Petersen et al.
- Abstract (reference translation): In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers and we prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs under the assumption that the unnormalized probability density function of the probability distribution of the input data of the considered supervised learning problem is piecewise polynomial, under the assumption that the target function (describing the relationship between input data and the output data) is piecewise polynomial, and under the assumption that the risk function of the considered supervised learning problem admits at least one regular global minimum.
Finally, in the training of deep ANNs with ReLU activation we also study solutions of gradient flow (GF) differential equations and we prove that every non-divergent GF trajectory converges with a polynomial rate of convergence to a critical point (in the sense of limiting Fréchet subdifferentiability).
Our mathematical convergence analysis builds up on tools from real algebraic geometry such as the concept of semi-algebraic functions and generalized Kurdyka-Łojasiewicz inequalities, on tools from functional analysis such as the Arzelà-Ascoli theorem, on tools from nonsmooth analysis such as the concept of limiting Fréchet subgradients, as well as on the fact that the set of realization functions of shallow ReLU ANNs with fixed architecture forms a closed subset of the set of continuous functions revealed by Petersen et al.
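To make the setting of the abstract more concrete, the following is a minimal sketch of plain gradient descent with several random initializations for a shallow ReLU network with one-dimensional input. It is not the authors' code or their exact setup. The target function, the uniform input density on [0, 1], the grid approximation of the risk, the network width, the learning rate, the step count, and the convention that the ReLU derivative at 0 is 0 are all assumptions chosen for illustration.

```python
import random


def target(x):
    """Hypothetical piecewise polynomial target on [0, 1] (illustrative assumption)."""
    return x * x if x < 0.5 else 0.25 + (x - 0.5)


def realization(params, x):
    """Shallow ReLU network: sum_j v_j * max(w_j * x + b_j, 0) + c."""
    w, b, v, c = params
    return sum(vj * max(wj * x + bj, 0.0) for wj, bj, vj in zip(w, b, v)) + c


def risk(params, xs):
    """Mean squared error on a grid; approximates the L^2 risk for a uniform input density."""
    return sum((realization(params, x) - target(x)) ** 2 for x in xs) / len(xs)


def generalized_gradient(params, xs):
    """Gradient of the grid risk, taking the ReLU derivative at 0 to be 0 (assumed convention)."""
    w, b, v, c = params
    h = len(w)
    gw, gb, gv, gc = [0.0] * h, [0.0] * h, [0.0] * h, 0.0
    for x in xs:
        d = 2.0 * (realization(params, x) - target(x)) / len(xs)
        gc += d
        for j in range(h):
            pre = w[j] * x + b[j]
            if pre > 0.0:  # neuron j is active at this input
                gv[j] += d * pre
                gw[j] += d * v[j] * x
                gb[j] += d * v[j]
    return gw, gb, gv, gc


def gradient_descent(rng, xs, width=4, lr=0.02, steps=500):
    """Run GD from one random initialization; return (initial risk, final risk, params)."""
    params = (
        [rng.gauss(0.0, 0.5) for _ in range(width)],
        [rng.gauss(0.0, 0.5) for _ in range(width)],
        [rng.gauss(0.0, 0.5) for _ in range(width)],
        0.0,
    )
    initial = risk(params, xs)
    for _ in range(steps):
        gw, gb, gv, gc = generalized_gradient(params, xs)
        w, b, v, c = params
        params = (
            [p - lr * g for p, g in zip(w, gw)],
            [p - lr * g for p, g in zip(b, gb)],
            [p - lr * g for p, g in zip(v, gv)],
            c - lr * gc,
        )
    return initial, risk(params, xs), params


def best_of_random_initializations(seed=0, runs=3, grid=50):
    """Repeat GD from independent random initializations and keep the run with the lowest final risk."""
    rng = random.Random(seed)
    xs = [i / (grid - 1) for i in range(grid)]
    results = [gradient_descent(rng, xs) for _ in range(runs)]
    return min(results, key=lambda r: r[1])
```

Calling `best_of_random_initializations()` returns the initial risk, the final risk, and the parameters of the best run. Keeping the best of several independent runs loosely mirrors the role of random initializations in the abstract; the sketch makes no claim about the convergence guarantees themselves, which depend on the assumptions stated in the paper.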
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural network. (2024-04-18T16:46:08Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502] (2023-05-30T19:37:44Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715] (2022-04-22T15:56:43Z)
- Improved Overparametrization Bounds for Global Convergence of Stochastic Gradient Descent for Shallow Neural Networks [1.14219428942199] (2022-01-28T11:30:06Z)
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138] (2021-11-03T15:14:20Z)
- Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation [2.4087148947930634] (2021-08-18T12:06:19Z)
- A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise linear target functions [3.198144010381572] (2021-08-10T12:01:37Z)
- Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation [3.198144010381572] (2021-07-09T15:08:30Z)
- Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667] (2020-07-11T18:19:50Z)
- Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime [50.510421854168065] (2020-06-22T14:31:37Z)