Fugu-MT Paper Translation (Summary): Non-Singularity of the Gradient Descent map for Neural Networks with Piecewise Analytic Activations

Paper summary: Non-Singularity of the Gradient Descent map for Neural Networks with Piecewise Analytic Activations

arxiv url: http://arxiv.org/abs/2510.24466v1
Date: Tue, 28 Oct 2025 14:34:33 GMT
Status: Translation complete
Last updated (in system): 2025-10-29 15:35:37.230886
Title: Non-Singularity of the Gradient Descent map for Neural Networks with Piecewise Analytic Activations
Authors: Alexandru Crăciun, Debarghya Ghoshdastidar
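Abstract summary: We study the neural network map as a function on the space of weights and biases. We prove, for the first time, the non-singularity of the gradient descent (GD) map on the loss landscape of realistic neural network architectures.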
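The summary studies a network as a function of its weights and biases and the GD map built from it. Both can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' construction. The following are all hypothetical:

* the single-hidden-unit architecture,
* the leaky ReLU slope `ALPHA = 0.1`,
* the squared loss and the toy dataset,
* the helper names `leaky_relu`, `loss_and_grad` and `gd_map`.

Leaky ReLU is used because it is one of the piecewise analytic activations named in the abstract. The GD map here is G(θ) = θ − η∇L(θ).

```python
# Hypothetical toy setup: f(x; θ) = w2 * leaky_relu(w1 * x + b1) + b2,
# with parameters θ = (w1, b1, w2, b2) and a mean squared loss.

ALPHA = 0.1  # leaky ReLU slope for negative inputs (assumed value)


def leaky_relu(z):
    """Piecewise analytic: linear on z > 0 and linear on z <= 0."""
    return z if z > 0 else ALPHA * z


def leaky_relu_grad(z):
    """Derivative on each piece; at z == 0 we pick the left slope by convention."""
    return 1.0 if z > 0 else ALPHA


def loss_and_grad(theta, data):
    """Mean squared loss L(θ) and its gradient with respect to θ."""
    w1, b1, w2, b2 = theta
    n = len(data)
    loss = 0.0
    grad = [0.0, 0.0, 0.0, 0.0]
    for x, y in data:
        z = w1 * x + b1
        h = leaky_relu(z)
        r = w2 * h + b2 - y          # residual of this sample
        loss += r * r / n
        dr = 2 * r / n               # dL/df for this sample
        dz = dr * w2 * leaky_relu_grad(z)
        grad[0] += dz * x            # dL/dw1
        grad[1] += dz                # dL/db1
        grad[2] += dr * h            # dL/dw2
        grad[3] += dr                # dL/db2
    return loss, grad


def gd_map(theta, data, eta):
    """The GD map G(θ) = θ - η ∇L(θ), a map from parameter space to itself."""
    _, g = loss_and_grad(theta, data)
    return [t - eta * gi for t, gi in zip(theta, g)]


data = [(-1.0, 0.0), (0.5, 1.0), (2.0, 3.0)]  # made-up toy data
theta = [1.0, 0.0, 1.0, 0.0]
for _ in range(200):
    theta = gd_map(theta, data, eta=0.05)
print(loss_and_grad(theta, data)[0])
```

Fix the sign of each pre-activation w1·x + b1. On that region the loss, and therefore G, is a polynomial in θ, so it is analytic there. The kink at z = 0 is handled by a fixed one-sided convention, which is an arbitrary choice in this sketch.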
Reference score (attention score computed by the site): 53.348574336527854
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The theory of training deep networks has become a central question of modern machine learning and has inspired many practical advancements. In particular, the gradient descent (GD) optimization algorithm has been extensively studied in recent years. A key assumption about GD has appeared in several recent works: the GD map is non-singular, i.e., it preserves sets of measure zero under preimages. Crucially, this assumption has been used to prove that GD avoids saddle points and maxima, and to establish the existence of a computable quantity that determines the convergence to global minima (both for GD and stochastic GD). However, the current literature either assumes the non-singularity of the GD map or imposes restrictive assumptions, such as Lipschitz smoothness of the loss (for example, Lipschitzness does not hold for deep ReLU networks with the cross-entropy loss), and restricts the analysis to GD with small step-sizes. In this paper, we investigate the neural network map as a function on the space of weights and biases. We also prove, for the first time, the non-singularity of the gradient descent (GD) map on the loss landscape of realistic neural network architectures (with fully connected, convolutional, or softmax attention layers) and piecewise analytic activations (which include sigmoid, ReLU, leaky ReLU, etc.) for almost all step-sizes. Our work significantly extends the existing results on the convergence of GD and SGD by guaranteeing that they apply to practical neural network settings and has the potential to unlock further exploration of learning dynamics.
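The qualifier "for almost all step-sizes" matters even in the simplest setting. The example below is chosen for illustration only: it is a quadratic, not a neural network, and is not taken from the paper.

Take L(θ) = ½ θᵀHθ on ℝ², with the hypothetical symmetric matrix H below, whose eigenvalues are 3 and 1. Then G(θ) = θ − η∇L(θ) = (I − ηH)θ is linear. A linear map preserves null sets under preimages exactly when it is invertible, that is, when det(I − ηH) = (1 − 3η)(1 − η) ≠ 0.

The sketch scans a grid of step sizes with exact rational arithmetic and recovers the two exceptional values, η = 1/3 and η = 1. The helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical quadratic loss L(θ) = ½ θᵀ H θ on ℝ² (illustration only).
H = [[Fraction(2), Fraction(1)],
     [Fraction(1), Fraction(2)]]  # symmetric, eigenvalues 3 and 1


def gd_matrix(eta):
    """Matrix of the linear GD map G(θ) = θ - η ∇L(θ) = (I - ηH) θ."""
    return [[(1 if i == j else 0) - eta * H[i][j] for j in range(2)]
            for i in range(2)]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def is_nonsingular(eta):
    """A linear map preserves null sets under preimages iff it is invertible."""
    return det2(gd_matrix(eta)) != 0


def quad_gd_map(theta, eta):
    """Apply G to a point θ = (θ1, θ2)."""
    m = gd_matrix(eta)
    return [m[0][0] * theta[0] + m[0][1] * theta[1],
            m[1][0] * theta[0] + m[1][1] * theta[1]]


# Scan step sizes η = k/12 for k = 1..24, i.e. η in (0, 2], with exact arithmetic.
candidates = [Fraction(k, 12) for k in range(1, 25)]
singular = [eta for eta in candidates if not is_nonsingular(eta)]
print(singular)  # [Fraction(1, 3), Fraction(1, 1)]

# At η = 1 two distinct points share an image, so G is not invertible there.
print(quad_gd_map([1, 0], Fraction(1)), quad_gd_map([0, 1], Fraction(1)))
```

At η = 1 the map sends (1, 0) and (0, 1) to the same point (−1, −1). It collapses the plane onto the diagonal line where both coordinates are equal. That line is a null set whose preimage is the whole plane.

For general C¹ maps there is a common sufficient condition, though not necessarily the route taken in the paper: the Jacobian determinant of G vanishes only on a null set.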

List of related papers