Paper summary: Robust Implicit Regularization via Weight Normalization
- arxiv url: http://arxiv.org/abs/2305.05448v3
- Date: Fri, 23 Feb 2024 07:20:33 GMT
- Status: Processing complete
- Last updated in system: 2024-02-26 18:26:37.522076
- Title: Robust Implicit Regularization via Weight Normalization
- Title (reference translation): Robust Implicit Regularization via Weight Normalization
- Authors: Hung-Hsu Chou, Holger Rauhut, Rachel Ward
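- Abstract summary: We show that weight normalization enables a robust bias that persists even when the weights are initialized at practically large scale.
Experiments suggest that both convergence speed and robustness of the implicit bias improve dramatically with weight normalization.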
- Reference score (independently computed attention score): 6.042206709451915
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Overparameterized models may have many interpolating solutions; implicit
regularization refers to the hidden preference of a particular optimization
method towards a certain interpolating solution among the many. A by now
established line of work has shown that (stochastic) gradient descent tends to
have an implicit bias towards low rank and/or sparse solutions when used to
train deep linear networks, explaining to some extent why overparameterized
neural network models trained by gradient descent tend to have good
generalization performance in practice. However, existing theory for square-loss
objectives often requires very small initialization of the trainable weights,
which is at odds with the larger scale at which weights are initialized in
practice for faster convergence and better generalization performance. In this
paper, we aim to close this gap by incorporating and analyzing gradient flow
(continuous-time version of gradient descent) with weight normalization, where
the weight vector is reparameterized in terms of polar coordinates, and
gradient flow is applied to the polar coordinates. By analyzing key invariants
of the gradient flow and using the Łojasiewicz theorem, we show that weight
normalization also has an implicit bias towards sparse solutions in the
diagonal linear model, but that in contrast to plain gradient flow, weight
normalization enables a robust bias that persists even if the weights are
initialized at practically large scale. Experiments suggest that both
convergence speed and robustness of the implicit bias improve dramatically
when weight normalization is used in overparameterized diagonal linear
network models.
- Abstract (reference translation): Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method towards a certain interpolating solution among the many.
A by now established line of work has shown that (stochastic) gradient descent tends to have an implicit bias towards low rank and/or sparse solutions when used to train deep linear networks, explaining to some extent why overparameterized neural network models trained by gradient descent tend to have good generalization performance in practice. However, existing theory for square-loss objectives often requires very small initialization of the trainable weights, which is at odds with the larger scale at which weights are initialized in practice for faster convergence and better generalization performance.
Experiments suggest that both convergence speed and robustness of the implicit bias improve dramatically in overparameterized diagonal linear network models that use weight normalization.
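
The polar-coordinate reparameterization mentioned in the abstract can be illustrated with a small NumPy sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the depth-2 model `x = w * w` (which only represents non-negative `x`), the standard weight-normalization form `w = r * v / ||v||`, plain gradient descent as a discrete stand-in for gradient flow, and all names, sizes, and step sizes (`polar_to_weight`, `loss_and_grads`, `train`, `lr`, `steps`).

```python
import numpy as np


def polar_to_weight(r, v):
    """Map a radius r and a direction parameter v to w = r * v / ||v||."""
    return r * v / np.linalg.norm(v)


def loss_and_grads(r, v, A, y):
    """Square loss of the assumed model x = w * w, with gradients in (r, v)."""
    norm_v = np.linalg.norm(v)
    u = v / norm_v                       # unit direction
    w = r * u                            # weight-normalized weights
    residual = A @ (w * w) - y
    loss = 0.5 * residual @ residual
    grad_w = 2.0 * w * (A.T @ residual)  # chain rule through x = w * w
    grad_r = u @ grad_w                  # dw/dr = u
    # dw/dv = (r / ||v||) * (I - u u^T): only the part of grad_w
    # orthogonal to u contributes to the gradient in v.
    grad_v = (r / norm_v) * (grad_w - (u @ grad_w) * u)
    return loss, grad_r, grad_v


def train(A, y, w0, lr=0.01, steps=5000):
    """Gradient descent on (r, v), started from the polar form of w0."""
    r, v = np.linalg.norm(w0), w0.copy()
    for _ in range(steps):
        _, grad_r, grad_v = loss_and_grads(r, v, A, y)
        r -= lr * grad_r
        v -= lr * grad_v
    return polar_to_weight(r, v)


if __name__ == "__main__":
    # Hypothetical underdetermined problem with a sparse non-negative target.
    rng = np.random.default_rng(0)
    m, n = 20, 50
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    x_true = np.zeros(n)
    x_true[:3] = 1.0
    y = A @ x_true

    w = train(A, y, w0=np.full(n, 0.5))  # deliberately not a tiny initialization
    x = w * w
    print("residual norm:", np.linalg.norm(A @ x - y))
    print("largest entries of x:", np.sort(x)[-5:])
```

In this parameterization the gradient with respect to `v` is orthogonal to `v`, so `||v||` stays constant under exact gradient flow and can only grow under discrete steps. Whether this coincides with the invariants analyzed in the paper is not stated in the abstract, so the sketch should be read as an illustration of the reparameterization only, not as a reproduction of the reported experiments.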
