Fugu-MT Paper Translation (Abstract): Dual Space Preconditioning for Gradient Descent in the Overparameterized Regime

Paper overview: Dual Space Preconditioning for Gradient Descent in the Overparameterized Regime

arxiv url: http://arxiv.org/abs/2603.10485v1
Date: Wed, 11 Mar 2026 07:19:52 GMT
Status: Translation complete
Last updated in system: 2026-03-12 16:22:32.825645
Title: Dual Space Preconditioning for Gradient Descent in the Overparameterized Regime
Title (reference translation): Dual Space Preconditioning for Gradient Descent in the Overparameterized Regime
Authors: Reza Ghane, Danil Akhtiamov, Babak Hassibi
Abstract summary: This work studies the convergence properties of Dual Space Preconditioned Gradient Descent, and also examines its implicit bias.
Reference score (independently computed attention score): 14.991382702354924
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this work we study the convergence properties of the Dual Space Preconditioned Gradient Descent, encompassing optimizers such as Normalized Gradient Descent, Gradient Clipping and Adam. We consider preconditioners of the form $\nabla K$, where $K: \mathbb{R}^p \to \mathbb{R}$ is convex and assume that the latter is applied to train an over-parameterized linear model with loss of the form $\ell({X} {W} - {Y})$, for weights ${W} \in \mathbb{R}^{d \times k}$, labels ${Y} \in \mathbb{R}^{n \times k}$ and data ${X} \in \mathbb{R}^{n \times d}$. Under the aforementioned assumptions, we prove that the iterates of the preconditioned gradient descent always converge to a point ${W}_{\infty} \in \mathbb{R}^{d \times k}$ satisfying ${X}{W}_{\infty} = {Y}$. Our proof techniques are of independent interest as we introduce a novel version of the Bregman Divergence with accompanying identities that allow us to establish convergence. We also study the implicit bias of Dual Space Preconditioned Gradient Descent. First, we demonstrate empirically that, for general $K(\cdot)$, ${W}_\infty$ depends on the chosen learning rate, hindering a precise characterization of the implicit bias. Then, for preconditioners of the form $K({G}) = h(\|{G}\|_F)$, known as \textit{isotropic preconditioners}, we show that ${W}_\infty$ minimizes $\|{W}_\infty - {W}_0\|_F^2$ subject to ${X}{W}_\infty = {Y}$, where ${W}_0$ is the initialization. Denoting the convergence point of GD initialized at ${W}_0$ by ${W}_{\text{GD}, \infty}$, we thus note ${W}_{\infty} = {W}_{\text{GD}, \infty}$ for isotropic preconditioners. Finally, we show that a similar fact holds for general preconditioners up to a multiplicative constant, namely, $\|{W}_0 - {W}_{\infty}\|_F \le c \|{W}_0 - {W}_{\text{GD}, \infty}\|_F$ for a constant $c>0$.
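Abstract (reference translation): This work studies the convergence properties of Dual Space Preconditioned Gradient Descent, a family that includes optimizers such as Normalized Gradient Descent, Gradient Clipping and Adam. The preconditioners have the form $\nabla K$, where $K: \mathbb{R}^p \to \mathbb{R}$ is convex, and the method is assumed to be applied to train an over-parameterized linear model with loss of the form $\ell({X} {W} - {Y})$, for weights ${W} \in \mathbb{R}^{d \times k}$, labels ${Y} \in \mathbb{R}^{n \times k}$ and data ${X} \in \mathbb{R}^{n \times d}$. Under these assumptions, the authors prove that the iterates of the preconditioned gradient descent always converge to a point ${W}_{\infty} \in \mathbb{R}^{d \times k}$ satisfying ${X}{W}_{\infty} = {Y}$. The proof techniques are of independent interest: they introduce a new version of the Bregman Divergence, together with accompanying identities, that is used to establish convergence. The work also studies the implicit bias of Dual Space Preconditioned Gradient Descent. First, it shows empirically that, for general $K(\cdot)$, ${W}_\infty$ depends on the chosen learning rate, which prevents a precise characterization of the implicit bias. Next, for preconditioners of the form $K({G}) = h(\|{G}\|_F)$, known as \textit{isotropic preconditioners}, it shows that ${W}_\infty$ minimizes $\|{W}_\infty - {W}_0\|_F^2$ subject to ${X}{W}_\infty = {Y}$, where ${W}_0$ is the initialization. Writing ${W}_{\text{GD}, \infty}$ for the convergence point of GD initialized at ${W}_0$, it follows that ${W}_{\infty} = {W}_{\text{GD}, \infty}$ for isotropic preconditioners. Finally, a similar fact holds for general preconditioners up to a multiplicative constant, namely $\|{W}_0 - {W}_{\infty}\|_F \le c \|{W}_0 - {W}_{\text{GD}, \infty}\|_F$ for a constant $c>0$.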
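
Illustrative sketch (not from the paper): the isotropic-preconditioner claim in the abstract can be checked numerically on a small random problem. The abstract does not write out the update rule, so the code below assumes the usual dual space preconditioning form ${W}_{t+1} = {W}_t - \eta \nabla K(\nabla L({W}_t))$ with the squared loss $L({W}) = \tfrac{1}{2}\|{X}{W} - {Y}\|_F^2$. The choice $h(r) = \sqrt{1 + r^2} - 1$, the problem sizes, the step size and the function names are all hypothetical and chosen only for illustration.

```python
import numpy as np


def grad_K_isotropic(G):
    """Gradient of K(G) = h(||G||_F) for the assumed h(r) = sqrt(1 + r^2) - 1.

    This gives grad K(G) = G / sqrt(1 + ||G||_F^2), a smooth variant of
    gradient clipping. The choice of h is illustrative, not the paper's.
    """
    return G / np.sqrt(1.0 + np.sum(G * G))


def dual_preconditioned_gd(X, Y, W0, lr, steps):
    """Run W <- W - lr * grad_K(grad_L(W)) with L(W) = 0.5 * ||X W - Y||_F^2."""
    W = W0.copy()
    for _ in range(steps):
        G = X.T @ (X @ W - Y)  # gradient of the squared loss at W
        W = W - lr * grad_K_isotropic(G)
    return W


rng = np.random.default_rng(0)
n, d, k = 5, 20, 3  # n < d: over-parameterized linear model
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, k))
W0 = rng.standard_normal((d, k))

# Conservative step size: 1 / (largest eigenvalue of X X^T).
lr = 1.0 / np.linalg.norm(X, 2) ** 2
W_inf = dual_preconditioned_gd(X, Y, W0, lr=lr, steps=10000)

# Interpolating solution closest to W0 in Frobenius norm,
# which is also the limit of plain GD started at W0.
W_min = W0 + np.linalg.pinv(X) @ (Y - X @ W0)

residual = np.linalg.norm(X @ W_inf - Y)  # interpolation error
gap = np.linalg.norm(W_inf - W_min)       # distance to the min-norm solution
print("residual:", residual)
print("gap to min-norm solution:", gap)
```

Under these assumptions both printed quantities should be close to zero up to floating-point error. The reason is that every update is a scalar multiple of ${X}^\top({X}{W}_t - {Y})$, so ${W}_t - {W}_0$ never leaves the row space of ${X}$; an interpolating point of that form is exactly the minimizer of $\|{W} - {W}_0\|_F$ subject to ${X}{W} = {Y}$.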
