Fugu-MT Paper Translation (Abstract): Beyond Bounded Variance: Variance-Reduced Normalized Methods for Nonconvex Optimization under Blum-Gladyshev Noise

Paper overview: Beyond Bounded Variance: Variance-Reduced Normalized Methods for Nonconvex Optimization under Blum-Gladyshev Noise

arxiv url: http://arxiv.org/abs/2605.15314v1
Date: Thu, 14 May 2026 18:27:49 GMT
Status: translation complete
Last updated in system: 2026-05-18 21:22:26.060162
Title: Beyond Bounded Variance: Variance-Reduced Normalized Methods for Nonconvex Optimization under Blum-Gladyshev Noise
Authors: Antesh Upadhyay, Arda Fazla, Abolfazl Hashemi
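Abstract summary: We study nonconvex stochastic optimization under the Blum-Gladyshev ($\mathsf{BG}$-0) noise model. We show that normalized stochastic gradient descent with momentum, using only one stochastic gradient per iteration, converges under $\mathsf{BG}$-0 noise with oracle complexity $O(\varepsilon^{-6})$.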
Reference score (independently computed attention score): 7.692336118507715
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study nonconvex stochastic optimization under the Blum-Gladyshev ($\mathsf{BG}$-0) noise model, where the stochastic gradient variance grows quadratically with the distance from the initialization. We consider this problem under both standard smoothness and the symmetric generalized-smoothness framework, which captures objectives whose local curvature can scale with the gradient norm. We prove that normalized stochastic gradient descent with momentum, using only one stochastic gradient per iteration, converges under $\mathsf{BG}$-0 noise with oracle complexity $O(\varepsilon^{-6})$. This rate holds both for standard smoothness and for $\alpha$-symmetric generalized smoothness, showing that generalized smoothness is rate-neutral for normalized momentum in this setting. We then study a variance-reduced normalized STORM method. Under mean-square smoothness and sharp initialization, the method achieves the minimax optimal $O(\varepsilon^{-4})$ complexity, matching the lower bound. Under expected $\alpha$-symmetric generalized smoothness, the STORM recursion couples gradient-dependent smoothness with distance-dependent noise, leading to complexity $O(\varepsilon^{-(4+\alpha)})$ for $\alpha\in(0,1)$ and $O(\varepsilon^{-5})$ for $\alpha=1$. When the distance-growth parameter in the noise model vanishes, our guarantees recover the standard bounded-variance rates: $O(\varepsilon^{-4})$ for momentum, $O(\varepsilon^{-3})$ for variance reduction, and $O(\varepsilon^{-2})$ in the deterministic case. To our knowledge, these are the first convergence guarantees for normalized methods in nonconvex stochastic optimization under $\mathsf{BG}$-0 noise without bounded domains, increasing batch sizes, or explicit anchoring, covering both standard and generalized smoothness regimes.
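
The two update rules named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the noise oracle assumes a variance of the form sigma^2 + kappa^2 * ||x - x0||^2 (one reading of "grows quadratically with the distance from the initialization"), and the toy objective, the step size `lr`, the momentum weight `beta`, the STORM weight `a`, and all function names are hypothetical choices made for the example. The sketch does not attempt to reproduce the stated complexity rates.

```python
import numpy as np

def f(x):
    # Illustrative smooth nonconvex objective (an assumption, not from the paper).
    return float(np.sum(x**2 / (1.0 + x**2)))

def grad_f(x):
    # Exact gradient of f: d/dx [x^2 / (1 + x^2)] = 2x / (1 + x^2)^2.
    return 2.0 * x / (1.0 + x**2) ** 2

def oracle(x, x0, z, sigma=0.1, kappa=0.1):
    # Stochastic gradient at x for the sample z (a standard normal vector).
    # Assumed noise form: total variance sigma^2 + kappa^2 * ||x - x0||^2.
    scale = np.sqrt(sigma**2 + kappa**2 * np.sum((x - x0) ** 2))
    return grad_f(x) + scale * z / np.sqrt(x.size)

def normalized_step(x, d, lr):
    # Move a fixed distance lr along -d; the small constant avoids division by zero.
    return x - lr * d / (np.linalg.norm(d) + 1e-12)

def normalized_momentum(x0, steps=2000, lr=0.01, beta=0.9, seed=0):
    # Normalized SGD with momentum: one stochastic gradient per iteration.
    rng = np.random.default_rng(seed)
    x = x0.copy()
    m = oracle(x, x0, rng.standard_normal(x.shape))
    for _ in range(steps):
        x = normalized_step(x, m, lr)
        g = oracle(x, x0, rng.standard_normal(x.shape))
        m = beta * m + (1.0 - beta) * g
    return x

def normalized_storm(x0, steps=2000, lr=0.01, a=0.1, seed=0):
    # Normalized STORM-style estimator: the same sample z is evaluated at the
    # current and previous iterates so that the correction term cancels noise.
    rng = np.random.default_rng(seed)
    x = x0.copy()
    d = oracle(x, x0, rng.standard_normal(x.shape))
    for _ in range(steps):
        x_prev, x = x, normalized_step(x, d, lr)
        z = rng.standard_normal(x.shape)
        d = oracle(x, x0, z) + (1.0 - a) * (d - oracle(x_prev, x0, z))
    return x
```

Calling `normalized_momentum(np.array([1.0, -0.5]))` or `normalized_storm(np.array([1.0, -0.5]))` returns the final iterate. Because every step is normalized, the iterate moves at most `lr` per iteration, so the distance from the initialization (and hence the assumed noise level) can grow at most linearly in the iteration count.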

Related papers