Fugu-MT Paper Translation (Abstract): Optimal High-probability Convergence of Nonlinear SGD under Heavy-tailed Noise via Symmetrization

Paper abstract: Optimal High-probability Convergence of Nonlinear SGD under Heavy-tailed Noise via Symmetrization

arxiv url: http://arxiv.org/abs/2507.09093v1
Date: Sat, 12 Jul 2025 00:31:13 GMT
Status: Translation complete
Last updated in system: 2025-07-15 18:48:22.365281
Title: Optimal High-probability Convergence of Nonlinear SGD under Heavy-tailed Noise via Symmetrization
Title (reference translation): Optimal High-probability Convergence of Nonlinear SGD under Heavy-tailed Noise via Symmetrization
Authors: Aleksandar Armacki, Dragana Bajovic, Dusan Jakovetic, Soummya Kar
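Abstract summary: We propose two novel estimators based on noise symmetrization. Compared to works assuming symmetric noise with unbounded moments, we provide a sharper analysis and improved rates.
Reference score (independently computed attention score): 50.49466204159458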
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study convergence in high-probability of SGD-type methods in non-convex optimization and the presence of heavy-tailed noise. To combat the heavy-tailed noise, a general black-box nonlinear framework is considered, subsuming nonlinearities like sign, clipping, normalization and their smooth counterparts. Our first result shows that nonlinear SGD (N-SGD) achieves the rate $\widetilde{\mathcal{O}}(t^{-1/2})$, for any noise with unbounded moments and a symmetric probability density function (PDF). Crucially, N-SGD has exponentially decaying tails, matching the performance of linear SGD under light-tailed noise. To handle non-symmetric noise, we propose two novel estimators, based on the idea of noise symmetrization. The first, dubbed Symmetrized Gradient Estimator (SGE), assumes a noiseless gradient at any reference point is available at the start of training, while the second, dubbed Mini-batch SGE (MSGE), uses mini-batches to estimate the noiseless gradient. Combined with the nonlinear framework, we get N-SGE and N-MSGE methods, respectively, both achieving the same convergence rate and exponentially decaying tails as N-SGD, while allowing for non-symmetric noise with unbounded moments and PDF satisfying a mild technical condition, with N-MSGE additionally requiring bounded noise moment of order $p \in (1,2]$. Compared to works assuming noise with bounded $p$-th moment, our results: 1) are based on a novel symmetrization approach; 2) provide a unified framework and relaxed moment conditions; 3) imply optimal oracle complexity of N-SGD and N-SGE, strictly better than existing works when $p < 2$, while the complexity of N-MSGE is close to existing works. Compared to works assuming symmetric noise with unbounded moments, we: 1) provide a sharper analysis and improved rates; 2) facilitate state-dependent symmetric noise; 3) extend the strong guarantees to non-symmetric noise.
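
The nonlinear framework described in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the update form x_{t+1} = x_t - a_t * Psi(grad(x_t) + z_t), the step-size schedule a_t = step0 / sqrt(t), the toy quadratic objective, the Cauchy noise model, and all function names. It only shows how sign, clipping, and normalization act as interchangeable black-box maps applied to a noisy gradient whose noise has a symmetric PDF and no finite moments.

```python
import math
import random


def sign_map(g):
    """Component-wise sign nonlinearity (returns -1, 0, or 1 per coordinate)."""
    return [(v > 0) - (v < 0) for v in g]


def clip_map(g, radius=1.0):
    """Joint clipping: rescale g so its Euclidean norm is at most `radius`."""
    norm = math.sqrt(sum(v * v for v in g))
    scale = min(1.0, radius / norm) if norm > 0 else 1.0
    return [scale * v for v in g]


def normalize_map(g):
    """Normalization: return g / ||g|| (the zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g] if norm > 0 else list(g)


def cauchy_noise(rng, dim):
    """Standard Cauchy samples: symmetric PDF, no finite mean or variance."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(dim)]


def nonlinear_sgd(grad, x0, nonlinearity, steps=5000, step0=1.0, seed=0):
    """Illustrative loop: x <- x - (step0 / sqrt(t)) * Psi(grad(x) + noise)."""
    rng = random.Random(seed)
    x = list(x0)
    for t in range(1, steps + 1):
        noise = cauchy_noise(rng, len(x))
        noisy_grad = [g + z for g, z in zip(grad(x), noise)]
        direction = nonlinearity(noisy_grad)
        step = step0 / math.sqrt(t)
        x = [xi - step * di for xi, di in zip(x, direction)]
    return x


if __name__ == "__main__":
    # Toy objective 0.5 * ||x||^2 (an assumption; the paper treats non-convex costs).
    toy_grad = lambda x: list(x)
    for name, psi in [("sign", sign_map), ("clip", clip_map), ("normalize", normalize_map)]:
        print(name, nonlinear_sgd(toy_grad, [5.0, -3.0], psi))
```

Because each map bounds the size of the applied direction, a single extreme noise sample cannot move the iterate by more than the current step size (times the clipping radius or the square root of the dimension), which is the intuition behind using such nonlinearities under heavy-tailed noise.
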
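Abstract (reference translation): We study the high-probability convergence of SGD-type methods in non-convex optimization in the presence of heavy-tailed noise. To combat heavy-tailed noise, we consider a general black-box nonlinear framework that subsumes nonlinearities such as sign, clipping, normalization, and their smooth counterparts. Our first result shows that nonlinear SGD (N-SGD) achieves the rate $\widetilde{\mathcal{O}}(t^{-1/2})$ for any noise with unbounded moments and a symmetric probability density function (PDF). Crucially, N-SGD has exponentially decaying tails, matching the performance of linear SGD under light-tailed noise. To handle non-symmetric noise, we propose two novel estimators based on noise symmetrization. The first, called the Symmetrized Gradient Estimator (SGE), assumes that a noiseless gradient at an arbitrary reference point is available at the start of training; the second, called Mini-batch SGE (MSGE), uses mini-batches to estimate the noiseless gradient. Combined with the nonlinear framework, these yield the N-SGE and N-MSGE methods, respectively. Both achieve the same convergence rate and exponentially decaying tails as N-SGD while allowing non-symmetric noise with unbounded moments and a PDF satisfying a mild technical condition; N-MSGE additionally requires a bounded noise moment of order $p \in (1,2]$. Compared to works assuming noise with a bounded $p$-th moment, our results: 1) are based on a novel symmetrization approach; 2) provide a unified framework and relaxed moment conditions; 3) imply optimal oracle complexity for N-SGD and N-SGE, strictly better than existing works when $p < 2$, while the complexity of N-MSGE is close to that of existing works. Compared to works assuming symmetric noise with unbounded moments, we: 1) provide a sharper analysis and improved rates; 2) accommodate state-dependent symmetric noise; 3) extend the strong guarantees to non-symmetric noise.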
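
The symmetrization idea mentioned in the abstract rests on a standard fact: the difference of two independent, identically distributed noise samples has a distribution that is symmetric about zero, even when each sample is skewed. The Python sketch below illustrates that fact only. The estimator form (noisy gradient at x, minus noisy gradient at a reference point, plus the exact gradient at the reference point), the additive state-independent noise, the centered Pareto noise model, and all names are hypothetical choices made for illustration; the paper's actual SGE definition may differ.

```python
import random


def skewed_noise(rng, alpha=1.5):
    """Zero-mean, right-skewed heavy-tailed noise: a centered Pareto(alpha) sample.

    For alpha in (1, 2) the mean is finite but the variance is infinite.
    """
    mean = alpha / (alpha - 1.0)  # mean of a Pareto variate with unit scale
    return rng.paretovariate(alpha) - mean


def symmetrized_gradient(true_grad, x, ref, ref_grad, rng):
    """Hypothetical estimator: noisy grad at x - noisy grad at ref + exact grad at ref.

    With additive noise this equals true_grad(x) + (z1 - z2), and z1 - z2 is symmetric.
    """
    noisy_x = true_grad(x) + skewed_noise(rng)
    noisy_ref = true_grad(ref) + skewed_noise(rng)
    return noisy_x - noisy_ref + ref_grad


def positive_fraction(samples):
    """Fraction of strictly positive samples; 0.5 indicates sign balance."""
    return sum(s > 0 for s in samples) / len(samples)


rng = random.Random(0)
n = 20000
true_grad = lambda x: x  # gradient of the toy objective 0.5 * x**2 (an assumption)

# Evaluate both estimators at the stationary point x = 0, where the true gradient is 0.
raw = [true_grad(0.0) + skewed_noise(rng) for _ in range(n)]
sym = [symmetrized_gradient(true_grad, 0.0, 1.0, true_grad(1.0), rng) for _ in range(n)]

frac_raw = positive_fraction(raw)  # well below 0.5: the raw noise is skewed
frac_sym = positive_fraction(sym)  # close to 0.5: the symmetrized noise is balanced
print(frac_raw, frac_sym)
```

In this toy setting a sign-based update fed the raw estimator would be pushed away from the stationary point on average, since the raw noise is negative far more often than positive, whereas the symmetrized estimator yields positive and negative signs about equally often.
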

Related papers