Fugu-MT 論文翻訳(概要): Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization

論文の概要: Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization

arxiv url: http://arxiv.org/abs/2009.09835v1
Date: Fri, 18 Sep 2020 02:18:44 GMT
ステータス: 翻訳完了
システム内更新日: 2022-10-17 02:50:44.708053
Title: Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization
Title（参考訳）: ハイブリッド確率的決定論的ミニバッチ近位勾配:ほぼ最適一般化によるシングルパス最適化
Authors: Pan Zhou, Xiaotong Yuan
Abstract要約: HSDMPGは、学習モデル上で過大なエラーの順序である$mathcalObig(1/sttnbig)$を達成可能であることを示す。損失係数について、HSDMPGは学習モデル上で過大なエラーの順序である$mathcalObig(1/sttnbig)$を達成できることを示す。
参考スコア（独自算出の注目度）: 83.80460802169999
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Stochastic variance-reduced gradient (SVRG) algorithms have been shown to work favorably in solving large-scale learning problems. Despite the remarkable success, the stochastic gradient complexity of SVRG-type algorithms usually scales linearly with data size and thus could still be expensive for huge data. To address this deficiency, we propose a hybrid stochastic-deterministic minibatch proximal gradient (HSDMPG) algorithm for strongly-convex problems that enjoys provably improved data-size-independent complexity guarantees. More precisely, for quadratic loss $F(\theta)$ of $n$ components, we prove that HSDMPG can attain an $\epsilon$-optimization-error $\mathbb{E}[F(\theta)-F(\theta^*)]\leq\epsilon$ within $\mathcal{O}\Big(\frac{\kappa^{1.5}\epsilon^{0.75}\log^{1.5}(\frac{1}{\epsilon})+1}{\epsilon}\wedge\Big(\kappa \sqrt{n}\log^{1.5}\big(\frac{1}{\epsilon}\big)+n\log\big(\frac{1}{\epsilon}\big)\Big)\Big)$ stochastic gradient evaluations, where $\kappa$ is condition number. For generic strongly convex loss functions, we prove a nearly identical complexity bound though at the cost of slightly increased logarithmic factors. For large-scale learning problems, our complexity bounds are superior to those of the prior state-of-the-art SVRG algorithms with or without dependence on data size. Particularly, in the case of $\epsilon=\mathcal{O}\big(1/\sqrt{n}\big)$ which is at the order of intrinsic excess error bound of a learning model and thus sufficient for generalization, the stochastic gradient complexity bounds of HSDMPG for quadratic and generic loss functions are respectively $\mathcal{O} (n^{0.875}\log^{1.5}(n))$ and $\mathcal{O} (n^{0.875}\log^{2.25}(n))$, which to our best knowledge, for the first time achieve optimal generalization in less than a single pass over data. Extensive numerical results demonstrate the computational advantages of our algorithm over the prior ones.
Abstract（参考訳）: 確率分散還元勾配 (svrg) アルゴリズムは, 大規模学習問題の解法として有効であることが示されている。顕著な成功にもかかわらず、svrg型アルゴリズムの確率的勾配複雑性は通常、データサイズと線形にスケールするので、巨大なデータにはコストがかかる可能性がある。この欠陥に対処するために,データサイズに依存しない複雑性保証を確実に改善した強凸問題に対して,Hybrid stochastic-Deterministic Minibatch proximal gradient (HSDMPG)アルゴリズムを提案する。より正確には、$F(\theta)$ of $n$コンポーネントに対して、HSDMPGが$\epsilon$-optimization-error $\mathbb{E}[F(\theta)-F(\theta^*)]\leq\epsilon$ in $\mathcal{O}\Big(\frac{\kappa^{1.5}\epsilon^{0.75}\log^{1.5}(\frac{1}{\epsilon})+1}{\epsilon}\wedge\Big(\kappa \sqrt{n}\log^{1.5}\big(\frac{1}{\epsilon}\big)+n\log(\frac{1}{\epsilon)\big) +n\log(\frac{1}{\epsilon)\big)\big) を満たすことを証明している。一般の強凸損失関数に対しては、対数係数がわずかに増加するコストで、ほぼ同一の複雑性を証明できる。大規模学習問題では,従来のSVRGアルゴリズムよりも,データサイズに依存するか否かに関わらず,複雑性境界が優れている。特に、学習モデルの本質的な過大誤差境界の順であり、一般化に十分である$\epsilon=\mathcal{o}\big(1/\sqrt{n}\big)$の場合、二次損失関数に対するhsdmpgの確率的勾配複雑性境界は、それぞれ$\mathcal{o} (n^{0.875}\log^{1.5}(n))$ と$\mathcal{o} (n^{0.875}\log^{2.25}(n))$であり、我々の知る限りでは、単一のパスオーバデータ未満で最適な一般化を達成することができる。広範な数値計算結果から,従来のアルゴリズムよりも計算効率が優れていることを示す。

関連論文リスト

Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
複素勾配問題に対する2つの分散化ZO推定器を提案する。我々は、現在最先端の機能複雑性を$mathcalOleft(minfracdn1/2epsilon2, fracdepsilon3right)$から$tildecalOleft(fracdepsilon2right)$に改善する。
論文参考訳（メタデータ） (2024-10-03T15:04:01Z)
Gradient Compressed Sensing: A Query-Efficient Gradient Estimator for High-Dimensional Zeroth-Order Optimization [48.84672493756553]
我々は,1ステップあたり$Obig(slogfrac dsbig)$クエリのみを使用する勾配のクエリ効率と精度の高い推定器を提案する。 Indyk-Price-Woodruff (IPW) アルゴリズムを線形測定から非線形関数への圧縮センシングにおいて一般化した。
論文参考訳（メタデータ） (2024-05-27T03:52:53Z)
The Sample Complexity Of ERMs In Stochastic Convex Optimization [13.896417716930687]
実際に$tildeO(fracdepsilon+frac1epsilon2)$データポイントも十分であることを示す。さらに、この結果を一般化し、全ての凸体に対して同様の上界が成り立つことを示す。
論文参考訳（メタデータ） (2023-11-09T14:29:25Z)
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
正方形損失に関して、標準的なガウス分布の下での$k$ReLU活性化の線形結合をPAC学習する問題をmathbbRd$で検討する。本研究の主な成果は,この学習課題に対して,サンプルおよび計算複雑性が$(dk/epsilon)O(k)$で,epsilon>0$が目標精度である。
論文参考訳（メタデータ） (2023-07-24T14:37:22Z)
A One-Sample Decentralized Proximal Algorithm for Non-Convex Stochastic Composite Optimization [10.762749887051546]
本稿では,Proxcal$DASA-GTとProxcal$DASA-Aの2時間スケールアルゴリズムを提案する。以前の作業とは異なり、我々のアルゴリズムは、大きなバッチサイズ、より複雑な単位演算、より強い仮定を必要とせずに、同等の複雑さを達成する。
論文参考訳（メタデータ） (2023-02-20T05:16:18Z)
ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
本稿では,BFG凸最適化(SCO: Reweighted Query (ReSQue) 推定ツールを提案する。我々はSCOの並列およびプライベート設定における最先端の複雑さを実現するアルゴリズムを開発した。
論文参考訳（メタデータ） (2023-01-01T18:51:29Z)
Robust Sparse Mean Estimation via Sum of Squares [42.526664955704746]
本研究では,高次元スパース平均推定の問題点を,逆数外乱の$epsilon$-fractionの存在下で検討する。我々のアルゴリズムは、サム・オブ・スクエア(Sum-of-Squares)ベースのアルゴリズムアプローチに従う。
論文参考訳（メタデータ） (2022-06-07T16:49:54Z)
Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity [121.83085611327654]
積 $r:=p + q$, ここで$r$は$mu$-strong convex類似性である。エージェントの通信やローカルコールにマスターされた問題を解決する方法を提案する。提案手法は$mathcalO(sqrtL_q/mu)$法よりもはるかにシャープである。
論文参考訳（メタデータ） (2022-05-30T14:28:02Z)
Gradient-Based Empirical Risk Minimization using Local Polynomial Regression [39.29885444997579]
この論文の主な目標は、勾配降下(GD)や勾配降下(SGD)といった異なるアルゴリズムを比較することである。損失関数がデータのスムーズな場合、各反復でオラクルを学習し、GDとSGDの両方のオラクル複雑度に打ち勝つことができることを示す。
論文参考訳（メタデータ） (2020-11-04T20:10:31Z)
Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence [120.9336529957224]
本稿では、勾配のないミニマックス最適化問題の大きさを非強設定で表現する。本稿では,新しいゼロ階分散還元降下アルゴリズムが,クエリの複雑さを最もよく表すことを示す。
論文参考訳（メタデータ） (2020-06-16T17:55:46Z)
Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models [2.384873896423002]
非滑らかな成分を持つ汎用合成対象関数に対する勾配アルゴリズムのミニバッチ変種を開発し解析する。我々は、最小バッチサイズ$N$に対して、$mathcalO(frac1Nepsilon)$$epsilon-$subityが最適解に期待される二次距離で達成されるような、定数および変数のステップサイズ反復ポリシーの複雑さを提供する。
論文参考訳（メタデータ） (2020-03-30T10:43:56Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。