Fugu-MT 論文翻訳(概要): Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling

論文の概要: Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling

arxiv url: http://arxiv.org/abs/2206.02275v1
Date: Sun, 5 Jun 2022 21:32:33 GMT
ステータス: 翻訳完了
システム内更新日: 2022-06-08 08:00:33.589644
Title: Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling
Title（参考訳）: クライアントとデータサンプリングによる非凸SGDのシャーパレートとフレキシブルフレームワーク
Authors: Alexander Tyurin and Lukang Sun and Konstantin Burlachenko and Peter Richt\'arik
Abstract要約: 我々は、平均$n$スムーズでおそらくは非カラー関数のほぼ定常点を求める問題を再考する。我々は$smallsfcolorgreen$を一般化し、事実上あらゆるサンプリングメカニズムで確実に動作するようにします。我々は、スムーズな非カラー状態における最適境界の最も一般的な、最も正確な解析を提供する。
参考スコア（独自算出の注目度）: 64.31011847952006
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We revisit the classical problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions. The optimal complexity of stochastic first-order methods in terms of the number of gradient evaluations of individual functions is $\mathcal{O}\left(n + n^{1/2}\varepsilon^{-1}\right)$, attained by the optimal SGD methods $\small\sf\color{green}{SPIDER}$(arXiv:1807.01695) and $\small\sf\color{green}{PAGE}$(arXiv:2008.10898), for example, where $\varepsilon$ is the error tolerance. However, i) the big-$\mathcal{O}$ notation hides crucial dependencies on the smoothness constants associated with the functions, and ii) the rates and theory in these methods assume simplistic sampling mechanisms that do not offer any flexibility. In this work we remedy the situation. First, we generalize the $\small\sf\color{green}{PAGE}$ algorithm so that it can provably work with virtually any (unbiased) sampling mechanism. This is particularly useful in federated learning, as it allows us to construct and better understand the impact of various combinations of client and data sampling strategies. Second, our analysis is sharper as we make explicit use of certain novel inequalities that capture the intricate interplay between the smoothness constants and the sampling procedure. Indeed, our analysis is better even for the simple sampling procedure analyzed in the $\small\sf\color{green}{PAGE}$ paper. However, this already improved bound can be further sharpened by a different sampling scheme which we propose. In summary, we provide the most general and most accurate analysis of optimal SGD in the smooth nonconvex regime. Finally, our theoretical findings are supposed with carefully designed experiments.
Abstract（参考訳）: 我々は、平均$n$滑らかでおそらくは非凸関数のほぼ定常点を求める古典的問題を再考する。個々の関数の勾配評価数の観点からの確率的一階法の最適複雑性は$\mathcal{o}\left(n + n^{1/2}\varepsilon^{-1}\right)$であり、例えば$\varepsilon$がエラー耐性であるような最適sgd法$\small\sf\color{green}{spider}$(arxiv:1807.01695)と$\small\sf\color{green}{page}$(arxiv:2008.10898)である。しかし、 i) big-$\mathcal{O}$ 表記は、関数に関連する滑らかさ定数に重要な依存を隠蔽し、二この方法の率及び理論は、柔軟性を提供しない簡易サンプリング機構を仮定する。この仕事で私たちは状況を修復する。まず、$\small\sf\color{green}{page}$アルゴリズムを一般化し、事実上任意の(偏りのない)サンプリングメカニズムで動作できるようにします。これは、クライアントとデータサンプリング戦略の様々な組み合わせの影響を構築およびよりよく理解することができるため、フェデレートドラーニングにおいて特に有用である。第2に, 平滑度定数とサンプリング手順の複雑な相互作用を捉えた新たな不等式を明示的に利用することにより, 解析がよりシャープになる。実際、この分析は$\small\sf\color{green}{PAGE}$ paperで分析された単純なサンプリング手順よりも優れている。しかし,提案する異なるサンプリング方式により,既に改良された境界をさらに研ぎ取ることができる。要約すると、スムーズな非凸状態における最適SGDの最も一般的かつ正確な解析を提供する。最後に、我々の理論的発見は慎重に設計された実験である。

関連論文リスト

The Plug-in Approach for Average-Reward and Discounted MDPs: Optimal Sample Complexity Analysis [6.996002801232415]
平均回帰マルコフ決定過程において,$varepsilon$-optimal Policyを学習するためのプラグインアプローチのサンプル複雑性について検討した。この問題の最も単純なアルゴリズムであるにもかかわらず、プラグインのアプローチは理論上は分析されていない。
論文参考訳（メタデータ） (2024-10-10T05:08:14Z)
Function Value Learning: Adaptive Learning Rates Based on the Polyak Stepsize and Function Splitting in ERM [6.542289202349586]
我々は、経験的リスク最小化(experiical risk minimization)としても知られる有限項和問題に焦点をあてる。最初に、サンプル損失値を利用する、$textttSPS_+$と呼ばれる理想化された適応メソッドを詳述する。次に、最適な損失値が徐々に学習される$textttSPS_+$の変種である$textttFUVAL$を開発する。
論文参考訳（メタデータ） (2023-07-26T22:12:31Z)
Empirical Risk Minimization with Shuffled SGD: A Primal-Dual Perspective and Improved Bounds [12.699376765058137]
勾配降下法(SGD)は、おそらく現代の機械学習において最も一般的な最適化法である。 SGDを交換せずにサンプリングするSGDが分析されたのはごく最近のことだ。データマトリックスに依存し、既存の境界によって予測されるものよりも決して悪くない、きめ細かい複雑性境界を証明します。
論文参考訳（メタデータ） (2023-06-21T18:14:44Z)
Achieving the Asymptotically Optimal Sample Complexity of Offline Reinforcement Learning: A DRO-Based Approach [36.88301225561535]
オフライン強化学習は、アクティブな探索なしに、事前に収集されたデータセットから学習することを目的としている。既存のアプローチでは、不確実性に対する悲観的なスタンスを採用し、探索されていない状態-作用対の報酬を、保守的に値関数を推定する。分散ロバスト最適化(DRO)に基づくアプローチはこれらの課題にも対処でき、漸近的に最小限の最適化であることを示す。
論文参考訳（メタデータ） (2023-05-22T17:50:18Z)
Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization [88.0031283949404]
多くの実世界の問題は複雑な非機能的制約を持ち、多くのデータポイントを使用する。提案手法は,従来最もよく知られた結果で既存手法よりも優れた性能を示す。
論文参考訳（メタデータ） (2022-12-19T14:48:54Z)
Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization [49.58290066287418]
構成問題の複雑さを軽減するために,MSVR (Multi-block-probe Variance Reduced) という新しい手法を提案する。本研究の結果は, 試料の複雑さの順序や強靭性への依存など, 様々な面で先行して改善された。
論文参考訳（メタデータ） (2022-07-18T12:03:26Z)
Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
既知の滑らかな関数 $f$ の値を未知の点 $boldsymbolmu in mathbbRn$ で推定する問題について検討する。我々は、各座標の重要性に応じてサンプルを学習するインスタンス適応アルゴリズムを設計し、少なくとも1-delta$の確率で$epsilon$の正確な推定値である$f(boldsymbolmu)$を返す。
論文参考訳（メタデータ） (2022-03-18T18:50:52Z)
Stochastic Bias-Reduced Gradient Methods [44.35885731095432]
モロー・吉田関数の任意の有界な$x_star$の低バイアスで低コストな平滑化である。
論文参考訳（メタデータ） (2021-06-17T13:33:05Z)
Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
マルコフ連鎖からデータポイントが依存しサンプリングされる最小二乗線形回帰問題について検討する。この問題を$tau_mathsfmix$という観点から、鋭い情報理論のミニマックス下限を確立する。本稿では,経験的リプレイに基づくアルゴリズムを提案する。
論文参考訳（メタデータ） (2020-06-16T04:26:50Z)
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
非同期Q-ラーニングはマルコフ決定過程(MDP)の最適行動値関数(またはQ-関数)を学習することを目的としている。 Q-関数の入出力$varepsilon$-正確な推定に必要なサンプルの数は、少なくとも$frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$の順である。
論文参考訳（メタデータ） (2020-06-04T17:51:00Z)
Better Theory for SGD in the Nonconvex World [2.6397379133308214]
大規模な非最適化問題は、現代の機械学習ではユビキタスである。我々は, 広範囲の合成ミニバッチサイズがグラディエントDescent (SG) 問題に与える影響について実験を行った。
論文参考訳（メタデータ） (2020-02-09T09:56:06Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。