Fugu-MT paper translation (abstract): Revisiting Step-Size Assumptions in Stochastic Approximation

Paper overview: Revisiting Step-Size Assumptions in Stochastic Approximation

arxiv url: http://arxiv.org/abs/2405.17834v2
Date: Mon, 3 Jun 2024 14:05:52 GMT
Status: translation complete
Last updated in system: 2024-06-04 13:59:47.256942
Title: Revisiting Step-Size Assumptions in Stochastic Approximation
Authors: Caio Kalil Lauand, Sean Meyn
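Abstract summary: This paper revisits step-size selection in a general Markovian setting. A major conclusion is that the choice of $\rho = 0$ or even $\rho < 1/2$ is justified only in select settings.
Reference score (independently computed attention measure): 1.3654846342364308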
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Many machine learning and optimization algorithms are built upon the framework of stochastic approximation (SA), for which the selection of step-size (or learning rate) is essential for success. For the sake of clarity, this paper focuses on the special case $\alpha_n = \alpha_0 n^{-\rho}$ at iteration $n$, with $\rho \in [0,1]$ and $\alpha_0>0$ design parameters. It is most common in practice to take $\rho=0$ (constant step-size), while in more theoretically oriented papers a vanishing step-size is preferred. In particular, with $\rho \in (1/2, 1)$ it is known that on applying the averaging technique of Polyak and Ruppert, the mean-squared error (MSE) converges at the optimal rate of $O(1/n)$ and the covariance in the central limit theorem (CLT) is minimal in a precise sense. The paper revisits step-size selection in a general Markovian setting. Under readily verifiable assumptions, the following conclusions are obtained provided $0<\rho<1$:
$\bullet$ Parameter estimates converge with probability one, and also in $L_p$ for any $p\ge 1$.
$\bullet$ The MSE may converge very slowly for small $\rho$, of order $O(\alpha_n^2)$ even with averaging.
$\bullet$ For linear stochastic approximation the source of slow convergence is identified: for any $\rho\in (0,1)$, averaging results in estimates for which the error $\textit{covariance}$ vanishes at the optimal rate, and moreover the CLT covariance is optimal in the sense of Polyak and Ruppert. However, necessary and sufficient conditions are obtained under which the $\textit{bias}$ converges to zero at rate $O(\alpha_n)$.
This is the first paper to obtain such strong conclusions while allowing for $\rho \le 1/2$. A major conclusion is that the choice of $\rho =0$ or even $\rho<1/2$ is justified only in select settings. In general, bias may preclude fast convergence.
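To make the step-size rule and the averaging technique named in the abstract concrete, the following is a minimal sketch, not the authors' algorithm or experimental setup. It assumes a scalar linear recursion with mean flow $-(\theta - \theta^*)$ and i.i.d. Gaussian noise (the paper itself treats a general Markovian setting), and the names `step_size`, `run_sa`, and `theta_star` are hypothetical, introduced only for illustration.

```python
import random


def step_size(n, alpha0=1.0, rho=0.7):
    """Step-size alpha_n = alpha0 * n**(-rho); rho = 0 gives a constant step."""
    return alpha0 * n ** (-rho)


def run_sa(num_iters, alpha0=1.0, rho=0.7, theta_star=2.0,
           noise_std=1.0, seed=0):
    """Scalar stochastic approximation with Polyak-Ruppert averaging.

    Assumption (not from the paper): the noise is i.i.d. Gaussian and the
    mean flow is f_bar(theta) = -(theta - theta_star).
    Returns (last iterate, running average of the iterates).
    """
    rng = random.Random(seed)
    theta = 0.0
    theta_avg = 0.0
    for n in range(1, num_iters + 1):
        # Noisy observation of f_bar at the current estimate.
        noisy_f = -(theta - theta_star) + rng.gauss(0.0, noise_std)
        # SA update with the vanishing (or constant) step-size.
        theta += step_size(n, alpha0, rho) * noisy_f
        # Polyak-Ruppert averaging: running mean of theta_1, ..., theta_n.
        theta_avg += (theta - theta_avg) / n
    return theta, theta_avg


if __name__ == "__main__":
    last, averaged = run_sa(5000)
    print("last iterate:", last)
    print("averaged estimate:", averaged)
```

Because the noise here has zero mean and is independent of the iterates, this toy problem does not exhibit the bias the abstract warns about for small $\rho$; it only shows the mechanics of the recursion and of the averaging step.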

Related papers

Allocating Variance to Maximize Expectation [2.25491649634702]
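We design efficient approximation algorithms for maximizing the expected supremum of families of Gaussian random variables. Such expectation problems arise in a variety of applications, from utility auctions to learning mixture models in quantitative genetics.
Paper reference translation (metadata) (2025-02-25T18:59:46Z)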
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
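We study the problem of residual error estimation for matrix and vector norms using a linear sketch. We show that this is empirically considerably advantageous, with roughly the same sketch size and accuracy as prior work. We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
Paper reference translation (metadata) (2024-08-16T02:33:07Z)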
Robust Distribution Learning with Local and Global Adversarial Corruptions [17.22168727622332]
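For distributions with bounded covariance, we develop an efficient finite-sample algorithm with error bounded by $\sqrt{\varepsilon k} + \rho + \tilde{O}(d\sqrt{k}\,n^{-1/(k \lor 2)})$. Our efficient procedure relies on a novel trace-norm approximation of an ideal but intractable 2-Wasserstein projection estimator.
Paper reference translation (metadata) (2024-06-10T17:48:36Z)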
On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
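This paper considers RMSProp and its momentum extension and establishes a convergence rate for $\frac{1}{T}\sum_{k=1}^{T}\cdots$. Our convergence rate matches the lower bound with respect to all coefficients except the dimension $d$. The convergence rate can be considered analogous to that of $\frac{1}{T}\sum_{k=1}^{T}\cdots$.
Paper reference translation (metadata) (2024-02-01T07:21:32Z)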
Solving Dense Linear Systems Faster Than via Preconditioning [1.8854491183340518]
In particular, we show that our algorithm is $\tilde{O}(n^2)$ if $k = O(n^{0.729})$. Our main algorithm can be viewed as a randomized block coordinate descent method.
Paper reference translation (metadata) (2023-12-14T12:53:34Z)
Finite-Sample Symmetric Mean Estimation with Fisher Information Rate [15.802475232604667]
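The mean of an unknown variance-$\sigma^2$ distribution can be estimated from $n$ samples with variance $\frac{\sigma^2}{n}$ and a nearly corresponding sub-Gaussian rate. When $f$ is known up to translation, this can be improved to $\frac{1}{n\mathcal{I}}$.
Paper reference translation (metadata) (2023-06-28T21:31:46Z)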
High-dimensional Location Estimation via Norm Concentration for Subgamma Vectors [15.802475232604667]
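In location estimation, we are given $n$ samples from a known distribution. Asymptotically, the maximum likelihood estimate achieves the Cramér-Rao bound, with error $\mathcal{N}(0, \frac{1}{n\mathcal{I}})$. Using a smoothed estimator, we build a theory that bounds the error for finite $n$ in terms of $\mathcal{I}_r$, the Fisher information of the $r$-smoothed distribution.
Paper reference translation (metadata) (2023-02-05T22:17:04Z)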
A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance \& heterogeneous noise [2.019622939313173]
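We revisit heavy-tailed corrupted least-squares linear regression, assuming a corrupted label-feature sample of size $n$ containing at most $\epsilon n$ arbitrary outliers. We propose a near-optimal, computationally tractable estimator based on the power method, assuming no knowledge of $(\Sigma,\Xi)$ or of the operator norm of $\Xi$.
Paper reference translation (metadata) (2022-09-06T23:37:31Z)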
Finite-Sample Maximum Likelihood Estimation of Location [16.44999338864628]
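For fixed $f$, the maximum likelihood estimate (MLE) is known to be optimal in the limit as $n \to \infty$. We show that, for arbitrary $f$ and $n$, a similar theory can be recovered based on the Fisher information of a smoothed version of $f$, where the smoothing radius decays with $n$.
Paper reference translation (metadata) (2022-06-06T04:33:41Z)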
TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm [64.13217062232874]
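One of the most powerful and successful modalities approximates every distribution to an $\ell$ distance essentially at most a constant times larger than that of its closest $t$-piece degree-$d$ polynomial. We propose a method that estimates this number near-optimally.
Paper reference translation (metadata) (2022-02-15T03:49:28Z)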
Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
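We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector. We show that this dependence on $d$ is optimal up to logarithmic factors. We also provide the first total sensitivity upper bound of $O(d^{\max\{1,p/2\}}\log^2 n)$ for loss functions with at most degree-$p$ polynomial growth.
Paper reference translation (metadata) (2021-11-09T00:20:01Z)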
Sparse sketches with small inversion bias [79.77110958547695]
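Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance. We develop a framework for analyzing inversion bias, based on the concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices. When the sketching matrix $S$ is dense, i.e., has sub-Gaussian entries, the estimator is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m = O(d + \sqrt{d}/\cdots)$.
Paper reference translation (metadata) (2020-11-21T01:33:15Z)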
Accelerating Optimization and Reinforcement Learning with Quasi-Stochastic Approximation [2.294014185517203]
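This paper aims to extend convergence theory to quasi-stochastic approximation. Applications to gradient-free optimization and to policy gradient algorithms for reinforcement learning are described.
Paper reference translation (metadata) (2020-09-30T04:44:45Z)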
Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
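Given an MDP with $S$ states, $A$ actions, discount factor $\gamma \in (0,1)$, and approximation threshold $\epsilon > 0$, we provide a model-free algorithm for learning an $\epsilon$-optimal policy. For sufficiently small $\epsilon$, we show an algorithm with improved sample complexity.
Paper reference translation (metadata) (2020-06-06T13:34:41Z)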
Maximizing Determinants under Matroid Constraints [69.25768526213689]
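We study the problem of finding a basis $S$ of $M$ such that $\det(\sum_{i \in S} v_i v_i^\top)$ is maximized. This problem appears in a variety of areas, including experimental design, fair allocation of goods, network design, and machine learning.
Paper reference translation (metadata) (2020-04-16T19:16:38Z)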
Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
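This paper studies the problem of $Q$-learning with function approximation in deterministic systems. If $\delta = O\left(\rho/\sqrt{\dim_E}\right)$, then the optimal policy can be found using $O\left(\dim_E\right)$ trajectories.
Paper reference translation (metadata) (2020-02-17T18:41:49Z)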

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

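This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.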