Fugu-MT 論文翻訳(概要): Accelerating Optimization and Reinforcement Learning with Quasi-Stochastic Approximation

論文の概要: Accelerating Optimization and Reinforcement Learning with Quasi-Stochastic Approximation

arxiv url: http://arxiv.org/abs/2009.14431v2
Date: Thu, 1 Oct 2020 04:39:19 GMT
ステータス: 翻訳完了
システム内更新日: 2022-10-13 00:20:30.930308
Title: Accelerating Optimization and Reinforcement Learning with Quasi-Stochastic Approximation
Title（参考訳）: 準確率近似による最適化と強化学習の高速化
Authors: Shuhang Chen, Adithya Devraj, Andrey Bernstein and Sean Meyn
Abstract要約: 本稿では、収束理論を準確率近似に拡張することを目的とする。強化学習のためのグラデーションフリー最適化とポリシー勾配アルゴリズムへの応用について説明する。
参考スコア（独自算出の注目度）: 2.294014185517203
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The ODE method has been a workhorse for algorithm design and analysis since the introduction of the stochastic approximation. It is now understood that convergence theory amounts to establishing robustness of Euler approximations for ODEs, while theory of rates of convergence requires finer analysis. This paper sets out to extend this theory to quasi-stochastic approximation, based on algorithms in which the "noise" is based on deterministic signals. The main results are obtained under minimal assumptions: the usual Lipschitz conditions for ODE vector fields, and it is assumed that there is a well defined linearization near the optimal parameter $\theta^*$, with Hurwitz linearization matrix $A^*$. The main contributions are summarized as follows: (i) If the algorithm gain is $a_t=g/(1+t)^\rho$ with $g>0$ and $\rho\in(0,1)$, then the rate of convergence of the algorithm is $1/t^\rho$. There is also a well defined "finite-$t$" approximation: \[ a_t^{-1}\{\Theta_t-\theta^*\}=\bar{Y}+\Xi^{\mathrm{I}}_t+o(1) \] where $\bar{Y}\in\mathbb{R}^d$ is a vector identified in the paper, and $\{\Xi^{\mathrm{I}}_t\}$ is bounded with zero temporal mean. (ii) With gain $a_t = g/(1+t)$ the results are not as sharp: the rate of convergence $1/t$ holds only if $I + g A^*$ is Hurwitz. (iii) Based on the Ruppert-Polyak averaging of stochastic approximation, one would expect that a convergence rate of $1/t$ can be obtained by averaging: \[ \Theta^{\text{RP}}_T=\frac{1}{T}\int_{0}^T \Theta_t\,dt \] where the estimates $\{\Theta_t\}$ are obtained using the gain in (i). The preceding sharp bounds imply that averaging results in $1/t$ convergence rate if and only if $\bar{Y}=\sf 0$. This condition holds if the noise is additive, but appears to fail in general. (iv) The theory is illustrated with applications to gradient-free optimization and policy gradient algorithms for reinforcement learning.
Abstract（参考訳）: ODE法は確率近似の導入以来,アルゴリズムの設計と解析の作業場となっている。現在、収束理論は ODE に対するオイラー近似の堅牢性を確立し、収束率の理論はより微細な解析を必要とすると理解されている。本稿では、この理論を「ノイズ」が決定論的信号に基づくアルゴリズムに基づく準確率近似に拡張することを目的とする。主な結果は最小の仮定で得られる: ODE ベクトル場に対する通常のリプシッツ条件、そして Hurwitz の線型化行列 $A^*$ が最適パラメータ $\theta^*$ の近くでよく定義された線型化が存在すると仮定される。主な貢献は以下のとおりである。 (i)アルゴリズムのゲインが$a_t=g/(1+t)^\rho$で$g>0$と$\rho\in(0,1)$の場合、アルゴリズムの収束率は1/t^\rho$である。 a_t^{-1}\{\theta_t-\theta^*\}=\bar{Y}+\Xi^{\mathrm{I}}_t+o(1) \] ここで、$\bar{Y}\in\mathbb{R}^d$は論文で同定されたベクトルであり、$\{\Xi^{\mathrm{I}}_t\}$はゼロ時間平均で有界である。 (ii)$a_t = g/(1+t)$ の場合、結果はそれほど鋭くはない: 1/t$ の収束率は、$i + g a^*$ が hurwitz である場合にのみ成り立つ。 (iii) 確率近似の ruppert-polyak 平均化に基づいて、1/t$ の収束率は平均化によって得られると期待できる: \[ \theta^{\text{rp}}_t=\frac{1}{t}\int_{0}^t \theta_t\,dt \] ここで、ゲインを用いて$\{\theta_t\}$ の見積が得られる。 (i)。前回のシャープな境界は、平均化が1/t$収束率をもたらすことを暗示する:$\bar{Y}=\sf 0$ である。この条件は、雑音が加法的であるならば成り立つが、一般には失敗するように見える。 (iv)この理論は、強化学習のための勾配なし最適化とポリシー勾配アルゴリズムへの応用を例証する。

関連論文リスト

Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
線形スケッチを用いた行列とベクトルノルムの残差誤差推定問題について検討する。これは、前作とほぼ同じスケッチサイズと精度で、経験的にかなり有利であることを示す。また、スパースリカバリ問題に対して$Omega(k2/pn1-2/p)$低いバウンダリを示し、これは$mathrmpoly(log n)$ factorまで厳密である。
論文参考訳（メタデータ） (2024-08-16T02:33:07Z)
Revisiting Step-Size Assumptions in Stochastic Approximation [1.3654846342364308]
この論文は、一般的なマルコフ的な設定でステップサイズの選択を再考する。大きな結論は、$rho =0$ または $rho1/2$ の選択は、選択した設定でのみ正当化されるということである。
論文参考訳（メタデータ） (2024-05-28T05:11:05Z)
On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
本稿では、RMSPropとその運動量拡張を考察し、$frac1Tsum_k=1Tの収束速度を確立する。我々の収束率は、次元$d$を除くすべての係数に関して下界と一致する。収束率は$frac1Tsum_k=1Tと類似していると考えられる。
論文参考訳（メタデータ） (2024-02-01T07:21:32Z)
Convergence of a Normal Map-based Prox-SGD Method under the KL Inequality [0.0]
我々は、$symbol$k$収束問題に対して、新しいマップベースのアルゴリズム(mathsfnorMtext-mathsfSGD$)を提案する。
論文参考訳（メタデータ） (2023-05-10T01:12:11Z)
Breaking the Lower Bound with (Little) Structure: Acceleration in Non-Convex Stochastic Optimization with Heavy-Tailed Noise [28.780192812703948]
重み付き雑音状態において、滑らかだが必ずしも凸な目標を持つ最適化問題を考察する。簡単な構造しか持たない低境界の$Omega(Tfrac1-p3p-2)$よりも高速な速度が得られることを示す。また、軽度条件下では、高い確率収束率が$O(log(T/delta)Tfrac1-p3p-2)$であることを保証する。
論文参考訳（メタデータ） (2023-02-14T00:23:42Z)
Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD [38.221784575853796]
この研究は、勾配を用いて潜在的に一定の滑らかさを持つ非アトー関数の1次定常点を求める問題を考える。我々は、ノイズに一様境界を仮定することなく$mathcalO(fracmathrmpolylog(T)sigmatT)$収束率を証明できる技術を開発した。
論文参考訳（メタデータ） (2023-02-13T18:13:36Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
モノトン活性化に対する $mathbfxmapstosigma(mathbfwcdotmathbfx)$ の関数について検討する。学習者の目標は仮説ベクトル $mathbfw$ that $F(mathbbw)=C, epsilon$ を高い確率で出力することである。
論文参考訳（メタデータ） (2022-06-17T17:55:43Z)
The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning [3.8098187557917464]
この論文は$d$-dimensional recursion approximation, $$theta_n+1=theta_n + alpha_n + 1 f(theta_n, Phi_n+1)に関するものである。主な結果は、ドスカー・バラダン・リャプノフドリフト条件(DV3)の平均流とバージョンに関する追加条件の下で確立される。 a example is given where $f$ and $barf$ are linear in $theta$, and $Phi$ is a geometryal.
論文参考訳（メタデータ） (2021-10-27T13:38:25Z)
Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry [69.24618367447101]
対数要因まで $(varepsilon,delta)$-differently private の最適過剰人口損失は $sqrtlog(d)/n + sqrtd/varepsilon n.$ です。損失関数がさらなる滑らかさの仮定を満たすとき、余剰損失は$sqrtlog(d)/n + (log(d)/varepsilon n)2/3で上界(対数因子まで)であることが示される。
論文参考訳（メタデータ） (2021-03-02T06:53:44Z)
Sparse sketches with small inversion bias [79.77110958547695]
逆バイアスは、逆の共分散に依存する量の推定を平均化するときに生じる。本研究では、確率行列に対する$(epsilon,delta)$-unbiased estimatorという概念に基づいて、逆バイアスを解析するためのフレームワークを開発する。スケッチ行列 $S$ が密度が高く、すなわちサブガウスのエントリを持つとき、$(epsilon,delta)$-unbiased for $(Atop A)-1$ は $m=O(d+sqrt d/ のスケッチを持つ。
論文参考訳（メタデータ） (2020-11-21T01:33:15Z)
Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
学習者が生成モデル$Y = langle X,w* rangle + epsilon$から$n$のサンプルにアクセスできるような高次元頑健な線形回帰問題について検討する。 i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance, (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
論文参考訳（メタデータ） (2020-07-16T06:44:44Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。