Fugu-MT Paper Translation (Abstract): The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

Paper overview: The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

arxiv url: http://arxiv.org/abs/2110.14427v3
Date: Sun, 18 Jun 2023 18:11:56 GMT
Status: Translation complete
Last updated in system: 2023-06-22 06:36:26.676688
Title: The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning
Title (reference translation): The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning
Authors: Vivek Borkar, Shuhang Chen, Adithya Devraj, Ioannis Kontoyiannis and Sean Meyn
Abstract summary: The paper concerns the $d$-dimensional stochastic approximation recursion, $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ in which $\Phi$ is a geometrically ergodic Markov chain on a general state space.
Reference score (independently computed attention score): 6.92974337901767
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The paper concerns the $d$-dimensional stochastic approximation recursion, $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ in which $\Phi$ is a geometrically ergodic Markov chain on a general state space $\textsf{X}$ with stationary distribution $\pi$, and $f:\Re^d\times\textsf{X}\to\Re^d$. The main results are established under a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3), and a stability condition for the mean flow with vector field $\bar{f}(\theta)=\textsf{E}[f(\theta,\Phi)]$, with $\Phi\sim\pi$. (i) $\{ \theta_n\}$ is convergent a.s. and in $L_4$ to the unique root $\theta^*$ of $\bar{f}(\theta)$. (ii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error. (iii) The CLT holds for the normalized version, $z_n{=:} \sqrt{n} (\theta^{\text{PR}}_n -\theta^*)$, of the averaged parameters, $\theta^{\text{PR}}_n {=:} n^{-1} \sum_{k=1}^n\theta_k$, subject to standard assumptions on the step-size. Moreover, the normalized covariance converges, $$ \lim_{n \to \infty} n \textsf{E} [ {\widetilde{\theta}}^{\text{ PR}}_n ({\widetilde{\theta}}^{\text{ PR}}_n)^T ] = \Sigma_\theta^*,\;\;\;\textit{with $\widetilde{\theta}^{\text{ PR}}_n = \theta^{\text{ PR}}_n -\theta^*$,} $$ where $\Sigma_\theta^*$ is the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and the Markov chain $\Phi$ is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment is unbounded: $ \textsf{E} [ \| \theta_n \|^2 ] \to \infty$ as $n\to\infty$.
Abstract (reference translation): The paper concerns the $d$-dimensional stochastic approximation recursion, $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ in which $\Phi$ is a geometrically ergodic Markov chain on a general state space $\textsf{X}$ with stationary distribution $\pi$, and $f:\Re^d\times\textsf{X}\to\Re^d$. The main results hold under a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3), together with a stability condition for the mean flow with vector field $\bar{f}(\theta)=\textsf{E}[f(\theta,\Phi)]$, with $\Phi\sim\pi$. (i) $\{ \theta_n\}$ converges a.s. and in $L_4$ to the unique root $\theta^*$ of $\bar{f}(\theta)$. (ii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error. (iii) The CLT holds for the normalized version, $z_n{=:} \sqrt{n} (\theta^{\text{PR}}_n -\theta^*)$, of the averaged parameters, $\theta^{\text{PR}}_n {=:} n^{-1} \sum_{k=1}^n\theta_k$, subject to standard assumptions on the step-size. Moreover, the normalized covariance converges, $$ \lim_{n \to \infty} n \textsf{E} [ {\widetilde{\theta}}^{\text{ PR}}_n ({\widetilde{\theta}}^{\text{ PR}}_n)^T ] = \Sigma_\theta^*,\;\;\;\textit{with $\widetilde{\theta}^{\text{ PR}}_n = \theta^{\text{ PR}}_n -\theta^*$,} $$ where $\Sigma_\theta^*$ is the minimal covariance of Polyak and Ruppert. (iv) An example is given in which $f$ and $\bar{f}$ are linear in $\theta$, and the Markov chain $\Phi$ is geometrically ergodic but does not satisfy (DV3). The algorithm is convergent, but the second moment is unbounded: $ \textsf{E} [ \| \theta_n \|^2 ] \to \infty$ as $n\to\infty$.
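The recursion and the averaged estimate $\theta^{\text{PR}}_n$ described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are not taken from the paper: a two-state Markov chain with a hypothetical transition matrix `P`, the scalar linear choice $f(\theta, x) = g(x) - \theta$ (so that $\bar{f}(\theta) = \textsf{E}_\pi[g(\Phi)] - \theta$ and $\theta^* = \textsf{E}_\pi[g(\Phi)]$), and the step-size $\alpha_n = n^{-\rho}$ with $\rho = 0.7$. The names `stationary_distribution` and `run_sa` are illustrative only.

```python
import numpy as np


def stationary_distribution(P):
    """Return the stationary distribution pi of a row-stochastic matrix P (pi P = pi)."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    # Pick the left eigenvector associated with the eigenvalue closest to 1.
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return v / v.sum()  # normalizing also fixes the sign


def run_sa(P, g, n_steps, rho=0.7, theta0=0.0, seed=0):
    """Run theta_{n+1} = theta_n + alpha_{n+1} * f(theta_n, Phi_{n+1}).

    Assumed (illustrative) choices: f(theta, x) = g[x] - theta and
    alpha_n = n**(-rho). Returns the last iterate and the
    Polyak-Ruppert average (1/n) * sum_{k=1}^n theta_k.
    """
    rng = np.random.default_rng(seed)
    cum = np.cumsum(P, axis=1)        # row-wise CDFs for sampling the next state
    uniforms = rng.random(n_steps)
    last_state = len(g) - 1
    state, theta, running_sum = 0, float(theta0), 0.0
    for n in range(1, n_steps + 1):
        # Sample Phi_n from row `state` of P by inverting the CDF.
        idx = int(np.searchsorted(cum[state], uniforms[n - 1], side="right"))
        state = min(idx, last_state)  # guard against round-off in the CDF
        alpha = n ** (-rho)
        theta += alpha * (g[state] - theta)
        running_sum += theta
    return theta, running_sum / n_steps


# Hypothetical two-state example; the numbers are chosen only for illustration.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
g = np.array([1.0, 4.0])

theta_star = float(stationary_distribution(P) @ g)   # root of the mean flow
theta_last, theta_pr = run_sa(P, g, n_steps=200_000)

print("theta*        :", theta_star)
print("last iterate  :", theta_last)
print("PR average    :", theta_pr)
```

In this toy setting both the last iterate and the averaged estimate should settle near $\theta^*$, with the averaged estimate typically fluctuating less. The sketch does not check (DV3) or estimate $\Sigma_\theta^*$; it only shows the mechanics of the recursion and the averaging step.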

Related papers

Variance-Dependent Regret Lower Bounds for Contextual Bandits [65.93854043353328]
This improves on the previous $\tilde{O}(d\sqrt{K})$ regret bound, giving $\tilde{O}\big(d\sqrt{\sum_{k=1}^{K}\sigma_k^2}\big)$.
Paper reference translation (metadata) (2025-03-15T07:09:36Z)
Convergence of TD(0) under Polynomial Mixing with Nonlinear Function Approximation [49.1574468325115]
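Temporal difference learning (TD(0)) is fundamental to reinforcement learning. We give the first high-probability, finite-sample analysis of vanilla TD(0) on mixing Markov data.
Paper reference translation (metadata) (2025-02-08T22:01:02Z)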
A Proximal Modified Quasi-Newton Method for Nonsmooth Regularized Optimization [0.7373617024876725]
Lipschitz-of-$\nabla f$ $\mathcal{S}_k|p$. $\mathcal{S}_k|p$. $\nabla f$. $\mathcal{S}_k|p$.
Paper reference translation (metadata) (2024-09-28T18:16:32Z)
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
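We study the problem of residual error estimation for matrix and vector norms using linear sketches. This is shown to be empirically considerably advantageous at nearly the same sketch size and accuracy as prior work. We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
Paper reference translation (metadata) (2024-08-16T02:33:07Z)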
Revisiting Step-Size Assumptions in Stochastic Approximation [1.3654846342364308]
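This paper revisits step-size selection in a general Markovian setting. A major conclusion is that the choice of $\rho = 0$ or $\rho < 1/2$ is justified only in select settings.
Paper reference translation (metadata) (2024-05-28T05:11:05Z)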
On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
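This paper considers RMSProp and its momentum extension, and establishes a convergence rate for $\frac{1}{T}\sum_{k=1}^{T}\dots$. Our convergence rate matches the lower bound with respect to all coefficients except the dimension $d$. The convergence rate is considered analogous to $\frac{1}{T}\sum_{k=1}^{T}\dots$.
Paper reference translation (metadata) (2024-02-01T07:21:32Z)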
Efficient Estimation of the Central Mean Subspace via Smoothed Gradient Outer Products [12.047053875716506]
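We consider the problem of sufficient dimension reduction for multi-index models. We show that the fast parametric convergence rate is $C_d \cdot n^{-1/2}$.
Paper reference translation (metadata) (2023-12-24T12:28:07Z)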
A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
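For nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously. Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples. We also develop a concentration inequality that yields tighter bounds for product processes whose index sets have low metric entropy.
Paper reference translation (metadata) (2023-09-25T17:54:19Z)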
Convergence of a Normal Map-based Prox-SGD Method under the KL Inequality [0.0]
We propose a new normal map-based algorithm ($\mathsf{norM}\text{-}\mathsf{SGD}$) and study its convergence.
Paper reference translation (metadata) (2023-05-10T01:12:11Z)
Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates of Linear Stochastic Approximation [22.51165277694864]
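This paper provides a finite-time analysis of the linear stochastic approximation (LSA) algorithm. LSA is used to compute approximate solutions of a $d$-dimensional linear system.
Paper reference translation (metadata) (2022-07-10T14:36:04Z)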
Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
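We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$. We propose a new random-feature technique called ternary random features (TRF). Computing the proposed random features requires no multiplications and a factor of $b$ fewer bits of storage than classical random features.
Paper reference translation (metadata) (2021-10-05T09:33:49Z)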
A General Derivative Identity for the Conditional Mean Estimator in Gaussian Noise and Some Applications [128.4391178665731]
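Several identities in the literature connect $E[\mathbf{X} \mid \mathbf{Y}=\mathbf{y}]$ to other quantities such as the conditional variance, the score function, and higher-order conditional moments. The goal of this paper is to provide a unified view of these identities.
Paper reference translation (metadata) (2021-04-05T12:48:28Z)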
Accelerating Optimization and Reinforcement Learning with Quasi-Stochastic Approximation [2.294014185517203]
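This paper aims to extend convergence theory to quasi-stochastic approximation. Applications to gradient-free optimization and to policy gradient algorithms for reinforcement learning are described.
Paper reference translation (metadata) (2020-09-30T04:44:45Z)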
Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
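Solving optimal transport with entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector. Instead, the cost is taken to be $c(x,y) = -\log\langle\varphi(x),\varphi(y)\rangle$, where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$.
Paper reference translation (metadata) (2020-06-12T10:21:40Z)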
A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
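We present a proof of convergence for the Adam and Adagrad algorithms with rate $O(d\ln(N)/\sqrt{N})$. Adam converges at the same rate $O(d\ln(N)/\sqrt{N})$ as when used with its default parameters.
Paper reference translation (metadata) (2020-03-05T01:56:17Z)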

The related papers list is generated automatically from the titles and abstracts of papers on this site.

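This page shows the information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.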