Fugu-MT paper translation (abstract): Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling

Paper summary: Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling

arxiv url: http://arxiv.org/abs/2402.11800v3
Date: Wed, 27 Mar 2024 15:48:29 GMT
Status: translation complete
Last updated in system: 2024-03-28 22:03:50.948076
Title: Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling
Title (reference translation): Stochastic approximation with delayed updates: finite-time rates under Markovian sampling
Authors: Arman Adibi, Nicolo Dal Fabbro, Luca Schenato, Sanjeev Kulkarni, H. Vincent Poor, George J. Pappas, Hamed Hassani, Aritra Mitra
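Abstract summary: We study the non-asymptotic performance of stochastic approximation schemes with delayed updates under Markovian sampling. Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
Reference score (independently computed attention score): 73.5602474095954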
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Motivated by applications in large-scale and multi-agent reinforcement learning, we study the non-asymptotic performance of stochastic approximation (SA) schemes with delayed updates under Markovian sampling. While the effect of delays has been extensively studied for optimization, the manner in which they interact with the underlying Markov process to shape the finite-time performance of SA remains poorly understood. In this context, our first main contribution is to show that under time-varying bounded delays, the delayed SA update rule guarantees exponentially fast convergence of the \emph{last iterate} to a ball around the SA operator's fixed point. Notably, our bound is \emph{tight} in its dependence on both the maximum delay $\tau_{max}$, and the mixing time $\tau_{mix}$. To achieve this tight bound, we develop a novel inductive proof technique that, unlike various existing delayed-optimization analyses, relies on establishing uniform boundedness of the iterates. As such, our proof may be of independent interest. Next, to mitigate the impact of the maximum delay on the convergence rate, we provide the first finite-time analysis of a delay-adaptive SA scheme under Markovian sampling. In particular, we show that the exponent of convergence of this scheme gets scaled down by $\tau_{avg}$, as opposed to $\tau_{max}$ for the vanilla delayed SA rule; here, $\tau_{avg}$ denotes the average delay across all iterations. Moreover, the adaptive scheme requires no prior knowledge of the delay sequence for step-size tuning. Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms, including TD learning, Q-learning, and stochastic gradient descent under Markovian sampling.
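Abstract (reference translation): Motivated by applications in large-scale and multi-agent reinforcement learning, we study the non-asymptotic performance of stochastic approximation (SA) schemes with delayed updates under Markovian sampling. The effect of delays has been studied extensively for optimization, but how delays interact with the underlying Markov process to shape the finite-time performance of SA is still poorly understood. In this context, our first main contribution is to show that, under time-varying bounded delays, the delayed SA update rule guarantees exponentially fast convergence of the \emph{last iterate} to a ball around the fixed point of the SA operator. Notably, our bound is \emph{tight} in its dependence on both the maximum delay $\tau_{max}$ and the mixing time $\tau_{mix}$. To achieve this tight bound, we develop a novel inductive proof technique that, unlike various existing delayed-optimization analyses, relies on establishing uniform boundedness of the iterates. Our proof may therefore be of independent interest. Next, to mitigate the impact of the maximum delay on the convergence rate, we provide the first finite-time analysis of a delay-adaptive SA scheme under Markovian sampling. In particular, the exponent of convergence of this scheme is scaled down by $\tau_{avg}$, as opposed to $\tau_{max}$ for the vanilla delayed SA rule. Moreover, the adaptive scheme requires no prior knowledge of the delay sequence for step-size tuning. Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms, including TD learning, Q-learning, and stochastic gradient descent under Markovian sampling.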
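The delayed SA setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' algorithm or experiments: it assumes an update of the form x_{t+1} = x_t + alpha * g(x_{t - tau_t}, o_{t - tau_t}), a hypothetical scalar operator g(x, o) = b(o) - x, a two-state Markov chain for the observations o_t, and delays drawn uniformly from {0, ..., tau_max}. The function name `delayed_sa` and all parameter values are illustrative choices.

```python
import random
from collections import deque


def delayed_sa(num_steps=20000, step_size=0.005, tau_max=5, seed=0):
    """Toy delayed stochastic approximation under Markovian sampling.

    Assumed (hypothetical) setup, not taken from the paper:
      - observations o_t follow a symmetric two-state Markov chain;
      - g(x, o) = b(o) - x with b(0) = 0 and b(1) = 2, so the mean
        operator under the uniform stationary distribution has the
        fixed point x* = 1;
      - the delay tau_t is uniform on {0, ..., tau_max}, capped by t.
    Returns the last iterate.
    """
    rng = random.Random(seed)
    targets = (0.0, 2.0)   # b(o) for o in {0, 1}
    stay_prob = 0.9        # probability that the chain keeps its state
    obs = 0
    x = 5.0                # arbitrary starting point away from x* = 1

    # Keep only the last tau_max + 1 (iterate, observation) pairs.
    buffer = deque(maxlen=tau_max + 1)

    for _ in range(num_steps):
        buffer.append((x, obs))
        # Time-varying bounded delay; cannot reach before time 0.
        tau = rng.randint(0, len(buffer) - 1)
        x_old, o_old = buffer[-1 - tau]
        # Update the current iterate with a stale operator evaluation.
        x = x + step_size * (targets[o_old] - x_old)
        # Advance the Markov chain by one step.
        if rng.random() > stay_prob:
            obs = 1 - obs
    return x


if __name__ == "__main__":
    for tau_max in (0, 5, 20):
        print(tau_max, delayed_sa(tau_max=tau_max))
```

With a constant step size, the last iterate in this toy model is expected to settle in a neighborhood of x* = 1 rather than converge exactly, which mirrors the "ball around the fixed point" statement in the abstract. The sketch does not implement the delay-adaptive scheme.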

Related papers