Fugu-MT Paper Translation (Abstract): On Gaussian approximation for entropy-regularized Q-learning with function approximation

Paper overview: On Gaussian approximation for entropy-regularized Q-learning with function approximation

arxiv url: http://arxiv.org/abs/2605.17678v1
Date: Sun, 17 May 2026 22:23:25 GMT
Status: Translation complete
Last updated in system: 2026-05-19 23:51:08.395821
Title: On Gaussian approximation for entropy-regularized Q-learning with function approximation
Title (reference translation): On Gaussian approximation for entropy-regularized Q-learning with function approximation
Authors: Artemy Rubtsov, Rahul Singh, Eric Moulines, Alexey Naumov, Sergey Samsonov
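Abstract summary: We derive rates of convergence in the high-dimensional central limit theorem for Polyak-Ruppert averaged iterates generated by entropy-regularized asynchronous Q-learning. We establish a Gaussian approximation bound in the convex distance with rate of order $n^{-1/4}$, up to polylogarithmic factors in $n$, where $n$ is the number of samples used by the algorithm.
Reference score (attention score computed independently by this site): 30.147231451149064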
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak--Ruppert averaged iterates generated by entropy-regularized asynchronous Q-learning with linear function approximation and a polynomial stepsize $k^{-ω}$, $ω\in (1/2,1)$. Assuming that the sequence of observed triples $(s_k,a_k,s_{k+1})_{k \geq 0}$ forms a uniformly geometrically ergodic Markov chain, and under suitable regularity conditions for the projected soft Bellman equation, we establish a Gaussian approximation bound in the convex distance with rate of order $n^{-1/4}$, up to polylogarithmic factors in $n$, where $n$ is the number of samples used by the algorithm. To obtain this result, we combine a linearization of the soft Bellman recursion with a Gaussian approximation for the leading martingale term. Finally, we derive high-order moment bounds for the algorithm's last iterate, which might be of independent interest.
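Abstract (reference translation): In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak-Ruppert averaged iterates generated by entropy-regularized asynchronous Q-learning with linear function approximation and a polynomial stepsize $k^{-ω}$, $ω\in (1/2,1)$. Assuming that the sequence of observed triples $(s_k,a_k,s_{k+1})_{k \geq 0}$ forms a uniformly geometrically ergodic Markov chain, and under regularity conditions for the projected soft Bellman equation, we establish a Gaussian approximation in the convex distance with rate of order $n^{-1/4}$. To obtain this result, we combine a linearization of the soft Bellman recursion with a Gaussian approximation for the leading martingale term. Finally, we derive high-order moment bounds for the algorithm's last iterate.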
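The recursion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `soft_value` and `soft_q_learning`, the uniform behaviour policy, the temperature `lam`, the stepsize constant `c0`, the random toy MDP, and the one-hot features (a special case of linear function approximation, chosen here so the run stays bounded) are all assumptions made for illustration. Only the polynomial stepsize $k^{-ω}$ with $ω\in (1/2,1)$, the single-trajectory (asynchronous) updates, and the Polyak-Ruppert average come from the abstract.

```python
import numpy as np

def soft_value(q_row, lam):
    """Entropy-regularized value lam * log(sum_a exp(q / lam)), computed stably."""
    m = q_row.max()
    return m + lam * np.log(np.exp((q_row - m) / lam).sum())

def soft_q_learning(P, R, phi, n, gamma=0.9, lam=0.5, omega=0.75, c0=0.5, seed=0):
    """Run n asynchronous soft Q-learning steps with Q_theta(s, a) = phi[s, a] @ theta.

    Returns (last iterate, Polyak-Ruppert average of the iterates).
    """
    rng = np.random.default_rng(seed)
    S, A, d = phi.shape
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    s = 0
    for k in range(1, n + 1):
        a = rng.integers(A)                # uniform behaviour policy (assumption)
        s_next = rng.choice(S, p=P[s, a])  # observed triple (s_k, a_k, s_{k+1})
        q_next = phi[s_next] @ theta       # Q_theta(s_next, .), shape (A,)
        td = R[s, a] + gamma * soft_value(q_next, lam) - phi[s, a] @ theta
        theta = theta + c0 * k ** (-omega) * td * phi[s, a]  # stepsize c0 * k^{-omega}
        theta_bar += (theta - theta_bar) / k  # running mean of theta_1, ..., theta_k
        s = s_next
    return theta, theta_bar

# Toy problem (hypothetical): 4 states, 2 actions, rewards in [0, 1].
S, A = 4, 2
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(S, A))  # transition kernel, shape (S, A, S)
R = rng.uniform(0.0, 1.0, size=(S, A))
phi = np.eye(S * A).reshape(S, A, S * A)    # one-hot features (assumption)
theta_last, theta_bar = soft_q_learning(P, R, phi, n=20000)
```

With one-hot features and a stepsize of at most 1, each update is a convex combination of the old entry and the soft Bellman target, so the iterates stay within $[0, (1 + γλ\log|A|)/(1-γ)]$ for rewards in $[0,1]$. With general features no such guarantee holds, which is why the paper imposes regularity conditions on the projected soft Bellman equation.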

Related papers

Gaussian Approximation for Asynchronous Q-learning [11.260593100797381]
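Proves a high-dimensional central limit theorem for sums of martingale differences. Presents high-order moment bounds for the algorithm's last iterate.
Paper reference translation (metadata) (2026-04-08T17:37:15Z)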
Finite-Sample Wasserstein Error Bounds and Concentration Inequalities for Nonlinear Stochastic Approximation [6.800624963330628]
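Derives non-asymptotic error bounds for nonlinear stochastic approximation algorithms in the Wasserstein-$p$ distance. Shows that the normalized last iterate converges to a Gaussian distribution in the $p$-Wasserstein distance at a rate of order $\gamma_n^{1/6}$, where $\gamma_n$ is the step size. These distributional guarantees imply high-probability concentration inequalities that improve on those obtained from moment bounds and Markov's inequality.
Paper reference translation (metadata) (2026-02-02T18:41:06Z)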
Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation [22.652143194356864]
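Addresses the problem of solving strongly convex and smooth minimization problems using stochastic gradient descent (SGD) with a constant step size. Expands the mean-squared error of the resulting estimator with respect to the number of iterations $n$. The analysis relies on the properties of the SGD iterates viewed as a time-homogeneous Markov chain.
Paper reference translation (metadata) (2024-10-07T15:02:48Z)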
Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence [69.65563161962245]
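Considers a smooth and strongly convex objective function, treated with stochastic Newton methods. Shows that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
Paper reference translation (metadata) (2022-04-20T07:14:21Z)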
Optimal and instance-dependent guarantees for Markovian linear stochastic approximation [47.912511426974376]
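Shows a non-asymptotic bound of order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme. Derives corollaries of these results for policy evaluation with Markov noise.
Paper reference translation (metadata) (2021-12-23T18:47:50Z)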
Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
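Provides a general framework for the non-asymptotic analysis of sampling error in the 2-Wasserstein distance. The theoretical analysis is further validated by numerical experiments.
Paper reference translation (metadata) (2021-09-08T18:00:05Z)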
On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration [115.1954841020189]
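Concerns inequalities and non-asymptotic properties of the approximation procedure with Polyak-Ruppert averaging. Proves a central limit theorem (CLT) for the averaged iterates with a fixed step size and the number of iterations going to infinity.
Paper reference translation (metadata) (2020-04-09T17:54:18Z)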
Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
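Provides the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, studies Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
Paper reference translation (metadata) (2020-02-10T23:23:04Z)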

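The related papers list is generated automatically from the titles and abstracts of the papers on this site.

This is information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).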