Fugu-MT 論文翻訳(概要): Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity

論文の概要: Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity

arxiv url: http://arxiv.org/abs/2103.16377v1
Date: Tue, 30 Mar 2021 14:17:50 GMT
ステータス: 翻訳完了
システム内更新日: 2021-03-31 14:34:19.958554
Title: Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity
Title（参考訳）: ばらつきを低減したGreedy-GQ:有限時間解析と複雑度の改善
Authors: Shaocong Ma, Ziyi Chen, Yi Zhou, Shaofeng Zou
Abstract要約: オフポリシー最適制御のための分散還元Greedy-GQ(VRGreedy-GQ)アルゴリズムを提案する。我々はVR-Greedy-GQがオリジナルのGreedy-GQよりもはるかに小さなバイアス誤差を達成していることを示す。
参考スコア（独自算出の注目度）: 26.298213774857302
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Greedy-GQ is a value-based reinforcement learning (RL) algorithm for optimal control. Recently, the finite-time analysis of Greedy-GQ has been developed under linear function approximation and Markovian sampling, and the algorithm is shown to achieve an $\epsilon$-stationary point with a sample complexity in the order of $\mathcal{O}(\epsilon^{-3})$. Such a high sample complexity is due to the large variance induced by the Markovian samples. In this paper, we propose a variance-reduced Greedy-GQ (VR-Greedy-GQ) algorithm for off-policy optimal control. In particular, the algorithm applies the SVRG-based variance reduction scheme to reduce the stochastic variance of the two time-scale updates. We study the finite-time convergence of VR-Greedy-GQ under linear function approximation and Markovian sampling and show that the algorithm achieves a much smaller bias and variance error than the original Greedy-GQ. In particular, we prove that VR-Greedy-GQ achieves an improved sample complexity that is in the order of $\mathcal{O}(\epsilon^{-2})$. We further compare the performance of VR-Greedy-GQ with that of Greedy-GQ in various RL experiments to corroborate our theoretical findings.
Abstract（参考訳）: Greedy-GQは、最適制御のための値ベース強化学習(RL)アルゴリズムである。近年、greedy-gqの有限時間解析は線形関数近似とマルコフサンプリングの下で開発され、このアルゴリズムは$\mathcal{o}(\epsilon^{-3})$の順にサンプル複雑性を持つ$\epsilon$-stationary pointを達成することが示されている。このような高いサンプル複雑性はマルコフのサンプルによって引き起こされる大きな分散に起因する。本稿では,オフポリシー最適制御のための分散低減型greedy-gq(vr-greedy-gq)アルゴリズムを提案する。特に,SVRGに基づく分散低減手法を適用し,2つの時間スケール更新の確率的分散を低減する。線形関数近似およびマルコフサンプリングの下でのVR-Greedy-GQの有限時間収束について検討し、アルゴリズムが元のGreedy-GQよりもはるかに小さなバイアスと分散誤差を達成することを示す。特に、VR-Greedy-GQ が $\mathcal{O}(\epsilon^{-2})$ の順序で改良されたサンプル複雑性を実現することを証明している。さらに,VR-Greedy-GQとGreedy-GQの様々なRL実験の性能を比較し,理論的な知見を裏付ける。

関連論文リスト

A Greedy Strategy for Graph Cut [95.2841574410968]
GGCと呼ばれるグラフカットの問題を解決するための欲求戦略を提案する。これは、各データサンプルがクラスタと見なされる状態から始まり、2つのクラスタを動的にマージする。 GGCはサンプル数に関してほぼ線形な計算複雑性を持つ。
論文参考訳（メタデータ） (2024-12-28T05:49:42Z)
Stochastic Extragradient with Random Reshuffling: Improved Convergence for Variational Inequalities [40.1331539540764]
本稿では,3種類のVIPに対してランダムリシャッフル(SEG-RR)を用いたSEGの収束解析を行う。我々は,SEG-RRが均一な置換サンプリングSEGよりも高速に収束する条件を導出する。単調な設定では,SEG-RRの解析により,大きなバッチサイズを伴わずに任意の精度で収束が保証される。
論文参考訳（メタデータ） (2024-03-11T20:35:52Z)
Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes [41.61653528766776]
本稿では, 自然政策勾配を求めるために, 加速勾配降下過程を利用する自然促進政策勾配(PGAN)アルゴリズムを提案する。繰り返しは$mathcalO(epsilon-2)$サンプル複雑性と$mathcalO(epsilon-1)$複雑さを達成する。 Hessian-free および IS-free アルゴリズムのクラスでは、ANPG は $mathcalO(epsilon-frac12)$ の係数で最もよく知られたサンプルの複雑さを破り、それらの状態と同時に一致する。
論文参考訳（メタデータ） (2023-10-18T03:00:15Z)
Finite-Time Error Bounds for Greedy-GQ [20.51105692499517]
We show that Greedy-GQ algorithm converges fast-time error。我々の分析は、ステップサイズを選択するために、より高速な収束ステップサイズを提供する。
論文参考訳（メタデータ） (2022-09-06T15:04:57Z)
Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
勾配ランゲヴィン・ダイナミクスは非エプス最適化問題を解くための最も基本的なアルゴリズムの1つである。本稿では、このタイプの2つの変種、すなわち、分散還元ランジュバンダイナミクスと再帰勾配ランジュバンダイナミクスを示す。
論文参考訳（メタデータ） (2022-03-30T11:39:00Z)
$\bar{G}_{mst}$:An Unbiased Stratified Statistic and a Fast Gradient Optimization Algorithm Based on It [0.0]
barG_mst$をベースとしたMSSGという新しいアルゴリズムは、他のsgdライクなアルゴリズムより優れている。
論文参考訳（メタデータ） (2021-10-07T11:48:55Z)
Correcting Momentum with Second-order Information [50.992629498861724]
最適積に$O(epsilon)$epsilon点を求める非臨界最適化のための新しいアルゴリズムを開発した。我々は、さまざまな大規模ディープラーニングベンチマークとアーキテクチャで結果を検証する。
論文参考訳（メタデータ） (2021-03-04T19:01:20Z)
Convergence Analysis of Homotopy-SGD for non-convex optimization [43.71213126039448]
ホモトピー法とSGDを組み合わせた一階述語アルゴリズム、Gradienty-Stoch Descent (H-SGD)を提案する。いくつかの仮定の下で、提案した問題の理論的解析を行う。実験の結果,H-SGDはSGDより優れていた。
論文参考訳（メタデータ） (2020-11-20T09:50:40Z)
Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms [65.09383385484007]
2つの時間スケール近似(SA)は、値に基づく強化学習アルゴリズムで広く使われている。本稿では,2つの時間スケール線形および非線形TDCとGreedy-GQアルゴリズムの漸近収束率について検討する。
論文参考訳（メタデータ） (2020-11-10T11:36:30Z)
Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
学習者が生成モデル$Y = langle X,w* rangle + epsilon$から$n$のサンプルにアクセスできるような高次元頑健な線形回帰問題について検討する。 i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance, (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
論文参考訳（メタデータ） (2020-07-16T06:44:44Z)
Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
有界非次元最適化のための差分プライベート(DP)アルゴリズムを提案する。標準勾配法に対する経験的優位性について,2つの一般的なディープラーニング手法を実証する。
論文参考訳（メタデータ） (2020-06-24T06:01:24Z)
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
適応アルゴリズムは勾配の歴史を用いて勾配を更新し、深層ニューラルネットワークのトレーニングにおいてユビキタスである。本稿では,非コンケーブ最小値問題に対するOptimisticOAアルゴリズムの変種を解析する。実験の結果,適応型GAN非適応勾配アルゴリズムは経験的に観測可能であることがわかった。
論文参考訳（メタデータ） (2019-12-26T22:10:10Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。